Datadog Reviews and Pricing

Ajay Thomas

Engineering Manager at Dbt labs

Sep 30, 2024

Great features and synthetic testing but pricing can get expensive

Pros and Cons

"We have been impressed with the uptime and clean and light resource usage of the agents."

"I like the idea of monitoring on the go yet it seems the options are still a bit limited out of the box."

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing, and alerting. We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications.

Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. Datadog agents on each web host and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through use of Datadog across all of our apps we were able to consolidate a number of alerting and error tracking apps and Datadog ties them all together in cohesive dashboards.

The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable. The centralized pipeline tracking and error logging provides a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view. I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas (specifically .NET Profiling and Tracing of IIS-hosted apps) that need a lot of focus to pick up on the key details needed. In some cases the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environment variables.

Buyer's Guide

Datadog

July 2026

Free Report: Datadog Reviews and More

Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: July 2026.

DOWNLOAD NOW

903,257 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution is very scalable, very customizable.

How are customer service and support?

Service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux or Windows or Container, cloud or on-prem hosted.

How was the initial setup?

The setup was generally simple. However, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

NewRelic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

I'm excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Kenneth Dozier

Associate Software Engineer at H&R Block, Inc.

Sep 30, 2024

Easy to use with good speed and helpful dashboards

Pros and Cons

"Watchdog is a favorite feature among a lot of the devs. It catches things they didn't even know were an issue."

"I would like to see the integration between PagerDuty and Datadog improved. The tags in Datadog don't match those in PagerDuty, and we have to make it work."

What is our primary use case?

We are using Datadog to improve our cloud monitoring and observability across our enterprise apps. We have integrated a lot of different resources into Datadog, like Kubernetes, App Gateways, App Service Environments, App Service Plans, and other Web App resources.

I will be using the monitoring and observability features of Datadog. Dashboards are used very heavily by teams and SREs. We really have seen that Datadog has already improved both our monitoring and our observability.

How has it helped my organization?

The ease and speed with which you can create a dashboard has been a huge improvement.

The different types of monitors we can create have been huge, too. We can do so many different things with monitors that we couldn't do before with our alerts.

Being able to click on a trace or log and drill down on it to see what happened has been great.

Some have found the learning curve a bit steep. That said, they are coming around slowly. There is just a lot of information to learn how to navigate.

What is most valuable?

The different types of monitors have been very valuable. We have been able to make our alerts (monitors) more actionable than we were able to previously.

Watchdog is a favorite feature among a lot of the devs. It catches things they didn't even know were an issue.

RUM is another feature a lot of us are looking forward to. We want to see how it can help us improve our customer experience during tax season.

We hope to enable the code review feature at some point so we can see what code caused the issue.

What needs improvement?

I would like to see the integration between PagerDuty and Datadog improved. The tags in Datadog don't match those in PagerDuty, and we have to make it work. Also, I would like it to be easier to replicate a KQL query in Datadog.

I would like to see the alert communications to email or phones made better so we could hopefully move off PagerDuty and just use Datadog for that.

There are also a lot of features that we haven't budgeted for yet and I would like for us to be able to use them in the future.

For how long have I used the solution?

I've used the solution for about two years.

Which deployment model are you using for this solution?

Hybrid Cloud

Disclosure: My company has a business relationship with this vendor other than being a customer. H&R Block has recently signed with Datadog.

reviewer3796153

Software Engineer 2 at Modernizing Medicine

Oct 2, 2024

Intuitive user interface with good log management and a helpful Log Explorer feature

Pros and Cons

"The ease of use allowed me to get up to speed with log management since it's my first time using Datadog."

"Interactive tutorials could be a game changer."

What is our primary use case?

In our fast-paced environment, managing and analyzing log data and performance metrics is crucial. That’s where Datadog comes in. We rely on it not just for monitoring but for deeper insights into our systems, and here’s how we make the most of it.

One of the first things we appreciate about Datadog is its ability to centralize logs from various sources—think applications, servers, and cloud services. This means we can access everything from one dashboard, which saves us a lot of time and hassle. Instead of digging through multiple platforms, we have all our log data in one place, making it much easier to track events and troubleshoot issues.

How has it helped my organization?

Before Datadog, we faced the common challenge of fragmented data. Our logs, metrics, and traces were spread across different tools and platforms, making it difficult to get a complete picture of our system’s health.

With Datadog, we now have a centralized monitoring solution that aggregates everything in one place. This has streamlined our workflow immensely. Whether it’s logs from our servers, metrics from our applications, or traces from user transactions, we can access all this information easily. This unified view has made it simpler for our teams to identify and troubleshoot issues quickly.

What is most valuable?

In my experience with Datadog, one feature stands out above the rest: the Log Explorer. It has completely transformed the way I interact with our log data and has become an essential part of my daily workflow.

The user interface is incredibly intuitive. When I first started using it, I was amazed at how easy it was to navigate. The design is clean and straightforward, allowing me to focus on the data rather than getting lost in complicated menus. Whether I’m searching for specific log entries or filtering by certain criteria, everything feels seamless.

This ease of use allowed me to get up to speed with log management since it's my first time using Datadog.

What needs improvement?

Interactive tutorials could be a game changer. Instead of just reading about how to use query filters, users could engage with step-by-step guides that walk them through the process. For example, a tutorial could start with a simple query and gradually introduce more complex filtering techniques, allowing users to practice along the way. These tutorials could include pop-up tips and hints that provide additional context or best practices as users work through examples. This hands-on approach not only reinforces learning but also builds confidence in using the tool.
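As a rough illustration of the kind of progression such a tutorial might follow, the sketch below builds four log search queries, each adding one filter to the previous one. The service name `checkout` and the choice of filters are hypothetical; the query strings rely on the documented log search conventions that terms separated by spaces are combined with AND, that `@` prefixes a log attribute, that `[a TO b]` is a numeric range, and that a leading `-` excludes a term.

```python
def and_query(*terms):
    """Join search terms with spaces; log search treats adjacent terms as AND."""
    return " ".join(terms)


# Each step adds exactly one filter to the previous query.
# "checkout" is a hypothetical service name used only for illustration.
steps = [
    # 1. Everything from one service.
    and_query("service:checkout"),
    # 2. Only error-level logs from that service.
    and_query("service:checkout", "status:error"),
    # 3. Narrow to server-side HTTP failures with a numeric range.
    and_query("service:checkout", "status:error",
              "@http.status_code:[500 TO 599]"),
    # 4. Exclude a noisy request type with a leading minus sign.
    and_query("service:checkout", "status:error",
              "@http.status_code:[500 TO 599]", "-@http.method:OPTIONS"),
]

if __name__ == "__main__":
    for number, query in enumerate(steps, start=1):
        print(f"Step {number}: {query}")
```

Each printed string can be pasted into the search bar as-is, which is roughly what a guided, practice-as-you-go tutorial would ask the user to do.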

For how long have I used the solution?

My company has recently made Datadog available to its software engineers, and I personally have been using it for almost a year now.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Jason Karuza

Engineering Manager at Paystand

Oct 1, 2024

Great dashboards, lots of integrations, and helps trace data between components

Pros and Cons

"The most valuable aspects of the solution include log search to help triage specific problems that we get notified about (whether by alerts we have configured or users that have contacted us)."

"In some ways, the tool has a pretty steep learning curve. Discovering the various capabilities available, then learning how to utilize them for particular use cases can be challenging."

What is our primary use case?

We use the product for instrumentation, observability, monitoring, and alerting of our system.

We have multiple environments and a variety of pieces of infrastructure including servers, databases, load balancers, cache, etc. and we need to be able to monitor all of these pieces, while also retaining visibility into how the various pieces interact with each other.

Tracing data between components and user interactions that trigger these data flows is particularly important for understanding where problems arise and how to resolve them quickly.

How has it helped my organization?

It provides a lot of options for integrations and tooling to observe what is happening within the system, making diagnosis and triage easier/faster.

Each user can set up their own dashboards and share them with other users on the team. We can instrument monitors based on various patterns that we care about, then notify us when an event triggers an alert with platforms such as Slack or PagerDuty.

Our ability to rapidly become aware of problems focused on the symptoms being observed and entry points into the tool to rapidly identify where to investigate further is important for our team and our users.

What is most valuable?

The most valuable aspects of the solution include log search to help triage specific problems that we get notified about (whether by alerts we have configured or users that have contacted us), APM traces (to view how user interactions trace through the various layers of our infrastructure and services to be able to reproduce and identify the source of problems), general performance/system dashboards (to regularly monitor for stability or deviation), and alerting (to be automatically informed when a problem occurs). We also use the incident tools for tracking production incidents.

What needs improvement?

In some ways, the tool has a pretty steep learning curve. Discovering the various capabilities available, then learning how to utilize them for particular use cases can be challenging. Thankfully, there is a good amount of documentation with some good examples (more are always welcome), and support is very helpful.

While Datadog has started adding more correlation mapping between services and parts of our system, it is still tricky to understand what the ultimate root cause is when multiple views/components spike. Additionally, there are lots of views and insights that are available but hard to find or discover. One of the best ways to discover them is to just click around a lot and get familiar with the views that are useful, but that takes time and isn't ideal when in the middle of fighting a fire.

For how long have I used the solution?

I've used the solution for about four years.

What do I think about the stability of the solution?

It seems stable.

What do I think about the scalability of the solution?

It seems to scale well. Performance for aggregating or searching is usually very fast.

How are customer service and support?

Technical support is helpful and pretty responsive.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We did not use a different solution.

What was our ROI?

It's hard to say what ROI would be as I have not managed our system without it to compare to.

What's my experience with pricing, setup cost, and licensing?

I don't manage licensing.

Which other solutions did I evaluate?

We did not evaluate other options.

What other advice do I have?

It's a great tool with new features and improvements continuously being added. It is not simple to use or set up. However, if you have the right personnel, you can get a lot of value from what Datadog has to offer.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Traci Ortiz

Works at Koddi

Oct 1, 2024

Improved response time and cost-efficiency with good monitoring

Pros and Cons

"The server monitoring, service monitoring, and user session monitoring are extremely helpful, as they allow us to be alerted ahead of time of issues that users might experience."

"I would also like to see an improvement in the server's data extraction times, as sometimes it can take up to ten minutes to download a report for a critical issue that is costing us money."

What is our primary use case?

We monitor our multiple platforms using Datadog and post alerts to Slack to notify us of server and end-user issues. We also monitor user sessions to help troubleshoot an issue being reported.

We monitor 3.5 platforms on our Datadog instance, and the team always monitors the trends and Dashboards we set up. We have two instances to span the 3.5 platforms and are currently looking to implement more platform monitoring over time. The user session monitoring is consistent for one of these platforms.

How has it helped my organization?

Datadog has improved our response time and cost-efficiency in bug reporting and server maintenance. We're able to track our servers more fluidly, allowing us to expand our outreach and decrease response time.

There are many different ways that Datadog is used, and we monitor three and a half platforms on the Datadog environment at this time. By monitoring all of these platforms in one easy-to-use instance, we're able to track the platform with the issue, the issue itself, and its impact on the end user.

What is most valuable?

The server monitoring, service monitoring, and user session monitoring are extremely helpful, as they allow us to be alerted ahead of time of issues that users might experience. More often than not, an issue is not only able to be identified, but solved and released before an end user notices an issue.

We are currently using this as an investigative tool to notice trends, identify issues, and locate areas of our program that we can improve upon that haven't been identified as pain points yet. This is another effective use case.

What needs improvement?

I would like to see a longer retention time for user sessions, even if by 24 to 48 hours, or even just the option to make it configurable. With that, we would be able to store user sessions that have remained invisible for a long time and identify issues that people are working around.

I would also like to see an improvement in the server's data extraction times, as sometimes it can take up to ten minutes to download a report for a critical issue that is costing us money. Regardless, I am very happy with Datadog and love the uses we have for the program so far.

For how long have I used the solution?

I've used the solution for more than four years.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Ravel Leite

Head of DevOps at Traveltek Ltd.

Sep 30, 2024

Proactive, provides user trends, and works harmoniously

Pros and Cons

"Each component complements the other, creating a cohesive system where data, logs, and metrics are seamlessly integrated."

"Datadog is too pricey when compared to its competitors, and this is something that is always on my mind during the decision-making process."

What is our primary use case?

From day one, we have seamlessly integrated our new product into Datadog, a comprehensive monitoring and analytics platform. By doing so, we are continuously collecting essential data such as host information, system logs, and key performance metrics. This enables us to gain deep insights into product adoption, monitor usage patterns, and ensure optimal performance. Additionally, we use Datadog to capture and analyze errors in real-time, allowing us to troubleshoot, replay, and resolve production issues efficiently.

How has it helped my organization?

It has proven invaluable in helping us identify early issues within the product as soon as they occur, allowing us to take immediate action before they escalate into more significant problems. This proactive approach ensures that potential challenges are addressed in real-time, minimizing any impact on users. Furthermore, the system allows us to measure product adoption and usage trends effectively, providing insights into how customers are interacting with the product and identifying areas for improvement or enhancement.

What is most valuable?

There isn't any single aspect that stands out in particular; rather, everything is interconnected and works together harmoniously. Each component complements the other, creating a cohesive system where data, logs, and metrics are seamlessly integrated. This interconnectedness ensures that no part operates in isolation, allowing for a more holistic view of the product's performance and health. The way everything binds together strengthens our ability to monitor, analyze, and improve the product efficiently.

What needs improvement?

At the moment, nothing specific comes to mind. Everything seems to be functioning well, and there are no immediate concerns or issues that I can think of.

The system is operating as expected, and any challenges we've faced so far have been successfully addressed. If anything does come up in the future, we will continue to monitor and assess it accordingly, but right now, there’s nothing that stands out requiring attention or improvement.

Datadog is too pricey when compared to its competitors, and this is something that is always on my mind during the decision-making process.

For how long have I used the solution?

I've used the solution for nearly two years now.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Lin Qui

Works at Berkeley Research Group, LLC

Sep 30, 2024

Excellent APM, RUM and dashboards

Pros and Cons

"The pricing model makes more sense than what we paid for against other competitors."

"Logging is not a great experience."

What is our primary use case?

We use the solution for APM, anomaly detection, resource metrics, RUM, and synthetics.

We use it to build baseline metrics for our apps before we start focusing in on performance improvements. A lot of times that’s looking at methods that take too long to run and diving into db queries and parsing.

I’ve used it in multiple configurations in AWS and Azure. I’ve built it using Terraform and hand-rolled setups.

I’ve used it predominantly with Ruby and Node and a little bit of Python.

How has it helped my organization?

The solution provides deep insights into our stack. It gives us the ability to measure and monitor before making decisions.

We're using it to make informed decisions about performance. Being able to show how across a timeline we increased performance from a release via a visual indication of p50+ metrics is almost magical.
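To make the p50 comparison concrete, here is a minimal sketch of computing percentiles over latency samples taken before and after a release. The sample values are invented for illustration, and the helper uses Python's standard `statistics.quantiles` rather than whatever aggregation Datadog applies internally.

```python
from statistics import quantiles


def percentile(samples, p):
    """Return the p-th percentile (1 to 99) of the samples.

    Uses the "inclusive" method, which treats the samples as the whole
    population and interpolates linearly between neighbouring values.
    """
    cut_points = quantiles(samples, n=100, method="inclusive")
    return cut_points[p - 1]


# Hypothetical request latencies in milliseconds, invented for illustration.
before_release = [120, 150, 180, 200, 900]
after_release = [100, 110, 130, 150, 400]

if __name__ == "__main__":
    for label, samples in (("before", before_release), ("after", after_release)):
        print(label, "p50:", percentile(samples, 50), "p95:", percentile(samples, 95))
```

Plotting these values per release on a timeline gives the kind of visual indication described above; the higher percentiles show whether the slowest requests improved along with the median.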

Another way we use it is for leading indicators of issues that might be happening. For example, anomaly detection on gauge metrics across the app and having synthetics built in with alerting configurations are both ways we can sometimes get alerted even before a big issue happens.
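The idea of anomaly detection on a gauge can be sketched with a simple trailing-window z-score. This is a deliberately basic stand-in and not Datadog's anomaly detection algorithm; the window size, threshold, and sample series are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of points that sit far outside their recent history.

    A point is flagged when it is more than `threshold` standard deviations
    away from the mean of the `window` points immediately before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history gives no scale to judge by, so skip this point.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical gauge readings (for example, queue depth) with one spike.
gauge = [10, 11, 10, 12, 11, 10, 50, 11]

if __name__ == "__main__":
    print(find_anomalies(gauge))
```

A real detector would also account for seasonality and trend, which is a large part of why a managed feature is attractive compared with a hand-rolled check like this one.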

What is most valuable?

The most valuable aspects include APM, RUM and dashboards.

I think of Datadog as an analytics company first, and I see the integrations around notifications and alerts as a part of insight discoverability.

For me, everything Datadog offers is about knowledge building: how much I know about the deep details of my stack.

The pricing model makes more sense than what we paid for against other competitors. I was at one job where we used two competing services because DD didn’t have BAA for APM. And then when it offered it, we immediately dumped the other solution for Datadog.

What needs improvement?

Logging is not a great experience. Searching for specific logs and then navigating around the context of the results is slow and cumbersome. Honestly that is my only gripe for Datadog. It’s a wonderful product outside of log searching. I have had better experience using other services that aggregate logs for search.

My use case for it is around discoverability. Log search is fine if I’m just looking for something specific. That said, if it’s something else targeted and I am wandering around looking for possible issues, it’s really unintuitive.

For how long have I used the solution?

I've used the solution for more than eight years.

What do I think about the stability of the solution?

Very stable.

What about the implementation team?

We always implement the solution in-house.

Which deployment model are you using for this solution?

Private Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer1974104

Software Engineering Manager at Finalsite

Sep 23, 2024

Centralized pipeline with synthetic testing and a customized dashboard

Pros and Cons

"The ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders."

"I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environmental variables."

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting.

We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications. Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge.

Datadog agents on each web host and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards.

Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting-edge .NET Core with streaming logs all work. The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

Centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.

The ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.

These features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view.

I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed.

In some cases the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environment variables.
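For readers unfamiliar with why environment variables matter here, the following is a minimal sketch of what log-to-trace correlation needs from each log line. It is written in Python for brevity, although the review concerns .NET apps on IIS. It assumes the variables in question are Datadog's unified service tagging variables (`DD_ENV`, `DD_SERVICE`, `DD_VERSION`), which the review does not confirm. The formatter class and the `trace_id` and `span_id` record attributes are hypothetical; in practice a tracing library normally injects the IDs automatically.

```python
import json
import logging
import os


class CorrelationJsonFormatter(logging.Formatter):
    """Hypothetical formatter that emits JSON logs carrying correlation fields."""

    def format(self, record):
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            # Service tags read from the environment. If these differ between
            # the tracer and the logger, logs and traces will not line up.
            "dd.env": os.environ.get("DD_ENV", ""),
            "dd.service": os.environ.get("DD_SERVICE", ""),
            "dd.version": os.environ.get("DD_VERSION", ""),
            # Assumption: the caller supplies the active IDs via `extra=`.
            "dd.trace_id": str(getattr(record, "trace_id", "0")),
            "dd.span_id": str(getattr(record, "span_id", "0")),
        }
        return json.dumps(payload)


def build_logger(name="app"):
    """Return a logger that writes correlation-ready JSON to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(CorrelationJsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    # The IDs below are placeholders, not real trace identifiers.
    log.info("checkout failed", extra={"trace_id": 123, "span_id": 456})
```

The point of the sketch is that correlation depends on every log line and every trace agreeing on the same service, environment, and version values, so a single missing or mismatched variable on one host is enough to break the link.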

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution has been very scalable and customizable.

How are customer service and support?

Sales service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

How was the initial setup?

Generally simple, but .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

NewRelic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

Excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Software Engineer at a computer software company with 201-500 employees

Sep 23, 2024

Very good custom metrics, dashboards, and alerts

Pros and Cons

"The dashboards provide a comprehensive and visually intuitive way to monitor all our key data points in real-time, making it easier to spot trends and potential issues."

"One key improvement we would like to see in a future Datadog release is the inclusion of certain metrics that are currently unavailable. Specifically, the ability to monitor CPU and memory utilization of AWS-managed Airflow workers, schedulers, and web servers would be highly beneficial for our organization."

What is our primary use case?

Our primary use case for Datadog involves utilizing its dashboards, monitors, and alerts to monitor several key components of our infrastructure.

We track the performance of AWS-managed Airflow pipelines, focusing on metrics like data freshness, data volume, pipeline success rates, and overall performance.

In addition, we monitor Looker dashboard performance to ensure data is processed efficiently. Database performance is also closely tracked, allowing us to address any potential issues proactively. This setup provides comprehensive observability and ensures that our systems operate smoothly.

How has it helped my organization?

Datadog has significantly improved our organization by providing a centralized platform to monitor all our key metrics across various systems. This unified observability has streamlined our ability to oversee infrastructure, applications, and databases from a single location.

Furthermore, the ability to set custom alerts has been invaluable, allowing us to receive real-time notifications when any system degradation occurs. This proactive monitoring has enhanced our ability to respond swiftly to issues, reducing downtime and improving overall system reliability. As a result, Datadog has contributed to increased operational efficiency and minimized potential risks to our services.

What is most valuable?

The most valuable features we’ve found in Datadog are its custom metrics, dashboards, and alerts. The ability to create custom metrics allows us to track specific performance indicators that are critical to our operations, giving us greater control and insights into system behavior.
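As an illustration of what submitting a custom metric can look like, the sketch below formats a gauge in the DogStatsD text format (`name:value|g|#tag:value,...`) and sends it over UDP to a locally running Agent on the default port 8125. The metric name and tags are hypothetical, and the review does not say which submission method this team uses; an official client library would normally replace the hand-built datagram.

```python
import socket


def gauge_datagram(name, value, tags=None):
    """Format a gauge in the DogStatsD text format: name:value|g|#k:v,k:v"""
    datagram = f"{name}:{value}|g"
    if tags:
        # Sort tags so the output is stable regardless of dict ordering.
        tag_text = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
        datagram += f"|#{tag_text}"
    return datagram


def send_gauge(name, value, tags=None, host="127.0.0.1", port=8125):
    """Send one gauge over UDP; assumes an Agent is listening on host:port."""
    payload = gauge_datagram(name, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric: minutes since a pipeline last produced fresh data.
    print(gauge_datagram(
        "pipeline.data_freshness_minutes", 12,
        {"dag": "daily_load", "env": "prod"},
    ))
```

Because UDP is fire-and-forget, `send_gauge` does not confirm delivery; a metric that never appears in a dashboard usually means the Agent is not listening where the sender expects.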

The dashboards provide a comprehensive and visually intuitive way to monitor all our key data points in real-time, making it easier to spot trends and potential issues. Additionally, the alerting system ensures we are promptly notified of any system anomalies or degradations, enabling us to take immediate action to prevent downtime.

Beyond the product features, Datadog’s customer support has been incredibly timely and helpful, resolving any issues quickly and ensuring minimal disruption to our workflow. This combination of features and support has made Datadog an essential tool in our environment.

What needs improvement?

One key improvement we would like to see in a future Datadog release is the inclusion of certain metrics that are currently unavailable. Specifically, the ability to monitor CPU and memory utilization of AWS-managed Airflow workers, schedulers, and web servers would be highly beneficial for our organization. These metrics are critical for understanding the performance and resource usage of our Airflow infrastructure, and having them directly in Datadog would provide a more comprehensive view of our system’s health. This would enable us to diagnose issues faster, optimize resource allocation, and improve overall system performance. Including these metrics in Datadog would greatly enhance its utility for teams working with AWS-managed Airflow.

For how long have I used the solution?

I've used the solution for four months.

What do I think about the stability of the solution?

The stability of Datadog has been excellent. We have not encountered any significant issues so far.

The platform performs reliably, and we have experienced minimal disruptions or downtime. This stability has been crucial for maintaining consistent monitoring and ensuring that our observability needs are met without interruption.

What do I think about the scalability of the solution?

Datadog is generally scalable, allowing us to handle and display thousands of custom metrics efficiently. However, we’ve encountered some limitations in the table visualization view, particularly when working with around 10,000 data points. In those cases, the search functionality doesn’t always return all valid results, which can hinder detailed analysis.

How are customer service and support?

Datadog's customer support plays a crucial role in easing the initial setup process. Their team is proactive in assisting with metric configuration, providing valuable examples, and helping us navigate the setup challenges effectively. This support significantly mitigates the complexity of the initial setup.

Which solution did I use previously and why did I switch?

We used New Relic before.

How was the initial setup?

The initial setup of Datadog can be somewhat complex, primarily due to the learning curve associated with configuring each metric field correctly for optimal data visualization. It often requires careful attention to detail and a good understanding of each option to achieve the desired graphs and insights.

What about the implementation team?

We implemented the solution in-house.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer2553732

Staff Full-Stack Engineer at OMERS

Sep 30, 2024


Prompt support with good logging and helps with standardization

Pros and Cons

"The initial setup was straightforward from my own experience, helping integrate within the application and service levels."

"In production, we intend to use trace IDs generated by RUM to attach to support tickets when a user experiences a traceable network error, and we want to display this trace ID to the user so if they were to contact us about a specific issue, they can provide us an exact ID displayed to them back to us. Currently, this is not possible out-of-the-box client-side without inventing our own solution for capturing these trace IDs, such as shimming the native fetch or returning the ID from the service response."

What is our primary use case?

Internally, our primary usage of Datadog centers on APM/tracing, logging, RUM (real user monitoring), synthetic testing of service/application health and state, general monitoring and observability, and custom dashboards for aggregate observability. We are also increasingly leveraging the more recent service catalog feature.

We have several microservices, several databases, and a few web applications (both external and internal facing), all of which run across several environments, including dev, sit, eat, and production.

How has it helped my organization?

Datadog has had a massive impact on our department. Before, we had loose logging dumped into a sea of GCP logs with haphazard custom solutions for traceability between logs and network calls. Datadog has helped standardize and normalize our processes around observability while providing fantastic tools for aggregating insight around what is monitored regularly, all wrapped in an easy-to-use UI.

Additionally, several types of users exist within our department, and each benefits from Datadog in its own way. DevOps leverages it to easily manage infra, developers leverage it to easily monitor and debug services and applications, and the business side leverages it for statistics.

What is most valuable?

Personally, I've found the RUM (real user monitoring) to be above and beyond what I've worked with before. Client-side monitoring has always gotten the short end of the stick, but the information collected and the ease of instrumentation provided by Datadog are second to none.

Having a live dynamic service map is also one of my favourite features; it provides real-time insights into which services/applications are connected to which.

We are also investigating the new API catalog feature set, which I believe will provide a high-value impact for real-time documentation and information about all of our shared microservices that other dev teams can use.

What needs improvement?

In production, we intend to use trace IDs generated by RUM to attach to support tickets when a user experiences a traceable network error. We want to display this trace ID to the user so that, if they contact us about a specific issue, they can give us the exact ID that was displayed to them. Currently, this is not possible out-of-the-box client-side without inventing our own solution for capturing these trace IDs, such as shimming the native fetch or returning the ID from the service response.
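To make the workaround concrete, here is a minimal sketch of wrapping a fetch-like function so that a trace ID returned by the service can be surfaced to the user on failure. It assumes the service echoes its trace ID in a response header; the header name `x-trace-id`, the `withTraceCapture` helper, and the simplified types are all hypothetical and are not part of the Datadog RUM SDK.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface ResponseLike {
  ok: boolean;
  headers: { get(headerName: string): string | null };
}
type FetchLike = (url: string, init?: object) => Promise<ResponseLike>;

// Assumed header name: the service would need to echo its trace ID here.
const TRACE_HEADER = "x-trace-id";

// Wraps a fetch-like function and reports the trace ID of failed responses.
function withTraceCapture(
  fetchImpl: FetchLike,
  onFailure: (traceId: string, url: string) => void,
): FetchLike {
  return async (url, init) => {
    const response = await fetchImpl(url, init);
    const traceId = response.headers.get(TRACE_HEADER);
    if (!response.ok && traceId !== null) {
      onFailure(traceId, url); // e.g. show the ID in an error banner
    }
    return response;
  };
}

// Usage with a stubbed fetch that always fails with a known trace ID.
const failingFetch: FetchLike = async () => ({
  ok: false,
  headers: { get: (h) => (h === TRACE_HEADER ? "abc123" : null) },
});
const capturedIds: string[] = [];
const tracedFetch = withTraceCapture(failingFetch, (id) => {
  capturedIds.push(id);
});
```

Wrapping a function passed in as an argument, rather than overwriting the global `fetch`, keeps the sketch from interfering with other code that patches the same global. A real browser `fetch` also accepts `Request` objects, which this simplified signature ignores.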

For how long have I used the solution?

I've used the solution for approximately two years across our department, and for around a year of that it has been fully integrated into our systems and in practical use.

What do I think about the stability of the solution?

Aside from one very brief bad update to RUM, in which the Datadog team broke the native 'fetch' for Node (RUM used to, and may still, modify the global 'fetch'), Datadog as a whole has been highly stable. The issue was resolved quickly.

What do I think about the scalability of the solution?

It's easy to implement and scale, provided there's a solid IaC solution in place to integrate across your system.

How are customer service and support?

The Datadog support team has been prompt and helpful whenever we have submitted tickets. When their support team has been unsure, they've reached out internally to the relevant SME to help answer our questions.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I've personally dabbled with some other open-source observability and monitoring solutions; however, prior to Datadog, our department did not have any solutions other than log dumps to GCP.

How was the initial setup?

From my own experience helping integrate it at the application and service levels, the initial setup was straightforward; however, our DevOps team handled most of the infra process, with minimal complaints.

What about the implementation team?

We handled the solution in-house.

What's my experience with pricing, setup cost, and licensing?

I personally am not involved in decisions around cost; however, I am aware that when we first set up Datadog, we explicitly configured our services and applications with a master switch for the Datadog integration so that we can dynamically enable or disable it in targeted environments as needed, because costs are incurred on a per-service basis for APM, logging, etc.
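A master switch of this kind could be sketched roughly as follows. This is only an illustration under assumptions: the allow-list format, the `resolveConfig` helper, and the config shape are hypothetical and do not describe our actual configuration or any Datadog setting.

```typescript
// Hypothetical per-environment switch; the allow-list format and names
// are assumptions for illustration, not Datadog settings.
interface ObservabilityConfig {
  apmEnabled: boolean;
  logsEnabled: boolean;
}

// Reads a comma-separated allow-list such as "sit,production" and enables
// the integration only when the current environment appears in it.
function resolveConfig(
  env: string,
  enabledEnvs: string | undefined,
): ObservabilityConfig {
  const allowed = (enabledEnvs ?? "")
    .split(",")
    .map((e) => e.trim())
    .filter((e) => e.length > 0);
  const enabled = allowed.includes(env);
  return { apmEnabled: enabled, logsEnabled: enabled };
}

const devConfig = resolveConfig("dev", "sit,production");
if (devConfig.apmEnabled) {
  // Initialize the tracing or logging integration here.
}
console.log(devConfig);
```

Keeping the decision in one function means an environment can be turned on or off by changing a single value, without touching service code.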

Which other solutions did I evaluate?

I was not involved in the decision-making regarding the evaluation of other options.

What other advice do I have?

I highly recommend Datadog, and I would explore it for my own individual projects in the future, provided the cost is within reason. Otherwise, I would highly recommend it for any medium-to-large-sized org.