reviewer2003202 - PeerSpot reviewer
Architect at a comms service provider with 10,001+ employees
Real User
Good for monitoring and following metrics with a helpful flame graph
Pros and Cons
  • "Flame graphs are pretty useful for understanding how GraphQL resolves our federated queries and for identifying slow points in our requests across our microservice environment of 170 services."
  • "I often have issues with the UI in my browser."

What is our primary use case?

We use the solution primarily for distributed tracing, service insight and observability, metrics, and monitoring. We create custom metrics from outbound service calls to trace the availability of back-office systems. 

We use the flame graph to get insights into our GraphQL implementation. It helps highlight how resolvers work. 

However, it's lacking in tracing which GraphQL queries are run, and we use custom spans for that.

How has it helped my organization?

Previously, the team only had Instana, and few people used it. The main barriers to entry were access (it was not integrated into our SSO) and a user experience that was hard to follow. We had an on-prem version, and it wasn't the snappiest. Datadog's APM has made observability and tracing more accessible to developers.

What is most valuable?

Flame graphs are pretty useful for understanding how GraphQL resolves our federated queries and for identifying slow points in our requests across our microservice environment of 170 services. A single user request involves complex transactions, since we essentially operate as a middle layer integrating with 90 back-office systems.

What needs improvement?

I often have issues with the UI in my browser. I tend to have a lot of tabs open, and the UI sometimes stops responding or fails to show data. A couple of times, pasting the URL into an incognito window has shown the data that was missing.

Buyer's Guide
Datadog
May 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: May 2025.
853,118 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for two years. 

How was the initial setup?

The initial setup was complex and required a bit of tweaking to get everything configured correctly and into our pipelines.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.

    Site Reliability Engineer at a computer software company with 201-500 employees
    Real User
    They have a good ecosystem for their integrations
    Pros and Cons
    • "Their interface is probably one of the easiest things to use because it lets non-developers and non-engineers quickly get access to metrics and pull business value out of them. We could put together dashboards and give it to people who are non-technical, then they can see the state of the world."
    • "We have been able to set very specific CPU and memory alerts, at the very base level, then we started to pull real business value, like 99th percentile response rates for our API calls."
    • "It has turned into an operational dashboard. If you feel something is going wrong, you can immediately open up Datadog. It has been our go-to application because we know the answer will be there."
    • "The way data is represented can be limiting. When I first tried it out a long time ago, you could graph a metric and another metric, and they'd overlay, but you couldn't take the ratio between the two."
    • "When I started using it years ago, it had stability problems. I remember, specifically, we ran everything in Docker containers. There were some problems getting it into a Docker container with very specific memory limits."

    What is our primary use case?

    We use it for custom metrics of our applications and monitoring of our systems.

    How has it helped my organization?

    My current company didn't have very good monitoring in the past. We had been using basic CPU monitoring. We have been able to set very specific CPU and memory alerts, at the very base level, then we started to pull real business value, like 99th percentile response rates for our API calls. 
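To make the 99th-percentile idea concrete, here is a minimal sketch, assuming response times are available as a plain list of milliseconds. The function name and the sample values are illustrative, and the nearest-rank method is one common definition, not necessarily the one Datadog uses.

```python
import math

def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

# Hypothetical API response times in milliseconds: 98 fast calls and 2 slow ones.
response_times_ms = [20] * 98 + [900, 1500]
p99 = percentile_nearest_rank(response_times_ms, 99)
mean = sum(response_times_ms) / len(response_times_ms)
print(f"mean={mean} ms, p99={p99} ms")
```

In this made-up sample the mean stays low while the 99th percentile exposes the slow calls, which is why a percentile can be a more useful alerting target than an average.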

    It has turned into an operational dashboard. If you feel something is going wrong, you can immediately open up Datadog. It has been our go-to application because we know the answer will be there.

    What is most valuable?

    Their interface is probably one of the easiest things to use because it lets non-developers and non-engineers quickly get access to metrics and pull business value out of them. We could put together dashboards and give it to people who are non-technical, then they can see the state of the world. 

    They have a very good ecosystem for their integrations. They have a lot of different integrations, and we use a lot of them. We have integrations with Amazon for ECS, RDS, and all of the subsystems of Amazon. We also have Docker and Splunk integrations. The integrations are great because they're definitely vetted and not third-party integrations. They're part of the Datadog ecosystem and seamless.

    What needs improvement?

    The way data is represented can be limiting. They have added their own little query language that you can use to manipulate things, so you can graph and relate two different metrics together. This is relatively new this year. When I first tried it out a long time ago, you could graph a metric and another metric, and they'd overlay, but you couldn't take the ratio between the two. However, it looks like this is the direction that they're going, and that's a good direction. I think they should continue adding things that way.

    I like being able to put the formulas in myself. I don't want the average. I want a rolling average over three minutes, not five minutes. They're getting better at letting the user customize this.
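As an illustration of why the window length matters, here is a minimal sketch of a trailing rolling average, assuming the metric is a list of (timestamp in seconds, value) pairs. The function and the sample series are hypothetical and do not use Datadog's query language.

```python
from collections import deque

def rolling_average(samples, window_seconds):
    """Yield (timestamp, mean of the values inside the trailing window).

    samples: iterable of (timestamp_seconds, value), sorted by timestamp.
    """
    window = deque()
    total = 0.0
    for ts, value in samples:
        window.append((ts, value))
        total += value
        # Drop points that have fallen out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)

# Hypothetical series with one point per minute and a spike at minute 5.
series = [(60 * i, v) for i, v in enumerate([10, 10, 10, 10, 10, 100, 10, 10])]
three_min = list(rolling_average(series, 180))
five_min = list(rolling_average(series, 300))
print(three_min[5], five_min[5])
```

With this sample, the three-minute window shows the spike more sharply than the five-minute window, which is the kind of control the reviewer is asking for.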

    For how long have I used the solution?

    Three to five years.

    What do I think about the stability of the solution?

    When I started using it years ago, it had stability problems. I remember, specifically, we ran everything in Docker containers. There were some problems getting it into a Docker container with very specific memory limits. We couldn't nail down exactly what limits the application needed. Once we did that, we were good. However, it was tricky to get the limits right in the first place.

    What do I think about the scalability of the solution?

    It has always scaled for us. Cost scales up too, but that is not necessarily a bad thing. It's reasonable for what they're providing. I haven't had any concerns about scaling.

    We use between 100 and 500 servers at any given point in time.

    How is customer service and technical support?

    For the most part, the technical support is pretty good. Every now and again, you will get stuck with a support rep who could have better training, but in general, they are very good and responsive. They're willing to talk about new features, etc.

    How was the initial setup?

    The integration and configuration processes have been very smooth because everything is very well-documented. The documentation is phenomenal. 

    What was our ROI?

    We can see trends a lot more easily than if we didn't have the solution. Management can see the changes being made, whether in performance or in the number of hosts that went down. We recently made improvements to some of our internal APIs, which reduced the number of servers we needed, so you could see both the load on the system and the number of servers go down. It was easy to visualize.

    What's my experience with pricing, setup cost, and licensing?

    Pricing and licensing are reasonable for what they give you. You get the first five hosts free, which is fun to play around with. Then it's about four dollars a month per host, which is very affordable for what you get out of it. We have a lot of hosts that we put a lot of custom metrics into, and every host gives you an allowance for the number of custom metrics. We have not had a problem with it.
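The arithmetic behind those figures can be sketched as follows. This uses only the reviewer's approximate numbers (five free hosts, about four dollars per host per month), assumes the free hosts stay free once you exceed five, and ignores custom-metric overages. It is not a statement of current Datadog pricing.

```python
def estimated_monthly_cost(hosts, free_hosts=5, price_per_host=4.0):
    """Rough monthly host cost in dollars under the reviewer's stated figures."""
    billable = max(0, hosts - free_hosts)
    return billable * price_per_host

# Fleet sizes taken from the range the reviewer mentions (100 to 500 servers).
for hosts in (5, 100, 500):
    print(hosts, estimated_monthly_cost(hosts))
```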

    Which other solutions did I evaluate?

    My company now is pretty good at looking at alternatives. Also, I evaluated alternative solutions at my last company. 

    There are some other competitors. For example, I know one of them started doing metrics and their licensing is very cheap because the metric size is very small and they charge per megabyte of storage. However, the interface and integrations aren't there.

    The other thing is granularity. Datadog gives you one-second granularity for a year, whereas some of the competitors roll up: after about a week you don't have one second, you have five seconds. Then, after a month, you don't have five seconds, you have a minute. Whether the rollup averages or takes the max, you lose granularity and with it the ability to see incidents historically, which is super valuable. If we have an incident that we think we've seen before and want to look back historically, we can zoom right in and see in the database where it peaked.
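The effect of a rollup on a short spike can be sketched like this, assuming one-second readings in a plain list. The helper names and the sample data are hypothetical and do not describe any vendor's actual retention scheme.

```python
def roll_up(values, bucket_size, aggregate):
    """Collapse consecutive groups of bucket_size points into one point each."""
    return [
        aggregate(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]

def mean(points):
    return sum(points) / len(points)

# Hypothetical one-second CPU readings with a single one-second spike.
per_second = [20, 20, 95, 20, 20, 20, 20, 20, 20, 20]
avg_5s = roll_up(per_second, 5, mean)
max_5s = roll_up(per_second, 5, max)
print(avg_5s, max_5s)
```

Averaging flattens the spike, while taking the max keeps its height but loses the exact second it happened. Either way, the original shape of the incident is no longer recoverable.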

    What other advice do I have?

    Give Datadog a try. It's the leader in this space. 

    I have only used the AWS version of the product.

    They have a thing for the color purple, but it is all good.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.

      Victor Chen1 - PeerSpot reviewer
      Software Engineer at Zip
      Real User
      Top 20
      Good for log ingestion and analyzing logs with easy searchability of data
      Pros and Cons
      • "The feature I've found most valuable is the log search feature."
      • "More helpful log search keywords/tips would be helpful in improving Datadog's log dashboard."

      What is our primary use case?

      We use Datadog as our main log ingestion source, and Datadog is one of the first places we go to for analyzing logs. 

      This is especially true for debugging, monitoring, and alerting on errors and incidents, as we send traffic logs from K8s, Amazon Web Services, and many other services at our company to Datadog. In addition, many products and teams at our company have dashboards for monitoring statistics (sometimes based on these logs directly, other times on queries we set for these metrics) to alert us if there are any errors or health issues.

      How has it helped my organization?

      Overall, at my company, Datadog has made it easy to search for and look up logs at an impressively quick search rate over a large amount of logs. 

      It seamlessly allows you to set up monitoring and alerting directly from log queries, which is convenient and makes for a good user experience. While there is a bit of a learning curve, a majority of my company now uses Datadog as the first place to check when there are errors or bugs.

      However, the cost aspect of Datadog is tricky to gauge because it's related to usage, and thus, it is hard to tell the relative value of Datadog year to year.

      What is most valuable?

      The feature I've found most valuable is the log search feature. It's set up with our ingestion to be a quick one-stop shop, is reliable and quick, and seamlessly integrates into building custom monitors and alerts based on log volume and timeframes. 

      As a result, it's easy to leverage this to triage bugs and errors, since we can pinpoint the logs around the time that they occur and get metadata/context around the issue. This is the main feature that I use the most in my workflow with Datadog to help debug and triage issues.

      What needs improvement?

      More helpful log search keywords/tips would improve Datadog's log dashboard. I recently struggled a lot to parse text from raw log lines that didn't seem to match directly with facets. There are presumably smart searching capabilities, but it's not intuitive to learn how to leverage them, and I instead had to resort to a Python script to do some simple regex parsing. (I was trying to parse "file:folder/*/*" from the logs and couldn't find a way to do this in Datadog. Maybe I'm just not familiar enough with the logs, but I couldn't easily find resources on how to do this either.)
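The kind of script described might look like the sketch below. It is not the reviewer's actual code: it assumes each "*" in "file:folder/*/*" stands for a single path segment with no slashes or whitespace, and the log lines are invented for illustration.

```python
import re

# Assumption: each "*" matches one path segment (no "/" and no whitespace).
PATTERN = re.compile(r"file:folder/([^/\s]+)/([^/\s]+)")

def extract_paths(lines):
    """Return (first_segment, second_segment) for every match in the log lines."""
    matches = []
    for line in lines:
        for m in PATTERN.finditer(line):
            matches.append((m.group(1), m.group(2)))
    return matches

# Hypothetical raw log lines.
logs = [
    "2024-01-01 12:00:00 INFO loaded file:folder/reports/summary.csv in 12ms",
    "2024-01-01 12:00:01 WARN nothing to parse here",
    "2024-01-01 12:00:02 ERROR failed file:folder/images/logo.png (timeout)",
]
paths = extract_paths(logs)
print(paths)
```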

      For how long have I used the solution?

      I've used the solution for 10 months.

      What's my experience with pricing, setup cost, and licensing?

      Beware that the cost will fluctuate (and it often only gets more expensive very quickly).

      Disclosure: I am a real user, and this review is based on my own experience and opinions.

        reviewer2561892 - PeerSpot reviewer
        Principal, Performance Engineering at Invitation Homes
        User
        A go-to tool for analyzing, understanding, and investigating application performance
        Pros and Cons
        • "Log analytics give us a powerful mechanism for error tracking, research, and analysis."
        • "Network device and performance monitoring could be improved, as we've faced some limitations in this area."

        What is our primary use case?

        The solution is used for full-stack enterprise performance monitoring for our primarily cloud-based stack on AWS. We have implemented monitoring coverage using RUM for critical apps and websites and utilize APM (integrated with RUM) for full-stack traceability.

        We use Datadog as our primary log repository for all apps and platforms, and the advanced log analytics enable accurate log-based monitoring/alerting and investigations. 

        Additionally, we use some advanced RUM capabilities and metrics to track and optimize client-side user experience. We track SLOs for our critical apps and platforms using Datadog.

        How has it helped my organization?

        We now have full-stack observability, which allows us to better understand application behavior, quickly alert users about issues, and proactively manage application performance.  

        We've seen value by implementing observability coordinated across multiple applications, allowing us to track things like customer shopping and orders across multiple applications and services.  

        For critical application launches, we've built dashboards that can track user activity and confirm users are able to successfully utilize new features, tracking user activities in real-time in a war-room situation.  

        Datadog is our go-to tool for analyzing, understanding, and investigating application performance and behavior.

        What is most valuable?

        APM accurately tracks our service performance across our ecosystem. RUM gives us client-side performance and user experience visibility, and the rate of new features implemented in the Digital Experience area recently has been high. Log analytics give us a powerful mechanism for error tracking, research, and analysis.  

        Custom metrics that we've created allow us to track KPIs in real-time on dashboards. All of these have proven valuable in our organization.  Additionally, Datadog product support teams are responsive and have provided timely support when needed.

        What needs improvement?

        Agent remote configuration should be provided/improved and streamlined, allowing for config changes/upgrades to be performed via the portal instead of at the host.   

        Cost tracking via the admin portal is a bit lacking, even though it has gotten better.  I'm looking for usage trends (that drive cost) across time and better visibility or notifications about on-demand charges.  

        Network device and performance monitoring could be improved, as we've faced some limitations in this area.  

        The Datadog usage-based cost model, while giving us better transparency, is difficult to follow at times and is constantly evolving.  

        For how long have I used the solution?

        I've used the solution for three years.

        How are customer service and support?

        Support has been responsive and helpful.  

        How would you rate customer service and support?

        Positive

        What's my experience with pricing, setup cost, and licensing?

        Pricing is straightforward. That said, it's sometimes difficult to estimate usage volumes.

        Which other solutions did I evaluate?

        We evaluated Datadog and New Relic in detail and chose Datadog for its straightforward and competitive pricing model, its full coverage of the monitoring features we wanted, and its easy-to-use UI.

        Which deployment model are you using for this solution?

        Public Cloud
        Disclosure: I am a real user, and this review is based on my own experience and opinions.

          SecOps Engineer at Ava Labs
          User
          Helpful support, with centralized pipeline tracking and error logging
          Pros and Cons
          • "Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most."
          • "While the documentation is very good, there are areas that need a lot of focus to pick up on the key details."

          What is our primary use case?

          Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. 

          How has it helped my organization?

          Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. 

          What is most valuable?

          The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

          Synthetic testing is great, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

          What needs improvement?

          While the documentation is very good, there are areas that need a lot of focus to pick up on the key details. In some cases the screenshots don't match the text when updates are made. 

          I spent longer than I should have trying to figure out how to correlate logs to traces, mostly because of environment variables.
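The review does not say which variables were involved. As a rough sketch of the general idea, assuming the DD_ENV, DD_SERVICE, and DD_VERSION environment variables and the dd.* log attributes described in Datadog's log-and-trace correlation documentation, a log line can carry the same identifiers as the trace it belongs to. The function, the sample values, and the IDs below are hypothetical, and the reviewer's .NET/IIS setup is not reproduced here.

```python
import json
import os

def correlated_log(message, trace_id, span_id, environ=os.environ):
    """Build a JSON log line carrying attributes used to join logs to traces."""
    record = {
        "message": message,
        # Assumed to match the tags the tracer reports for the same service.
        "dd.env": environ.get("DD_ENV", ""),
        "dd.service": environ.get("DD_SERVICE", ""),
        "dd.version": environ.get("DD_VERSION", ""),
        "dd.trace_id": str(trace_id),
        "dd.span_id": str(span_id),
    }
    return json.dumps(record)

# Hypothetical values; in a real service the tracer supplies the IDs.
fake_env = {"DD_ENV": "staging", "DD_SERVICE": "checkout", "DD_VERSION": "1.4.2"}
line = correlated_log("payment failed", trace_id=1234567890,
                      span_id=987654321, environ=fake_env)
print(line)
```

If the environment, service, or version on the log side differs from what the tracer reports, the two may not line up in the UI, which is one plausible way environment variables can cause this kind of trouble.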

          For how long have I used the solution?

          I've used the solution for about three years.

          What do I think about the stability of the solution?

          We have been impressed with the uptime.

          What do I think about the scalability of the solution?

          It's scalable and customizable. 

          How are customer service and support?

          Support is helpful. They help us tune our committed costs and alert us when we start spending out of the on-demand budget.

          Which solution did I use previously and why did I switch?

          We used a mix of SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility.

          How was the initial setup?

          Setup is generally simple. .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

          What about the implementation team?

          We implemented the solution in-house.

          What was our ROI?

          There has been significant time saved by the development team in terms of assessing bugs and performance issues.

          What's my experience with pricing, setup cost, and licensing?

          I'd advise others to set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

          Which other solutions did I evaluate?

          New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

          What other advice do I have?

          We are excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

          Which deployment model are you using for this solution?

          Hybrid Cloud

          If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

          Amazon Web Services (AWS)
          Disclosure: I am a real user, and this review is based on my own experience and opinions.

            reviewer2045004 - PeerSpot reviewer
            Software Engineering Manager at a hospitality company with 1,001-5,000 employees
            Real User
            Easy to implement with great passive and active monitoring
            Pros and Cons
            • "It is easy to implement and scale applications with standardized visibility, monitoring and alerting"
            • "Datadog is so feature-rich that it is often hard to onboard new folks and tough to decide where to invest time."

            What is our primary use case?

            We primarily use the solution for application monitoring (APM, logs, metrics, alerts).

            It's useful for active monitoring (static monitors, threshold monitors). We get a lot of value out of anomaly detection as well. SLOs and monitoring of SLOs have been another value add.

            In terms of metrics, the out-of-the-box infrastructure metrics that come with the Datadog agent installation are great. We have made use of both the custom metrics implementation as well as the log-based metrics which are extremely convenient.

            We also leverage Datadog for use of RUM and want to explore session replay.

            How has it helped my organization?

            It is easy to implement and scale applications with standardized visibility, monitoring, and alerting.

            We get a lot of value out of passive and active monitoring. While different teams across our organization have used different services (metrics, logs, APM, RUM), almost all teams have been able to use the dashboards to report and track high-level metrics and active monitoring. 

            Active monitoring (static monitors, threshold monitors) is great. We get a lot of value out of anomaly detection as well. SLOs and monitoring of SLOs have been another value add for our organization.

            What is most valuable?

            The APM and tracing provide visibility and the ability to get right to the root cause of issues, while letting us quickly deploy new services without much need for custom instrumentation.

            The active monitoring (static monitors, threshold monitors) has been very helpful. We get a lot of value out of anomaly detection. SLOs and monitoring of SLOs have been extremely valuable.

            The metrics and out-of-the-box infrastructure metrics that come with the Datadog agent installation are quite helpful to the organization. We have made use of both the custom metric implementation as well as the log-based metrics which are extremely convenient.

            What needs improvement?

            Datadog is so feature-rich that it is often hard to onboard new folks and tough to decide where to invest time. 

            The APM is a perfect example of this. This feature alone has so much (profiling, tracing, span summary, flame graphs). I would love to see more of the insight and automation-focused features, such as the log patterns, where I can spend time more efficiently.

            The cost of Datadog at scale can get very expensive very quickly. I would like to see a better usage/cost dashboard with breakdowns like the AWS cost explorer.

            For how long have I used the solution?

            I've used the solution for three years.

            Disclosure: I am a real user, and this review is based on my own experience and opinions.

              reviewer2000466 - PeerSpot reviewer
              Senior Cloud Engineer, Vice President of Monitoring at a financial services firm with 10,001+ employees
              Real User
              Good ServiceNow integration, helpful API crawlers, and useful APM metrics
              Pros and Cons
              • "The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze."
              • "It seems that admin cost control granularity is an afterthought."

              What is our primary use case?

              We are using the solution for migrating out of the data center. Old apps need to be re-architected. We are planning on moving to multi-cloud for disaster recovery and to avoid vendor lock-in.

              The migration is a mix between an MSP (Infosys) and in-house developers. The hard part is ensuring these apps run the same in the cloud as they do on-premises. Then we also need to ensure that we improve performance when possible. With deadlines approaching quickly, it's important not to cut corners, which is why we needed observability.

              How has it helped my organization?

              Using the product has caused a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams wouldn't be able to create monitors on the fly and would have to wait on us to contact the ServiceNow team to create a custom lookup. Now, in real-time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.

              What is most valuable?

              For us, the most valuable features are infrastructure and APM metrics.

              The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze. 

              We rely heavily on the API crawlers Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without having to also make them add it at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams. 

              With the ServiceNow integration, we can also assign tickets based on the environment. Now our top teams are using the APM/profiler to find bottlenecks and improve the speed of our apps.

              What needs improvement?

              The real issue with this product is cost control. For example, when logs first came out they didn't have any index cuts. This caused runaway logs and exploding costs. 

              It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there is no way to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes, that would save us 5X on our bill.
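The 5X figure follows from simple run counts, as this small sketch shows. It assumes the bill for synthetics is proportional to the number of test runs, which the review implies but does not spell out. The names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60

def runs_per_day(interval_minutes):
    """Number of times one synthetic test fires per day at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes

every_minute = runs_per_day(1)
every_five = runs_per_day(5)
reduction = every_minute / every_five
print(every_minute, every_five, reduction)
```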

              For how long have I used the solution?

              I've used the solution for about three years. 

              What do I think about the stability of the solution?

              The solution is very stable. There are not too many outages, and they fix them fast.

              What do I think about the scalability of the solution?

              It is easy to scale. That is why we adopted it.

              How are customer service and support?

              Before premium support, I would avoid using them as it was so bad.

              How would you rate customer service and support?

              Neutral

              Which solution did I use previously and why did I switch?

              We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.

              How was the initial setup?

              The initial setup was not difficult. We just had to teach teams the concept of tags.

              What about the implementation team?

              We did the implementation in-house. It was me. I am the SME for Datadog at the company.

              What was our ROI?

              The solution has saved months of time and reduced blind spots for all app teams.

              What's my experience with pricing, setup cost, and licensing?

              I'd advise users to be careful with logs and the APM as those are the ones that can get expensive fast.

              Which other solutions did I evaluate?

              We looked into Dynatrace. However, we found the cost to be high.

              Which deployment model are you using for this solution?

              Hybrid Cloud
              Disclosure: I am a real user, and this review is based on my own experience and opinions.

                Operations Manager at TodayTix
                User
                Good dashboards, easy troubleshooting, and integrations
                Pros and Cons
                • "The dashboards are super convenient to us for a more zoomed out view of what is going on with each integration that we utilize."
                • "There could be more easily identifiable documentation on how to find different things on the platform."

                What is our primary use case?

                We utilize Datadog mainly to monitor our API integrations and all of the inventory that comes in from our API partners. Each event has its own ID, so we can trace all activity related to each event and troubleshoot where needed.

                How has it helped my organization?

                Datadog gives non-dev teams insight into everything that is happening with a particular event and flags any errors so that we can troubleshoot more efficiently.

                What is most valuable?

                The dashboards are super convenient to us for a more zoomed out view of what is going on with each integration that we utilize.

                What needs improvement?

                There could be more easily identifiable documentation on how to find different things on the platform. It can be overwhelming at first glance, and it's hard to find appropriate documentation on the site to lead you to where you need to be. 

                For how long have I used the solution?

                I've used the solution for about 1.5 years.

                Disclosure: I am a real user, and this review is based on my own experience and opinions.
