Datadog Overview

Datadog is the #1 ranked solution in APM tools, #1 ranked solution in best Network Monitoring Tools, #1 ranked solution in Infrastructure Monitoring tools, #1 ranked solution in top Cloud Monitoring Software, #1 ranked solution in top AIOps tools, and #2 ranked solution in Log Management Software. PeerSpot users give Datadog an average rating of 8.6 out of 10. Datadog is most commonly compared to Dynatrace. Datadog is popular among the large enterprise segment, accounting for 65% of users researching this solution on PeerSpot. The top industry researching this solution is computer software, with professionals from computer software companies accounting for 21% of all views.
Datadog Buyer's Guide

Download the Datadog Buyer's Guide including reviews and more. Updated: December 2022

What is Datadog?

Datadog is a cloud monitoring solution that is designed to assist administrators, IT teams, and other members of an organization who are charged with keeping a close eye on their networks. Administrators can use Datadog to set real-time alerts and schedule automated report generation. They can deal with issues as they arise and keep up to date with the overall health of their network while still being able to focus on other tasks. Users can also track the historical performance of their networks and ensure that they operate at the highest possible level.

Datadog Benefits

Some of the ways that organizations can benefit by deploying Datadog include:

  • Gain an integrated view of the services and programs that IT teams are employing across their networks. Users can view and monitor all of the disparate programs that they have running across their networks with this one solution. They can track these programs across the entirety of the data’s life cycle.
  • Analyze and utilize massive amounts of data in real time. Datadog’s dashboards gather data in real time. Administrators can utilize their network’s data the minute that it becomes relevant to them. Decisions can be made based on the most current information available.
  • Keep your cloud network secured against digital threats. Datadog enables users to create alerts that will notify them the minute that threats arise. IT teams and administrators can rapidly address any issue that comes up and prevent any existing problem from growing worse.
  • Easily get it up and running. Users can set up Datadog, configure it, and employ API integrations to connect it to external solutions with ease.

Datadog Features

  • Customizable and prefabricated monitoring dashboards. Administrators are supplied with two different types of dashboards that they can choose from when they are setting up Datadog. They can customize the dashboards to fit any specialized monitoring need. Additionally, users can choose to use prefabricated dashboards that come with the solution.
  • Disaster recovery feature. Datadog has a built-in feature that enables organizations to continue functioning if some disaster strikes their network. If the network suffers damage, Datadog can restore lost data and infrastructure. Should a digital threat do damage to the network, Datadog ensures that the damage is not irreparable.
  • Vulnerability scanning tool. Users can keep ahead of threats to their networks by employing Datadog’s vulnerability scanning feature. This tool scans the entirety of a user’s network and warns them if a vulnerability is detected. Users can then move to patch these holes in their security before the threat to their network can escalate.

Reviews from Real Users

Datadog is a solution that stands out when compared to many of its competitors. It can offer organizations many advantages. Two major advantages are the dashboards that users can create and the monitoring capability that it gives system administrators.

A senior manager in charge of site reliability engineering at Extra Space Storage writes, “The dashboards we created are core indicators of the health of our system, and it is one of the most reliable sources we have turned to, especially as we have seen APM metrics impacted several times lately. We can usually rely on logs to tell us what the apps are doing.”

Housecall Pro’s senior director of DevOps writes, “We value the monitoring capability since it allows us to be pushed alerts, rather than having to observe graphs continually.”

Datadog Customers

Adobe, Samsung, Facebook, HP Cloud Services, Electronic Arts, Salesforce, Stanford University, Citrix, Chef, Zendesk, Hearst Magazines, Spotify, Mercado Libre, Slashdot, Ziff Davis, PBS, MLS, The Motley Fool, Politico, Barneby's

Datadog Pricing Advice

What users are saying about Datadog pricing:
  • "My advice is to really keep an eye on your overage costs, as they can spiral really fast."
  • "If you do your homework, you'll find that if you're really concerned with cost, it's good."
  • "It is easy to run up a large bill, so become familiar with the cost of each piece of your bill and use the metrics they supply to estimate and monitor your bill."
  • "Pricing seemed easy until the bill came in and some things were not accounted for."
  • "Pricing is somewhat affordable compared to other solutions but in order to really lower the costs of other products you need to plan very carefully your resources usage, otherwise, it can get expensive real quick."

Datadog Reviews

    reviewer1494894 - PeerSpot reviewer
    Senior Manager, Site Reliability Engineering at Extra Space Storage
    Real User
    Top 10
    Provides insightful analytics and good visibility that assist with making architectural decisions
    Pros and Cons
    • "Datadog has given us near-live visibility across our entire cloud platform."
    • "We have recently had a number of issues with stability and delays on logging, monitoring, metric evaluation, and alerts."

    What is our primary use case?

    We primarily use Datadog for logs, APM, infrastructure monitoring, and lambda visibility.

    We have built a number of critical dashboards that we display within our office for engineers to have a good understanding of the application performance, as well as business partners to understand at a high level the traffic flowing through the app.

    We started with logging, as our primary monitor, and have shifted to APM to get a deeper understanding of what our system is doing, and how the changes we are making impact the apps.

    How has it helped my organization?

    Datadog has given us near-live visibility across our entire cloud platform. We are finally in a state where we are alerting our users about degraded performance well before the helpdesk tickets start rolling in.

    We are making major architectural decisions based on the data we are getting from Datadog. It also gives us an idea of where the complexity really lies in some older, monolithic apps. 

    We have used the APM endpoint monitoring to prioritize work on slower endpoints because we can see the total count, as well as the latency. That has been a big driver in our refactor work prioritization.
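
One way to picture that prioritization is as a ranking by total time spent per endpoint (request count multiplied by average latency). The sketch below is only an illustration of that idea, not the reviewer's actual process: the endpoint names and numbers are made up, and `EndpointStats` and `rank_by_total_time` are hypothetical helpers rather than part of any Datadog API.

```python
from dataclasses import dataclass


@dataclass
class EndpointStats:
    name: str              # hypothetical endpoint name
    request_count: int     # total requests in the period
    avg_latency_ms: float  # average latency per request, in milliseconds


def rank_by_total_time(stats):
    """Sort endpoints by total time spent (count x average latency), highest first."""
    return sorted(stats, key=lambda s: s.request_count * s.avg_latency_ms, reverse=True)


# Made-up sample numbers, for illustration only.
sample = [
    EndpointStats("GET /health", 900_000, 3.0),
    EndpointStats("POST /checkout", 4_000, 2_300.0),
    EndpointStats("GET /search", 120_000, 850.0),
]

ranked = rank_by_total_time(sample)
for s in ranked:
    total_seconds = s.request_count * s.avg_latency_ms / 1000
    print(f"{s.name}: {total_seconds:,.0f} s total")
```

Ranking on the product of count and latency keeps a rarely called slow endpoint from outranking a moderately slow endpoint that handles far more traffic.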

    We have struggled to get more business-centric measures in our code to surface actual business values in our reports, but that is our next initiative.

    What is most valuable?

    We started with Log analytics in the beginning stages of our monitoring journey. Those were very insightful, but obviously only as useful as we made them with good logging practices.

    The dashboards we created are core indicators of the health of our system, and it is one of the most reliable sources we have turned to, especially as we have seen APM metrics impacted several times lately. We can usually rely on logs to tell us what the apps are doing.

    APM and Traces have been crucial to understanding how users are actually using the app. That drives a lot of our decisions around refactoring and focusing our limited engineering resources.

    What needs improvement?

    Continued improvement around cost and pricing model is needed. It is pretty complex and takes a fair amount of intimate knowledge to know exactly how turning on a single function is going to impact your bill, especially when you don't see the metrics for a day or two. 

    We have recently had a number of issues with stability and delays on logging, monitoring, metric evaluation, and alerts. More often than not in the past month, it seems that we get the banner across the top of our dashboards that some service is impacted. They don't always show up on the incident page, either.

    Buyer's Guide
    Datadog
    December 2022
    Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: December 2022.
    656,862 professionals have used our research since 2012.

    For how long have I used the solution?

    We have been using Datadog for two years.

    What do I think about the stability of the solution?

    Overall, it has been fairly stable for us. There are occasional issues with importing data, which have usually been resolved in a short time. We have never had an issue where that data was lost, just delayed, and eventually backfilled.

    It seems (anecdotally, of course) that there have been a few more stability issues lately. We have noticed several days that we are getting in-app alert banners indicating that some metric or log ingestion was delayed, or the web app itself was experiencing severe slowness. 

    Overall, these issues are resolved rather quickly - kudos to their engineering teams. I hear that they actually use Datadog to monitor Datadog. 

    What do I think about the scalability of the solution?

    Datadog is very scalable but just watch the cost.

    How are customer service and support?

    Technical support is hit and miss; there are a number of nuances to how this tool should be implemented, and it is difficult to re-explain how our infrastructure and applications are set up every time we need an in-depth investigation to understand what is broken.

    Which solution did I use previously and why did I switch?

    Previously, we used AppDynamics. The pricing model didn't seem to fit with actual cloud spend. Now we may have swung the pendulum a little too far, and seem to be dealing with pricing on every facet of the application. 

    How was the initial setup?

    The initial setup was pretty straightforward. Additional tweaks and configuration have been a bit more difficult as we get deeper and deeper into the guts of the integrations. Making sure we are keeping up with a rapid release schedule, and keeping our server clients in sync with our app packages has been troublesome. There have been some major changes in the APM that have introduced a number of bugs and broken some of our dashboards and alerts.

    What about the implementation team?

    Our in-house team handled the deployment, with a lot of tickets created for the Datadog team.

    What was our ROI?

    ROI is difficult to measure completely. Our spend has differed significantly between the first year, the second year, and now going into the third year.

    What's my experience with pricing, setup cost, and licensing?

    My advice is to really keep an eye on your overage costs, as they can spiral really fast. We turned on some additional span measures and didn't realize until it was too late that it had generated a ton.

    Frankly, we love the visibility it gives us into our applications, but it is a bit cumbersome to ensure we are paying for the right stuff. Overall, the cost is worth it, as it helps us keep system-critical applications up and running, and reduces our detection and correction times significantly.

    Which other solutions did I evaluate?

    We evaluated Dynatrace and AppD before choosing this product.

    What other advice do I have?

    Datadog requires pretty close supervision on the usage page to ensure you aren't going out of control. They have provided a bunch of new features to assist in retention percentage, but it can be a bit confusing on what is being retained, and what can be viewed again after triggering an alert. It's a difficult balance between making sure you are getting the right data for alerts and still having the correct information available for research after the fact.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    BrianHeisler - PeerSpot reviewer
    Principal Enterprise Systems Engineer at a healthcare company with 10,001+ employees
    Real User
    Top 10
    An out-of-the-box solution that allows you to quickly build dashboards
    Pros and Cons
    • "I like that you can build out a dashboard pretty quickly. There are some things that come out of the box that you don't really need to do, which is great because they're default settings."
    • "I think better access to their engineers when we have a problem could be better."

    What is our primary use case?

    We deploy agents on-premise to collect data on on-premise VM instances. We don't use Datadog in our cloud network. We do have some Cloud apps that we have it on and we also have Containers. We have it on their headquarters, the main software for them is on their own Cloud.

    We're building out the process now and learning to use it better. We plan to use Datadog for root cause analysis relating to any kinds of issues we have with software, with applications going down, latency issues, connection issues, etc. Eventually, we're going to use Datadog for application performance monitoring and management, and to be proactive around thresholds, alerts, bottlenecks, etc.

    Our developers and QA teams use this solution. They use it to analyze network traffic, load, CPU load, and CPU usage, and then tracing, NPM, and API calls for their application. There are roughly 100 users right now. Maybe there are 200 in total, but on a given day, maybe 13 people are using this solution.

    How has it helped my organization?

    It hasn't improved the way our organization functions yet, because there's a lot of red tape to cut through with cultural challenges and changes. I don't think it's changed the way we do things yet, but I think it will — absolutely it will. It's just going to take some time.

    What is most valuable?

    I like that you can build out a dashboard pretty quickly. There are some things that come out of the box that you don't really need to do, which is great because they're default settings. Once you install the agent on the machine, they pick up a lot of metrics for you that are going to be 70 or more percent of what you need. Out of the box, it's pretty good.

    For how long have I used the solution?

    I have been using Datadog every day since September 2020. I also used it at a previous company that I worked for.

    What do I think about the stability of the solution?

    Stability-wise, it's great.

    What do I think about the scalability of the solution?

    It seems like it'll scale well. We're automating it with Ansible scripts and ServiceNow so that when we build a new virtual machine it will automatically install Datadog on that box.

    How are customer service and technical support?

    The tool itself is pretty good and the customer service is good, but I think they're a growing company. I think better access to their engineers when we have a problem could be better. For example, if I asked the question, "Hey, how do I install it on this type of component?" We'll try to get an engineer on the phone with us to step us through everything, but that's a challenge because they're so busy.

    Technically, everything's fine. We don't need any support; everything that I need to do, I can do right out of the box. But as far as the knowledge of their engineers on how to configure it on given systems that we have, I'd put that at maybe a six, because they're just not as available as I would've hoped.

    Which solution did I use previously and why did I switch?

    We were using AppDynamics. Technically, we still have it in-house because it's tightly wound into certain systems, but we'll probably pull that off slowly over time. The reason we added Datadog and eventually we'll fully switch over is due to cost. It's more cost-friendly to do it with Datadog.

    Which other solutions did I evaluate?

    Yes, we looked at Dynatrace, AppDynamics, and New Relic. Personally, I wouldn't have chosen Datadog for the POC if it were up to me. Datadog was a leader, but New Relic was looking really good. In the end, the people above me decided to go with Datadog — it's a big company, so they wanted to move fast, which makes sense.

    What other advice do I have?

    If you're interested in using Datadog, just do your homework, as we did. We're happy so far I think; time will tell as we are still rolling things out. It's a very good company. It's going to be a year before we really can tell anything. If you do your homework, you'll find that if you're really concerned with cost, it's good.

    There are some strengths that AppDynamics and Dynatrace have that I don't think Datadog will have down the road, but they're not things we necessarily need; they're outliers. It would be nice to have them, but we can manage without them.

    Know what you want. There is no need to pay for solutions like Dynatrace or AppDynamics that are more expensive or things that are just nice to have if you don't absolutely need to have them. That's something people need to understand. You just have to make sure you understand what it is that you need out of the tool — they are all a little different, those three. I would say to anybody that's going with Datadog: you just have to be patient at the beginning. It's a very busy company right now. They're very hot in the market.

    Overall, on a scale from one to ten, I would give Datadog a rating of eight. It does what we need it to do, and it seems to be pretty user-friendly in terms of setting things up.

    Features-wise, I'd give them a rating of ten out of ten. Better access to assistance from the engineers on how to configure dashboards and pull the metrics that we need would bring the overall rating up a little bit. A perfect overall score would be harder, since everything would have to be perfect, but I would say maybe they could bring it to a nine.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    reviewer1479957 - PeerSpot reviewer
    Senior Director of DevOps at Housecall Pro
    Real User
    Top 20
    Good graphing and dashboards, and it improves visibility for developers
    Pros and Cons
    • "Having a wealth of information has helped us investigate outages, and having historical data helps us tune our system."
    • "Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion."

    What is our primary use case?

    We primarily use Datadog for the monitoring of EC2 and ECS containers running mostly Rails applications that host a SaaS product. We also monitor Elasticsearch and RDS, and we are working on adding their Application Performance Monitoring solution to monitor our applications directly.

    We use Datadog to create dashboards, graphs, and alerts based on interesting metrics. Datadog is our first place to look to find the performance of our system.

    We also use their logging platform and it works well. Especially useful is that the logs and metrics are tightly integrated so you can jump between them easily.

    How has it helped my organization?

    Developers are able to see how code is running in production, where this was mostly opaque before we implemented Datadog. We are able to emit custom metrics that are specific to our business, and the built-in metrics have also proven useful. Having a wealth of information has helped us investigate outages, and having historical data helps us tune our system.
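
As a rough illustration of what emitting a business-specific custom metric can look like, the sketch below builds a DogStatsD-style datagram by hand and sends it over UDP to a locally running Agent. It is a minimal sketch under stated assumptions, not the reviewer's code: the metric name and tags are hypothetical, `format_metric` and `send_metric` are illustrative helpers, and in practice an official client library would normally handle this.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a DogStatsD datagram: <name>:<value>|<type>[|#tag1,tag2]."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP; 8125 is the Agent's default DogStatsD port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical business metric: count one completed booking.
payload = format_metric("bookings.completed", 1, "c", ["env:prod", "plan:basic"])
print(payload)
# Call send_metric(payload) on a host where the Agent is listening.
```

The formatting step is kept separate from the network call so the datagram can be inspected or tested without an Agent running.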

    DevOps engineers are able to put sensors around our system to proactively detect problems, whereas before, our engineers heard about problems from customers. Logs are easier to find for developers.

    What is most valuable?

    Metric graphing and Dashboards are the most valuable features because they give us good observability into our system and work well to alert us when interesting things happen. We use this functionality daily.

    We value the monitoring capability since it allows us to be pushed alerts, rather than have to observe graphs continually. The integrations with Slack and PagerDuty enable us to be interrupted appropriately and keep a running tab on the system without bothering us unnecessarily.

    The online process monitoring has been extremely helpful, as it gives engineers the ability to see the live status of all the processes running our systems without them having to log in.

    What needs improvement?

    Their logging solution is expensive for our use case. They do have the capability to rehydrate old or incomplete logs, and it works, but I would rather not have to think about that operation.

    Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion. On a positive note, they do have lots of documentation; it just needs better curation.

    Their APM solution still needs some work, but they are actively developing it. I would also like to see more database-specific application monitoring.

    For how long have I used the solution?

    I have been using Datadog for five years across two companies.

    What do I think about the stability of the solution?

    Any issues are addressed and communicated very quickly. I have not had any issues with uptime.

    What do I think about the scalability of the solution?

    If you do not need 100% of data such as logs, APM traces, etc., this scales well. It does not scale as well if you want 100% of your logs indexed. You should understand any other usage-based bills before using any part of their service as it is very easy to run up a large bill.

    The performance of the system scales very well, and host monitoring and APM are relatively cheap.

    How are customer service and technical support?

    Account support is excellent.

    Customer support is good if you get them to go beyond pointing out the right documentation.

    Which solution did I use previously and why did I switch?

    Previously, I used homebuilt solutions with Nagios and Cacti but found that there was far too much work to understand them and keep them up and fed compared to the value that I got. They also did not integrate well with existing data sources without a lot of effort.

    I also previously used StackDriver and found it too opinionated. I like that Datadog gives you tools to work with certain types of data and make your own graphs, monitors, etc., whereas, with StackDriver, I felt like there were a limited number of ways you could accomplish goals.

    How was the initial setup?

    The basic setup is easy. A more advanced setup can be tricky because the documentation assumes you know how the system works already. Support is somewhat helpful, but mostly points out the documentation you should already have found.

    What about the implementation team?

    We implemented in-house.

    What's my experience with pricing, setup cost, and licensing?

    My advice is to understand what number of hosts and data you want to commit to. Beware that usage-based billing is both a blessing and a curse. It is easy to run up a large bill, so become familiar with the cost of each piece of your bill and use the metrics they supply to estimate and monitor your bill.
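
The advice above can be made concrete with a small estimator that splits each line of the bill into its committed cost and its overage cost. This is a minimal sketch: every rate and commitment level in it is a placeholder chosen for illustration and not Datadog's actual pricing, and `estimate_line_item` and `estimate_bill` are hypothetical helpers.

```python
# All rates and commitments below are placeholders, NOT Datadog's actual prices.
HYPOTHETICAL_RATES = {
    "hosts": {"committed": 50, "unit_price": 15.00, "overage_price": 18.00},
    "indexed_log_millions": {"committed": 100, "unit_price": 2.00, "overage_price": 2.50},
}


def estimate_line_item(usage, committed, unit_price, overage_price):
    """Return (committed_cost, overage_cost) for one line of the bill."""
    committed_cost = committed * unit_price
    overage_units = max(0, usage - committed)
    return committed_cost, overage_units * overage_price


def estimate_bill(usage_by_item, rates=HYPOTHETICAL_RATES):
    """Return a per-item breakdown and the total across every line item."""
    breakdown = {}
    for item, usage in usage_by_item.items():
        r = rates[item]
        breakdown[item] = estimate_line_item(
            usage, r["committed"], r["unit_price"], r["overage_price"]
        )
    total = sum(committed + overage for committed, overage in breakdown.values())
    return breakdown, total


# Hypothetical monthly usage: 62 hosts and 140 million indexed log events.
breakdown, total = estimate_bill({"hosts": 62, "indexed_log_millions": 140})
for item, (committed_cost, overage_cost) in breakdown.items():
    print(f"{item}: committed ${committed_cost:.2f}, overage ${overage_cost:.2f}")
print(f"estimated total: ${total:.2f}")
```

Reporting overage separately from the committed amount makes it easier to see which line item is driving an unexpectedly large bill.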

    I have had good luck with their support team helping us to figure out the correct commit levels. Their account support is excellent in this regard. I have heard their sales team can be aggressive, but I have not experienced it personally.

    Which other solutions did I evaluate?

    I originally chose Datadog because of my previous experience. We recently considered moving over to New Relic because we liked their APM solution better. However, the pricing of New Relic and our familiarity with Datadog won out. New Relic is a good product but it didn't fit our overall needs as well as Datadog.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Nuno Rosa - PeerSpot reviewer
    Principal Consultant at Infosys
    MSP
    Top 5
    Leaderboard
    Easy to set up and good UI but needs better customization capabilities
    Pros and Cons
    • "The many dozens of integrations that the solution brings out of the box are excellent."
    • "Deploying the agents is still very manual."

    What is our primary use case?

    The solution is basically used for servers and applications.

    What is most valuable?

    The UI, basically, is the most valuable aspect of the solution. I really like the look and feel of the solution. It's not very distinctive now since other players have caught up; however, they were the first in the market to present such an effective UI.

    The many dozens of integrations that the solution brings out of the box are excellent.

    It's easy to set up.

    What needs improvement?

    Deploying the agents is still very manual. 

    Network monitoring could be better or rolled into this solution so that you do not have to buy a different product.

    Customization of the tool itself should be taken into account. At the moment, although what they provide out of the box is good, they don't offer many customization possibilities. I know it's difficult; however, it's something that they would need to look at. When customers come to us with customized requirements, we cannot meet them.

    For how long have I used the solution?

    I've been dealing with the solution for five years. 

    What do I think about the stability of the solution?

    It's quite stable. I have never had an issue in regard to reliability, so it's very stable.

    What do I think about the scalability of the solution?

    It's very scalable. I have not reached the limits at any time, never in the solution. I've never seen any performance degradation in large environments. I would say it's very scalable.

    Each client has its own instance. We do not share instances with multiple customers. There's usually between 20 and 30, depending on the customer.

    How are customer service and support?

    I never use technical support, to be honest.

    How was the initial setup?

    The initial setup for the solution itself is quite straightforward. You just set it up and that's it. However, when it comes to, for instance, deploying the agents to the servers, or at least the target machines, it's still a manual task. They still do not have centralized management of the Datadog agents, which basically delays the deployment of the solution. It's very manual still.

    How long it takes to deploy is difficult to pin down. It will vary based on the environment size. Obviously, if it's ten servers, it will basically take half an hour or one hour. If it's 5,000, obviously, besides the number of nodes, other considerations will need to be taken into account. If it's a large environment, it will take much longer. We would need to basically develop a solution, or an effective process, to deploy the agents and configure them in a standardized manner. This is something that the tool itself or the tool provider does not offer out of the box. You need to build it. That's a drawback.

    How many people you need for the deployment and maintenance processes depends on the environment's size and geographical area. On average, I would usually require one resource for implementation for every 500 nodes. Then for overall support, I usually put one resource per 1,500.
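
The staffing rule of thumb above works out as simple arithmetic. The sketch below applies it to the 5,000-server example mentioned earlier; rounding up to whole people is an assumption added here, and `staffing_estimate` is a hypothetical helper.

```python
import math

NODES_PER_IMPLEMENTER = 500   # reviewer's rule of thumb for implementation
NODES_PER_SUPPORT = 1500      # reviewer's rule of thumb for overall support


def staffing_estimate(node_count):
    """Return (implementation_staff, support_staff), rounded up to whole people."""
    return (
        math.ceil(node_count / NODES_PER_IMPLEMENTER),
        math.ceil(node_count / NODES_PER_SUPPORT),
    )


implementers, support = staffing_estimate(5000)
print(f"implementation: {implementers}, support: {support}")
```

Under these assumptions, a 5,000-node environment needs 10 people for implementation (5,000 / 500) and 4 for support (5,000 / 1,500, rounded up).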

    What was our ROI?

    Before, the ROI was much higher as you would not have to compete with any kind of tool since they were very good in the space. However, with time, other companies have picked up the slack. Now, you have other tools which provide a higher ROI. I cannot give a specific ROI percentage since I don't use it for personal use with deployment. We deploy it on behalf of customers. Obviously, the ROI will vary depending on the deal and the size. If people are looking for a global monitoring solution in the same tool as Datadog network monitoring, they are always hindered as Datadog does not provide an adequate solution for it. That kind of decreases the ROI since you still need to get another tool to do the network monitoring.

    What's my experience with pricing, setup cost, and licensing?

    The licensing is a bit complicated. When you pay for it on a per-node basis, that's perfectly fine. However, when you put log analytics on top of it, it's based on traffic. This is actually an issue. It gets complicated.

    What other advice do I have?

    I'm providing Datadog. I'm a retailer.

    I would recommend the solution. 

    If a company has its environment in the public cloud, such as GCP, Azure, or AWS, Datadog is a very good candidate to provide an overview of the monitoring. If you want to consider a hybrid solution covering systems, servers, and applications, it also provides a good solution and has a lot of APM capabilities; the only drawback will be network monitoring. When you want a tool to monitor the entire environment from a single point of contact, it's possible with Datadog; however, there's not an effective tool to do network monitoring.

    I'd rate the solution seven out of ten.

    Which deployment model are you using for this solution?

    Public Cloud
    Disclosure: My company has a business relationship with this vendor other than being a customer:
    reviewer1476039 - PeerSpot reviewer
    Network Engineer / AWS Cloud Engineer / Network Management Specialist at CareFirst
    Real User
    Top 10
    Good visualizations and dashboards help to minimize downtime and resolve issues quickly
    Pros and Cons
    • "The most valuable feature is the dashboards that are provided out of the box, as well as ones we were able to configure."
    • "More pre-configured "Monitor Alerts" would be helpful."

    What is our primary use case?

    We were in need of a cloud monitoring tool that was operationally focused on the AWS Platform. We wanted to be able to responsibly and effectively monitor, troubleshoot, and operate the AWS platform, including Server, Network, and key AWS Services.

    We wanted tooling that highlights detected problems and anomalies, provides best practice recommendations, and expedites root-cause analysis and performance troubleshooting.

      Datadog provided us the ability to monitor our cloud infrastructure (network, servers, storage), platform/middleware (database, web/applications servers, business process automation), and business applications across our cloud providers.

      How has it helped my organization?

      Datadog provided us the tooling to help us effectively monitor, troubleshoot, and operate the AWS platform, including Server, Network, Database, and key AWS Services. It highlights detected problems and anomalies, provides best-practice recommendations, and expedites root-cause analysis and performance troubleshooting.

      Datadog provides analytics and insights that are actionable through out-of-the-box visualizations, dashboards, aggregation, and intuitive searching. These shorten the time to value and account for the limited time and resources we have to operate in production.

      What is most valuable?

      The most valuable feature is the dashboards that are provided out of the box, as well as ones we were able to configure. The provided dashboards that made things easier were the EC2, RDS, and Kubernetes dashboards.

      We also use the logging tool, which makes searching for specific error logs easier to do.

      Datadog Logging lets us use AWS logs such as VPC Flow Logs, ELB, EC2, RDS, and other logs that contain lots of relevant operational data but are not actionable on their own. Datadog turns them into actionable analytics and insights through visualizations, dashboards, alerting, and intuitive searching.

        What needs improvement?

        More pre-configured "Monitor Alerts" would be helpful. Datadog's knowledge of its customers and what they are looking for in terms of monitoring and alerting could be taken advantage of with pre-canned alerts. They have started this with "Recommended Monitors".  That feature was very helpful when configuring our Kubernetes alerts. More would be even better. 

        Datadog tech support is very good. One area that could be more helpful is actually talking to someone or sharing your screen to help troubleshoot issues that arise. For new cloud engineers just coming into the cloud monitoring field, there is a learning curve. There is a lot to learn and figure out. For example, we still ran into some issues configuring the private link and more videos of how to do things could be of use.

        For how long have I used the solution?

        We have been using Datadog for one year.

        What do I think about the stability of the solution?

        We have not run into any issues with stability.

        What do I think about the scalability of the solution?

        The scalability of Datadog is very good.

        How are customer service and technical support?

        Customer service has been excellent. I communicate weekly with a Datadog Customer Success Manager. He helps me follow up on any open issues or questions that we may have. Technical support has been very good. Opening tickets is easy. Sometimes a Tech Engineer may take a bit of time to get back to you. Communicating with a Tech Engineer has to be done via ticket/email; no phone assistance is available.

        Which solution did I use previously and why did I switch?

        We did not use another solution previously.

        How was the initial setup?

        Procedures for setup seemed straightforward but once you got going, there were some issues. For us, getting our private link to work needed additional tech support. They were able to help us resolve the issue we were experiencing. I think the procedures could be done a bit better to help you with setup.

        What about the implementation team?

        We deployed it ourselves.

        What was our ROI?

        Datadog helps us minimize downtime and helps us resolve issues quickly.  

        What's my experience with pricing, setup cost, and licensing?

        Pricing seemed easy until the bill came in and some things were not accounted for. The issue may have been that we didn't realize what was being accounted for, such as the number of servers and the number of logs being ingested.

        Datadog had really good pre-sales reps who worked with us, but they need to make sure all the details are covered.

        Which other solutions did I evaluate?

        The solution we were looking for needed to provide out-of-the-box capabilities that shorten the time to value. We had limited time & limited resources. Datadog had high recommendations in these areas, so we decided to do a trial with them.

          What other advice do I have?

          We are very pleased with Datadog overall.

          Datadog has assigned an account rep to us who meets with us regularly to make sure all our needs are being met and to help us get answers to any questions or issues we are running up against. They have been of great help in standing up monitoring of our Kubernetes environment.

          Which deployment model are you using for this solution?

          Hybrid Cloud

          If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

          Amazon Web Services (AWS)
          Disclosure: I am a real user, and this review is based on my own experience and opinions.
          reviewer1477686 - PeerSpot reviewer
          Senior DevOps Engineer at DigitalOnUs
          Real User
          Top 10
          Affordably-priced and improves visibility of infrastructure, apps, and services
          Pros and Cons
          • "Having a clear view, not only of our infrastructure but our apps and services as well, has brought a great added value to our customers."
          • "The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs. container instances."

          What is our primary use case?

          Our primary use of Datadog includes: 

          • Keeping a close watch on our AWS resources. Monitoring our multiple RDS and ElastiCache instances plays a big role in our indicators.
          • Kubernetes. We aren't using all of the available Kubernetes integrations, but the few of them that work out of the box add great value to our metrics.
          • Monitoring and alerting. We wired our most relevant monitoring and alerts to services like PagerDuty, and for the rest of them, we keep our engineers up to date with constant Slack updates. 

          How has it helped my organization?

          Observability is something that a lot of companies are trying to achieve. Having a clear view, not only of our infrastructure but our apps and services as well, has brought a great added value to our customers.

          For a logging solution, we used to have Papertrail. It did the trick, but having a single point that manages and indexes all the logs is a BIG improvement. Also, having the option to generate metrics from logs is a game-changer that we're trying to include in our monitoring strategy.

          I would like to say the same about APM but the support for PHP seems to be somewhat lacking. It works but I think this service could provide us more information.

          What is most valuable?

          With respect to logs, we used to integrate various kinds of tools to achieve very basic tasks and it always felt like a very fragile solution. I think logs are by far the most useful feature and at the same time, the one that we could improve.

          APM - This is hit or miss. Allow me to explain: we use various programming languages, mainly PHP and Ruby, and the traces generated don't always provide all of the information we want. For example, we get a great level of detail for the SQL queries that the app generates but not so much for the PHP side. It's hard to track exactly where all of the bottlenecks are, so some analysis tools for APM could make a good addition.

          What needs improvement?

          Please add PHP profiling; you already have it for other popular programming languages such as Python and Java, which is great because we have a little bit of those, but our main app is powered by PHP and we don't have profiling for this yet. I guess it's only a matter of time for this to be added, so in the meantime, you can consider this review as a vote for PHP profiling support.

          The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs. container instances.

          For how long have I used the solution?

          We have been using Datadog for one year.

          What do I think about the stability of the solution?

          It's pretty stable for the main integrations. There was only one time where Datadog was down and that was scary since all of our monitoring is handled by Datadog. There was a lot of uncertainty while the outage was in place.

          What do I think about the scalability of the solution?

          For everyday use, it's adequate, but for very specific tasks, not so much. There was a time where I had to do a big export and as expected, the API is somewhat limited. Since it was a one-time task, it was not a big deal but if this was a regular task, I wouldn't be happy about it.

          How are customer service and technical support?

          For small tasks, I think it's great. For specialized support, it feels like you're under-staffed; having to wait days or weeks for a solution is a big NO-NO.

          Which solution did I use previously and why did I switch?

          I've used a few other products such as New Relic and AppDynamics. The switch is usually driven by two factors: pricing and convenience.

          How was the initial setup?

          Getting APM metrics out of Kubernetes is always a painful task. We got support to take a look at this and we had to go through various iterations to get it right, and then AGAIN the next year. This was a bad experience.

          What about the implementation team?

          It was all implemented in-house. The documentation is fairly up to date, for the most part.

          What's my experience with pricing, setup cost, and licensing?

          Pricing is somewhat affordable compared to other solutions, but to really come in below the cost of other products you need to plan your resource usage very carefully; otherwise, it can get expensive really quickly.

          Which other solutions did I evaluate?

          It wasn't my call to bring Datadog into this company, but I'm sure glad that the Lead Architect made this decision. It brought many improvements in a short span of time.

          What other advice do I have?

          Please add PHP profiling soon!

          Which deployment model are you using for this solution?

          Public Cloud
          Disclosure: I am a real user, and this review is based on my own experience and opinions.
          LuWang - PeerSpot reviewer
          DevOps Engineer at Screencastify
          Real User
          Customizable and helpful for isolating and filtering environments
          Pros and Cons
          • "We have way more observability than what we had before - on the application and the overall system."
          • "Auto instrumentation on tracing has not been very easy to find in the documentation."

          What is our primary use case?

          We use Datadog for observability and system/application health, mainly for product support, triaging, debugging, and incident responses.

          We use a lot of the logging and the Datadog agent to collect logs, metrics, and traces from our GKE workloads. We use APM and continuous profiling for latency and performance measurement. We use RUM to observe frontend user events, such as tracing on request and what actions they take before errors occur. We also use error tracking and source maps to debug production failures.

          We are still relatively new to the product, and we are planning to use more of the notebook functionality and power packs to record run books and break knowledge silos. We also need to utilize dashboards and continuous profiling more for performance measurement and integrate Datadog alerts for incident response.

          How has it helped my organization?

          We have way more observability than what we had before - on the application and the overall system. That includes the GKE cluster, nodes, and pods. It's helped with our cloud-run instances, databases, and data storage.

          We also started observability in the CI pipeline to measure our CI performance, as it was a pain point for us. We are aiming to do incremental deployments and releases, and the bottleneck so far has been our CI performance. The visibility on which actions or functions take the most time allows us to pinpoint and focus on improving configurations on these.

          What is most valuable?

          We use structured logging a lot to triage production issues. The querying, the manipulation of attributes and tags, and the customization have been very helpful in isolating and filtering environments. The integration with the Winston logger has also been a breeze.

          First and foremost, structured logging, tags, and attributes have not only allowed us to narrow down a problem quickly in production; they have also let us create dashboards from these logs to better understand user behaviors, such as how many users stop and leave our application before an upload has completed. That helps us understand how important processing time is to a user.
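
To illustrate what structured logging with attributes looks like in practice, here is a minimal sketch using only Python's standard library. It is not the reviewer's setup (they mention Winston, a Node.js logger), and the attribute names `env`, `event`, and `elapsed_ms` as well as the logger name `upload-service` are hypothetical. The idea is simply that each log line is a JSON object, so a log platform can filter on its fields or chart them on a dashboard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical attributes passed through the `extra` argument.
            **getattr(record, "attributes", {}),
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str = "upload-service") -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


logger = build_logger()
# Illustrative event: a user left before an upload finished.
logger.info(
    "upload abandoned",
    extra={
        "attributes": {
            "env": "production",
            "event": "upload_abandoned",
            "elapsed_ms": 8200,
        }
    },
)
```

With logs shaped like this, counting events where `event` equals `upload_abandoned` per environment becomes a query on attributes rather than a text search.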

          We also intend to use distributed tracing more to understand where the error has occurred in a particular request.

          What needs improvement?

          The documentation could definitely use improvement. As I navigated it and tried to find instrumentation and implementation details, I discovered inconsistencies among the SDKs for different languages.

          There are also places where highlighting can be improved. I once created an issue on GitHub, and it was resolved right away by an engineer. He pointed out that it was actually in the documentation. I looked again and found it was not very obvious. We were stuck on the problem for days.

          Auto instrumentation on tracing has not been very easy to find in the documentation. We ended up using OpenTelemetry, yet the conversion between tracing contexts has been difficult.

          For how long have I used the solution?

          We've used the solution for between six months and a year.

          How are customer service and support?

          Customer service and support are generally very fast. I did have one ticket, which involved changing the log index retention period, that never received a response. Any support tickets related to technical issues were resolved pretty fast.

          How would you rate customer service and support?

          Positive

          Which solution did I use previously and why did I switch?

          We used to use GCP Stackdriver for logging and monitoring since our infrastructure is all GCP based. It was lacking a lot, particularly on tracing and structured logging. We often had a lot of trouble triaging and diagnosing a production problem. Datadog's specialty is observability. Since we started using the product, we were able to create dashboards, and utilize APM, continuous profiling, RUM, and distributed tracing for production support and user trends.

          Datadog also offers labs and workshops for its products, which is very helpful.

          What about the implementation team?

          We implemented the product ourselves.

          What was our ROI?

          I'm not sure what our ROI would be.

          What's my experience with pricing, setup cost, and licensing?

          We started with on-demand pricing as we were re-writing our product, and we weren't sure about the total usage. After we went into production and released the product, we experienced a price surge. Fortunately, our Datadog account manager reached out to us and suggested a monthly subscription, which is what we'll be switching to.

          I'd advise keeping an eye on the usage and possibly setting up some monitoring on price. We didn't have much of a setup cost; we started with a free trial and continued with on-demand after the trial ended.

          Which other solutions did I evaluate?

          We didn't evaluate many of the other options. However, we do also use OpenTelemetry, which is vendor agnostic and integrates with Datadog.

          What other advice do I have?

          We always keep the Datadog agent updated to the latest version.

          Which deployment model are you using for this solution?

          Public Cloud

          If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

          Google
          Disclosure: I am a real user, and this review is based on my own experience and opinions.
          Ramon Snir - PeerSpot reviewer
          CTO at a tech vendor with 1-10 employees
          Real User
          Increases delivery velocity with less manual testing and good integrations
          Pros and Cons
          • "Since we integrated Datadog, we have had increased confidence in the quality of our service, and we had an easier time increasing our delivery velocity."
          • "Since the Datadog platform has so many separate features, solving so many use cases, there are often inconsistencies in feature availability and interoperability between products."

          What is our primary use case?

          We use Datadog for three main use cases, including:

          • Infrastructure and application monitoring. This ensures that our services are available and performant at all times. It allows us to proactively address incidents and outages without customers contacting us. This includes monitoring of cloud resources (databases, load balancers, CPU usage, etc.), high-level application monitoring (response times, failure rates, etc.), and low-level application monitoring (business-oriented metrics and functional exceptions to customer experience).
          • Analyzing application behavior, especially around performance. We often use Datadog's application performance monitoring on non-production environments to evaluate the impact of newly introduced features and gain confidence in changes.
          • End-to-end regression testing for APIs and browser-based experiences. Datadog's synthetic testing periodically checks that the system behaves exactly as expected. This is often used as a canary to detect issues even before users reach them organically.

          How has it helped my organization?

          Since we integrated Datadog, we have had increased confidence in the quality of our service, and we had an easier time increasing our delivery velocity. 

          We have seen time after time that the monitors we have carefully created based on all ingested data are detecting issues quickly and accurately. 

          This means we allow ourselves to manually test things less frequently. We have also had an easier time investigating application errors and slowness using Datadog's APM and log explorer products which allow us to introspect any part of the system, in its execution context.

          What is most valuable?

          The most valuable features include:

          • Integrated observability data ingestion: All data that Datadog collects is connected. This allows us to easily connect logs with failed requests, and slow database queries with services and requests.
          • Broad integrations allow us to monitor our entire production environment in a single place, not just cloud resources. Since all parts stream metrics, logs, and events to Datadog, we can have unified dashboards and manage monitors and incidents all from the same page.
          • A high level of configuration. We can configure and modify many parts, from how data is collected from our applications to how Datadog parses and visualizes it. This means that we always get the best experience, and we don't need to find ten different products that do small things well or settle on one product that does everything badly.

          What needs improvement?

          Since the Datadog platform has so many separate features, solving so many use cases, there are often inconsistencies in feature availability and interoperability between products. 

          Older, more mature products tend to be complete (many features, customization, broad integrations, etc.), while newer products will often be at a "just above minimum viable product" phase for a long time, doing what's intended yet missing valuable customizations and integrations.

          For how long have I used the solution?

          We've used the solution for 12 months.

          What do I think about the scalability of the solution?

          The solution scales very well on technical aspects, being able to ingest large quantities of data from many services. However, the pricing often doesn't scale naturally, and effort has to be put in to keep ongoing costs at a reasonable amount.

          How are customer service and support?

          Customer service and support are generally very high-quality. In most cases, they reply very quickly and offer well-researched and relevant responses. This is contrasted with many vendors who take a long time to reply and send links to documentation instead of understanding the problem.

          However, we had cases where support took several weeks to reply to a complicated request and sometimes eventually responded that the issue cannot be resolved. These are rare edge-case occurrences.

          How would you rate customer service and support?

          Positive

          How was the initial setup?

          A large part of the initial setup was straightforward. We were able to collect about 80% of the relevant and 90% of the meaningful insights from just a couple of hours of connecting the AWS integration and the Datadog APM agent. 

          Getting it to 100% and configuring and customizing things to our unique situation, took about two weeks. Datadog's documentation and support team were extremely helpful during both phases.

          What about the implementation team?

          We handled the setup in-house.

          What was our ROI?

          Based on the number of outages stopped or shortened (which would have led to lost revenue from non-renewals) and the number of hours saved on investigations (which correlates to engineering salaries), I estimate the ROI on the implementation time and monthly charges to be between 10x and 20x.

          What other advice do I have?

          We use the solution as a SaaS deployment.

          Disclosure: I am a real user, and this review is based on my own experience and opinions.