reviewer9864210 - PeerSpot reviewer
DevOps Engineering Lead at Hellenic Bank Public Company Ltd
User
Top 20
Good dashboards and observability capabilities but pricing needs improvement
Pros and Cons
  • "Dashboards are the most valuable."
  • "The monitors need improvement."

What is our primary use case?

We have multiple nodes integrated into our Azure infrastructure and our AKS clusters. These nodes are integrated with traces (as APM hosts).

We also have infrastructure hosts integrated to see the metrics and resources of each host, mainly for Azure VMs and AKS nodes. Additionally, we have hosts from our VMs in Azure that run ActiveMQ, and we integrate them as messaging queues to show up in the ActiveMQ dashboard.

We have recently added ActiveMQ as containers in AKS, and we are also integrating those as messaging queues to show up in the ActiveMQ dashboard integration.

How has it helped my organization?

Logs are great. Having all services from different teams send their logs to Datadog, and having all logs in the same place, is very helpful for understanding what is going on in our app. Filtering the logs is a huge help, adding special custom filters is easy, and the filters are fast. Documentation is better than average, with little room for improvement.

Dashboards are simple, and monitors are very easy to configure and get notified if something is wrong.

With the aggregated logs, we can now see logs from other systems and identify problems in other areas in which we had no visibility before.

What is most valuable?

Dashboards are the most valuable. We need the observability. We have given the dashboards to a dedicated team that monitors them outside working hours and reports whatever they see going red. This helps us because people without any knowledge of the system can tell when there is a problem, when to react, and when to inform others, simply by checking whether the monitor (showing the dashboards) turns red.

Traces being connected to each other and seeing that each service is connected through one API call is very helpful for us to understand how the system works.

What needs improvement?

The monitors need improvement. We need easier root cause analysis when a monitor hits red. When we get the email, it's hard to identify why the trigger has gone red and which pod exactly is to blame in a scenario where the pod is restarting, for example.

Prices are a very difficult thing in Datadog. We have to be very mindful of any changes we make in Datadog, and we are a bit afraid of using new features since, if we change something, we might get charged a lot. For example, if we add a network feature to our nodes, we might get charged a lot simply by changing one flag, even though we are only going to use one small feature for those network nodes. However, due to the fact that we have more than 50 nodes, all of the nodes will be charged for the feature of "Network hosts".

This leads us to not fully utilize the capabilities of Datadog, and it's a shame. Maybe we could have a grace period to test features, like a trial, and then have Datadog switch the feature off for us so that we avoid paying more by mistake.
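
The cost concern described above comes down to simple multiplication: a feature billed per host is charged for every host it is enabled on, however little of it is used. A rough sketch, in which the per-host price is a made-up placeholder and not an actual Datadog rate, and 50 hosts is taken as the lower bound of "more than 50 nodes":

```python
# Hypothetical numbers for illustration only; the price is NOT a real Datadog rate.
HOST_COUNT = 50                  # lower bound of "more than 50 nodes" from the review
PRICE_PER_HOST_PER_MONTH = 5.00  # assumed placeholder price for a per-host feature


def monthly_feature_cost(hosts: int, price_per_host: float) -> float:
    """Cost of a feature that is billed for every host it is enabled on."""
    return hosts * price_per_host


# Enabling the feature fleet-wide versus on the single host that needs it.
fleet_wide = monthly_feature_cost(HOST_COUNT, PRICE_PER_HOST_PER_MONTH)
single_host = monthly_feature_cost(1, PRICE_PER_HOST_PER_MONTH)

print(f"fleet-wide: {fleet_wide:.2f}, single host: {single_host:.2f}")
```

Whatever the real price is, the fleet-wide figure scales with the number of hosts, which is why a single flag change can have a large effect on the bill.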

Buyer's Guide
Datadog
October 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: October 2025.
872,029 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for five years.

What do I think about the stability of the solution?

The solution is stable enough. We found it to be down only a few times, and it's reasonable.

What do I think about the scalability of the solution?

The solution offers very good scalability. When we added more logs and more hosts, we did not notice any degradation in the service.

How are customer service and support?

Support is very good. They answer all of our questions, and with a few emails, we get what we need.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously used Elastic. We had to set up everything and maintain it ourselves.

How was the initial setup?

Datadog has very good support and it is not so complicated to set up.

What about the implementation team?

We set up the solution in-house. We integrated everything on our own.

What was our ROI?

We found the product to be very valuable.

What's my experience with pricing, setup cost, and licensing?

I'd advise others to start small and then integrate more stuff. Be mindful when using Datadog.

Which other solutions did I evaluate?

We evaluated Splunk and ELK.

What other advice do I have?

Be careful of the costs. Set up only the important things.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Jordan Lee - PeerSpot reviewer
Site Reliability Engineer at Icario
User
Top 20
Good centralization with helpful monitoring and streamlined investigation capabilities
Pros and Cons
  • "The two most valuable aspects are the Terraform provider for Datadog and the K8s Orchestrator."
  • "A big problem with Datadog is the billing. They need to make the billing more user-friendly."

What is our primary use case?

We utilize Datadog to monitor both some legacy products and a new PaaS solution, built on a microservice architecture, that we are building out here at Icario.

All of our infrastructure is in AWS, with a very few legacy systems in Rackspace. For the PaaS, we mainly just utilize the K8s Orchestrator, which implements the APM libraries into services deployed there and gives us infra info regarding the cluster.

For the legacy products, we mainly just utilize the Agent or the AWS integration, with APM in specific places. We monitor mainly prod in legacy and the full scope in the PaaS for now.

How has it helped my organization?

Datadog has greatly reduced the time needed to investigate issues by putting everything into a single pane of glass, allowing us to get ahead of infra/app-based issues before they affect customer experience with our products.

Outside of that, the ease of management, deployment of agents, integrations, etc. has greatly helped the teams. There isn't much legwork needed by the devs to manage or deploy Datadog into their stacks, thanks to the use of Terraform, pipelines, and the orchestrator. All in all, it has been an improvement.

What is most valuable?

The two most valuable aspects are the Terraform provider for Datadog and the K8s Orchestrator. People don't take that into account when buying into a tooling product like Datadog in this age where scalability, management, and ease of implementation are key. Other tools not having good IaC products or options is a dropped ball. Orchestration for the tool's agent is good. Not having to use another tool to manage the agents and config files in multiple places/instances is a huge win!

What needs improvement?

A big problem with Datadog is the billing. They need to make the billing more user-friendly. I know it like the back of my hand at this point, yet trying to explain to the C-suite why costs went up, or why they are what they are, is many times more complicated than it needs to be. I can't even say "why" due to the lack of metadata tied to billing. For instance, with the AWS Integration host ingestion, I can't say, "This month, these hosts were added, and that's what caused the cost to go up." The billing visibility really needs to be resolved!

For how long have I used the solution?

I've used the solution for more than four years.

What do I think about the stability of the solution?

Datadog has always been extremely stable, with outages really only ever creating delays, never actual downtime of the service, which is amazing and impressive.

What do I think about the scalability of the solution?

The solution is very scalable if implemented right and not on top of complicated architecture.

How are customer service and support?

Support is excellent. They are always looking for a resolution, and a ticket is never left unresolved unless the feature just can't exist or isn't currently possible.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We did have New Relic, Datadog, Sumo Logic, Pingdom, and some other custom or third-party tooling. We switched because we wanted everything to be in a single pane and because Datadog is a better solution than the competitors.

How was the initial setup?

For us, setup is a mixed bag, as we support legacy apps and architectures as well as a new microservice architecture. That being said, legacy is somewhat complex just due to the nature of how those apps stack and the underlying infra, configuration, and setup. Microservice is a breeze and straightforward for most of the out-of-the-box stuff.

What about the implementation team?

Our Team of SRE Engineers, Platform Engineers and Cloud Engineers implemented the solution.

What was our ROI?

I can't really speak to ROI; however, from my perspective, we definitely get our money's worth from the product.

What's my experience with pricing, setup cost, and licensing?

Users really need to make sure they stay on top of costs and don't let all of the engineers do as they please. Billing with Datadog can get out of hand if you let them. Not everything needs to be monitored.

Which other solutions did I evaluate?

We didn't really need to evaluate other options.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Corey Peoples - PeerSpot reviewer
Senior Custom Software Development Consultant at a tech vendor with 501-1,000 employees
Real User
Top 20
Has improved our ability to identify cloud application issues quickly using trace data and detailed log filtering
Pros and Cons
  • "Datadog has impacted our organization positively because the general feeling is that it's superior to the ELK stack that we used to use, being significantly faster in searching and filtering the information down, as well as providing links to our search criteria that our development teams and cloud operations teams can use to look at the same problems without having to set up their own search and filter criteria."
  • "The hardest thing we experience is just training people on what to search for when identifying a problem in Datadog, and having some additional training that might be easily accessible would probably be a benefit."

What is our primary use case?

My team and I primarily rely on Datadog for logs to our application to identify issues in our cloud-based solution, so we can take the requests and information that's being presented as errors from our customers and use it to identify what the errors are within our back-end systems, allowing us to submit code fixes or configuration changes.

I had an error when I was trying to submit an API request this morning that just said unspecified error in the web interface. I took the request ID and filtered a facet of our logs to include that request ID, and it gave me the specific examples, allowing me to look at the code stack that we had logged to identify what specifically it was failing to convert in order to upload that data.

My team doesn't utilize Datadog logs very often, but we do have quite a few collections of dashboards and widgets that tell us the health of the various API requests that come through our application to identify any known issues with some of our product integrations. It's useful information, but it's not necessarily stuff that our team monitors directly as we're more of a reactionary team.

What is most valuable?

The best features Datadog offers, in my experience, are the ability to filter down by facets very quickly to identify the problems we're experiencing with our individual customers using our cloud application. I really enjoy the trace option so that I can see all of the various components and how they communicate with each other to see where the failures are occurring.

The trace option helps us spot issues by giving access to see if the problem is occurring within our Java components or if it's a result of the SQL queries, allowing us to look at the SQL queries themselves to identify what information it's trying to pull. We can also look at other integrations, whether that's serverless Lambda functions or different components from our outreach.

Datadog has impacted our organization positively because the general feeling is that it's superior to the ELK stack that we used to use, being significantly faster in searching and filtering the information down, as well as providing links to our search criteria that our development teams and cloud operations teams can use to look at the same problems without having to set up their own search and filter criteria.

What needs improvement?

For the most part, the issues that we come across with Datadog are related to training for our organization. Our development and operations teams have done a really good job of getting our software components into Datadog, allowing us to identify them. However, we do have reduced logging in our Datadog environment due to the amount of information that's going through.

The hardest thing we experience is just training people on what to search for when identifying a problem in Datadog, and having some additional training that might be easily accessible would probably be a benefit.

At this point, I do not know what I don't know, so while there may be options for improvements, Datadog works very well for the things that we currently use it for. Additionally, the extra training that would be more easily accessible would be extremely helpful, perhaps something within the user interface itself that could guide us on useful information or how to tie different components or build a good dashboard.

For how long have I used the solution?

I have worked for Calabrio for 13 years.

What do I think about the stability of the solution?

Datadog is very stable.

What do I think about the scalability of the solution?

Datadog's scalability is strong; we've continued to significantly grow our software, and there are processes in place to ensure that as new servers, realms, and environments are introduced, we're able to include them all in Datadog without noticing any performance issues. The reporting and search functionality remain just as good as when we had a much smaller implementation.

Which solution did I use previously and why did I switch?

Previously, we used the ELK stack—Elasticsearch, Logstash, and Kibana—to capture data. Our cloud operations team set that up because they were familiar with it from previous experiences. We stopped using it because as our environment continued to grow, the response times and the amount of data being kept reached a point where we couldn't effectively utilize it, and it lacked the capability to help us proactively identify issues.

What other advice do I have?

A general impression is that Datadog saves time because the ability to search, even over the vast amount of AWS realms and time spans that we have, is significantly faster compared to other solutions that I've used that have served similar purposes.

I would advise others looking into using Datadog to identify various components within their organization that could benefit from pulling that information in and how to effectively parse and process all of it before getting involved in a task, so they know what to look for. Specifically, when searching for data, if a metric can be pulled out into an individual facet and used, the amount of filtering that can be done is significantly improved compared to a general text search.
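
The difference between a facet and a general text search can be pictured with a small conceptual sketch. This is not how Datadog is implemented; the log records and the `request_id` field name are hypothetical, and the point is only that an attribute extracted into its own field can be looked up directly, while free text has to be scanned and may not contain the value at all.

```python
from collections import defaultdict

# Hypothetical log records; field names such as "request_id" are assumptions.
logs = [
    {"request_id": "req-1", "message": "upload started"},
    {"request_id": "req-2", "message": "conversion failed for req-2 payload"},
    {"request_id": "req-1", "message": "upload finished"},
]


def text_search(records, needle):
    """General text search: every message has to be scanned."""
    return [r for r in records if needle in r["message"]]


def build_facet_index(records, field):
    """Group records by one extracted attribute so lookups skip the scan."""
    index = defaultdict(list)
    for r in records:
        index[r[field]].append(r)
    return index


by_request = build_facet_index(logs, "request_id")

# Facet lookup finds every record tagged with the request ID.
print(len(by_request["req-1"]))
# Text search only finds messages that happen to contain the ID as text.
print(len(text_search(logs, "req-1")))
```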

I would love to figure out how to use Datadog more effectively in the organization work that I do, but that is a discussion I need to have with our operations and research and development teams to determine if it can benefit the customer or the specific implementation software that I work with.

On a scale of one to ten, I rate Datadog a ten out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Patrick Lynch - PeerSpot reviewer
DevOps Solutions Architect at Magnolia CMS
Real User
Top 20
Has improved visibility into performance metrics and helped reduce cloud spend
Pros and Cons
  • "Datadog has positively impacted our organization by allowing us to look at things such as Cloud Spend and make sure our services are running at an optimal performance level."
  • "I rate Datadog an eight out of ten because the expense of using it keeps it from being a nine or ten."

What is our primary use case?

My main use case for Datadog is dashboards and monitoring.

We use dashboards and monitoring with Datadog to monitor the performance of our Nexus Artifactory system and make sure the services are running.

What is most valuable?

The best features Datadog offers are the dashboarding tools as well as the monitoring tools.

What I find most valuable about the dashboarding and monitoring tools in Datadog is the ease of use and simplicity of the interface.

Datadog has positively impacted our organization by allowing us to look at things such as Cloud Spend and make sure our services are running at an optimal performance level.

We have seen specific outcomes such as cost savings by utilizing the cost utilization dashboards to identify areas where we could trim our spend.

What needs improvement?

To improve Datadog, I suggest they keep doing what they're doing.

Newer features using AI to create monitors and dashboards would be helpful.

For how long have I used the solution?

I have been using Datadog for six years.

What do I think about the stability of the solution?

Datadog is stable.

What do I think about the scalability of the solution?

I am not sure about Datadog's scalability.

How are customer service and support?

Customer support with Datadog has been great when we needed it.

I rate the customer support a nine on a scale of 1 to 10.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

What was our ROI?

In terms of return on investment, there is a lot of time saved from using the platform.

What's my experience with pricing, setup cost, and licensing?

I was not directly involved in the pricing, setup cost, and licensing details.

Which other solutions did I evaluate?

Before choosing Datadog, we evaluated other options such as Splunk and Grafana.

What other advice do I have?

I rate Datadog an eight out of ten because the expense of using it keeps it from being a nine or ten.

My advice to others looking into using Datadog is to brush up on their API programming skills.

My overall rating for Datadog is eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer2507895 - PeerSpot reviewer
Software Architect at Keller Williams Realty, Inc.
Real User
Top 20
Good RUM and APM with good observability
Pros and Cons
  • "We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages."
  • "The cost is pretty high."

What is our primary use case?

We use Datadog across the enterprise for observability of infrastructure, APM, RUM, SLO management, alert management and monitoring, and other features. We're also planning on using the upcoming cloud cost management features and product analytics.

For infrastructure, we integrate with our Kube systems to show all hosts and their data.

For APM, we use it with all of our API and worker services, as well as cronjobs and other Kube deployments.

We use serverless to monitor our Cloud Functions.

We use RUM for all of our user interfaces, including web and mobile.

How has it helped my organization?

It's given us the observability we need to see what's happening in our systems, end to end. We get full stack visibility from APM and RUM, through to logging and infrastructure/host visibility. It's also becoming the basis of our incident management process in conjunction with PagerDuty.

APM is probably the most prominent place where it has helped us. APM gives us detailed data on service performance, including latency and request count. This drives all of the work that we do on SLOs and SLAs.

RUM is also prominent and is becoming the basis of our product team's vision of how our software is actually used.

What is most valuable?

APM is a fundamental part of our service management, both for viewing problems and improving latency and uptime. The latency views drive our SLOs and help us identify problems.

We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages.

RUM has been critical in identifying what our users are actually doing, and we'll be using the new product analytics tools to research and drive new feature development.

All of this feeds into the PagerDuty integration, which we use to drive our incident management process.

What needs improvement?

Sometimes the solution changes features so quickly that the UI keeps moving around. The cost is pretty high. Outside of that, we've been relatively happy.

The APM service catalog is evolving fast. That said, it is redundant with our other tools and doesn't allow us to manage software maturity. However, we do link it with our other tools using the APIs, so that's helpful.

Product analytics is relatively new and based on RUM, so it will be interesting to see how it evolves.

Sometimes some of the graphs take a while to load, based on the window of data.

Some stock dashboards don't allow customization. You need to clone them first, but this can lead to an abundance of dashboards. Also, there are some things that stock dashboards do that can't yet be duplicated with custom dashboards, especially around widget organization.

The "top users" widget on the product analytics page only groups by user email, which is unfortunate, since user ID is the field we use to identify our users.

For how long have I used the solution?

I've used the solution for three and a half years.

What do I think about the stability of the solution?

The solution is pretty stable.

What do I think about the scalability of the solution?

The solution is very scalable.

How are customer service and support?

Support was excellent during the sales process, with a huge dropoff after we purchased the product. Only recently (within the past year) has it begun to reach acceptable levels again.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We did not have a global solution. Some teams were using New Relic.

How was the initial setup?

The instructions aren't always clear, especially when dealing with multiple products across multiple languages. The tracer works very differently from one language to another.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

We have built our own set of installation instructions for our teams, to ensure consistent tagging and APM setup.

Which other solutions did I evaluate?

We did look at Dynatrace.

What other advice do I have?

The service was great during the initial testing phase. However, once we bought the product, the quality of service dropped significantly. However, in the past year or so, it has improved and is now approaching the level we'd expect based on the cost.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer2767266 - PeerSpot reviewer
Manager, Security Engineering at a tech vendor with 51-200 employees
Real User
Has improved incident response time through centralized log monitoring and infrastructure automation
Pros and Cons
  • "Even if something goes wrong and the Datadog tenant becomes completely compromised or if all our monitors were to get erased for whatever reason, we can always restore all our monitoring setup through Terraform, which provides peace of mind."
  • "Datadog can be improved by addressing billing and spend calculation methods, as it would be better if these were more straightforward."

What is our primary use case?

My main use case for Datadog is for security SIEM, log management, and log archiving.

In my daily work, we send all our logs from different cloud services and SaaS products, including Okta, GCP, AWS, GitHub, as well as virtual machines, containers, and Kubernetes clusters. We send all this data to Datadog, and we have numerous different monitors configured. This allows us to create different security features, such as security monitoring and escalate items to a security team on call to create incident response. Archiving is significant because we can always restore logs from the archive and go back in time to see what happened on that exact day. It is very helpful for us to investigate security incidents and infrastructure incidents as well.

Regarding our main use case, we use the Terraform provider for Datadog, which is probably one of the biggest benefits of using Datadog over any other similar tool because Datadog has great Terraform support. We can create all our security monitoring infrastructure using Terraform. Even if something goes wrong and the Datadog tenant becomes completely compromised or if all our monitors were to get erased for whatever reason, we can always restore all our monitoring setup through Terraform, which provides peace of mind.

What is most valuable?

The best features Datadog offers are not necessarily about having the best individual features, but rather the sheer quantity of different features they offer. I appreciate how you can reuse a query across different indexes for logs or security monitoring. The syntax remains consistent for everything, so you do not have to learn multiple languages. Similarly, for different types of monitors, you can always reuse the same templating language, which makes things much more efficient.

Datadog positively impacted our organization by making us more cautious about how we manage our logs. Before Datadog, we would ingest substantial amounts of data without considering indexing priorities. We became more strategic about what we index, particularly for security and cloud audit logs. We improved our approach to indexing retention and determining which types of logs are important. Overall, we enhanced our internal log management practices.

After implementing Datadog, we observed specific improvements in outcomes and metrics. We started analyzing our logs more thoroughly than before, identifying different patterns, and determining log importance levels. We began looking for more signals from audit logs and distinguishing between critical and non-critical information. The most significant metric improvement has been reduced incident investigation time.

What needs improvement?

Datadog can be improved by addressing billing and spend calculation methods, as it would be better if these were more straightforward. Currently, these calculations can be complex. Additionally, while we use Terraform extensively, not everything is available in Terraform. It would be beneficial to have more features supported in Terraform, particularly some security features that have been available for a while but still lack Terraform support.

For how long have I used the solution?

I have been using Datadog for about four years.

What do I think about the stability of the solution?

Datadog is very stable.

What do I think about the scalability of the solution?

Datadog's scalability is excellent. We have never encountered any issues.

How are customer service and support?

The customer support is good. I have never had any issues.

I would rate the customer support as nine out of ten.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously used New Relic and switched because it was not very effective.

How was the initial setup?

My experience with pricing, setup cost, and licensing indicates that it was somewhat expensive.

What was our ROI?

I have seen a return on investment with Datadog, particularly in time saved responding to incidents. Regarding staffing requirements, that metric isn't applicable for our use case since log management and security monitoring inherently require personnel to respond. However, it has definitely improved our efficiency in terms of response time, though this isn't a hard metric but rather based on experience.

Which other solutions did I evaluate?

I do not remember evaluating other options before choosing Datadog as it was a long time ago.

What other advice do I have?

I would rate Datadog an eight out of ten because while it is expensive, it offers numerous features, though sometimes it attempts to do too much.

My advice to others considering Datadog is to explore other products and calculate potential spending carefully. If Terraform support is important to your organization, then Datadog is an excellent choice. However, keep in mind that costs will increase significantly as you scale, and different features have varying pricing structures.

Overall rating: 8/10

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer2543758 - PeerSpot reviewer
Engineering Manager at RVshare
User
Top 20
Good visibility into application performance, understanding of end-user behavior, and a single pane of glass view
Pros and Cons
  • "The single pane of glass view with maneuvering between products has helped us to truly understand root causes after incidents."
  • "The wide range of products Datadog now offers can be a bit intimidating to developers."

What is our primary use case?

The primary use case for this solution is to enhance our monitoring visibility, determine the root cause of incidents, understand end-user behaviour from their point of view (RUM), and understand application performance.

Our technical environment consists of a local dev environment where Datadog is not enabled, plus deployed environments that range from UAT testing with our product org to ephemeral stacks that our developers use to test their code away from their own computers. We also have a mobile app where testing is performed.

How has it helped my organization?

Datadog has greatly improved our organization in many ways. Some of those ways include greater visibility into application performance, understanding of end-user behavior, and a single pane of glass view into our entire infrastructure.  

Regarding visibility, our organization previously used New Relic, and when incidents or regressions happened, New Relic's query language was very hard to use. End-user behavior in RUM has improved our ability to know what to focus on. Lastly, the single pane of glass view with maneuvering between products has helped us truly understand root causes after incidents.

What is most valuable?

APM has been a top feature for us. I can speak for all developers here: they use it more often than other products. Because traces follow a standard structure (even though it is customizable), engineers find it easier to walk a trace than to work out what went wrong by looking at logs.

Another feature that I find valuable, though it isn't the first one that comes to mind, is Watchdog. I have found it to be a good way to understand anomalies and to see where we (as an organization) may need more monitoring coverage.

What needs improvement?

I am not 100% sure how this could be addressed, or whether it can be, but I have had to do a lot of education to ramp developers up on the platform. This feels like a natural consequence of the sheer growth and number of products Datadog now offers.

When I first started using the Datadog platform, I thought a big pro was that the ramp-up time was much quicker, since there was no query language to learn. I still believe that to be true when comparing the product to something like New Relic, though with the wide range of products Datadog now offers, it can be a bit intimidating for developers to know where to go to find what they want.

For how long have I used the solution?

I have been using the solution at my current company for almost four years, and have used it at my previous company as well.

Which solution did I use previously and why did I switch?

A while ago, we used New Relic, and we switched due to Datadog being a better product.

What about the implementation team?

We did the implementation in-house.

What's my experience with pricing, setup cost, and licensing?

The value compared to pricing is reasonable, though it can be a bit of a sticker shock to some.

Which other solutions did I evaluate?

We did not evaluate other options. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer2767254 - PeerSpot reviewer
Staff Software Engineer at a tech vendor with 1,001-5,000 employees
Real User
Has created intuitive dashboards and streamlined monitoring across teams
Pros and Cons
  • "When an alert fires, our on-call engineer can see the infrastructure metric spike (like CPU), pivot directly to the application traces (APM) running on that host, and see the exact, correlated logs from the services causing the problem—all in one place."
  • "It's not just that Datadog is expensive—it's that the cost is incredibly complex and hard to predict."

What is our primary use case?

Our main use case for Datadog is collecting metrics, specifically things such as latency metrics and error metrics for our services at Procore.

To give a specific example of how I use Datadog for those metrics in my daily work, I had to create a new service to solve a particular problem, which was an API. I used Datadog to get metrics around successful requests, failed requests, and 400 requests. I then created dashboards that showed those metrics along with some latency metrics from the API, and I also built a monitor that triggers and sends an alert whenever the number of failures goes over a certain threshold.
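
As a rough illustration of the setup described above, the sketch below sorts responses into success, 400, and failure counts, formats them as DogStatsD-style count datagrams, and applies a monitor-style threshold check. The metric prefix (`my_api.requests`), the `service:my-api` tag, the rule that every status above 400 counts as a failure, and the threshold value are all hypothetical and not taken from the reviewer's service. In practice the counts would be sent through a Datadog client and the threshold would live in a Datadog monitor rather than in application code.

```python
from collections import Counter

# Hypothetical metric prefix and threshold; not from the reviewer's service.
METRIC_PREFIX = "my_api.requests"
FAILURE_THRESHOLD = 5


def classify(status_code: int) -> str:
    """Map an HTTP status code to one of the three buckets from the review."""
    if status_code == 400:
        return "bad_request"
    if status_code < 400:
        return "success"
    return "failure"  # assumption: any other status counts as a failure


def count_requests(status_codes):
    """Count requests per bucket."""
    return Counter(classify(code) for code in status_codes)


def to_dogstatsd(counts, tags=("service:my-api",)):
    """Format counts as DogStatsD count datagrams: name:value|c|#tags."""
    tag_part = ",".join(tags)
    return [
        f"{METRIC_PREFIX}.{bucket}:{value}|c|#{tag_part}"
        for bucket, value in sorted(counts.items())
    ]


def should_alert(counts, threshold=FAILURE_THRESHOLD):
    """Monitor-style check: alert when failures exceed the threshold."""
    return counts.get("failure", 0) > threshold


if __name__ == "__main__":
    codes = [200, 200, 400, 500, 503, 200]
    counts = count_requests(codes)
    for line in to_dogstatsd(counts):
        print(line)
    print("alert:", should_alert(counts, threshold=1))
```

Keeping the classification, formatting, and threshold check in separate functions makes each piece easy to test on its own before wiring it to a real metrics client.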

How has it helped my organization?

The single biggest improvement has been breaking down the silos between our teams. Before we adopted it, our developers, operations, and SRE teams all lived in separate tools. Ops had their infrastructure graphs, Devs had their log files, and no one had a complete picture.

Here’s where we’ve seen the most significant impact:

  1. We Find and Fix Problems Drastically Faster: The "single pane of glass" is a real thing for us. When an alert fires, our on-call engineer can see the infrastructure metric spike (like CPU), pivot directly to the application traces (APM) running on that host, and see the exact, correlated logs from the services causing the problem—all in one place. We've cut our Mean Time to Resolution (MTTR) significantly because we're no longer "swivel-chairing" between three different tools trying to manually line up timestamps.
  2. We Are More Proactive and Less Reactive: Features like Watchdog (its anomaly detection) have been crucial. We've been alerted to a slow-building memory leak and an abnormal spike in error rates on a specific API endpoint before they breached our static thresholds and caused a user-facing outage. It's helped us move from a "firefighting" culture to one where we can catch problems before they escalate.

What is most valuable?

The best features of Datadog include great dashboards, a super simple and easy-to-use Python library, and easy monitor setup, which together provide a really great UI experience.

What makes the dashboard and Python library stand out for me is that they save a lot of time, getting right to the point and being super intuitive.

Datadog has positively impacted my organization by allowing us to have a link to a dashboard for most services.

We have dashboards across the company, which can easily be passed around, making it super easy for everyone to understand the metrics they are looking at.

What needs improvement?

Oh, that's a great question. We actually have a running list of things we'd love to see. Even though we get a ton of value from it, no tool is perfect. Our feedback generally falls into two categories: making the current experience less painful and adding new capabilities we think are the logical next step.

Honestly, our biggest frustrations aren't about a lack of features, but about the management of the platform itself.

  1. Cost Predictability and Governance: This is, without a doubt, our number one issue. It's not just that Datadog is expensive—it's that the cost is incredibly complex and hard to predict. Our bill can fluctuate wildly based on custom metrics, log ingestion, and traces from a new service. We've had to dedicate engineering time just to managing our Datadog costs, creating exclusion filters, and sampling aggressively, which feels like we're being punished for using the product more.

    • How to improve it: We need a "cost calculator" inside the platform. Before I enable monitoring on a new cluster or turn on a new integration, I want Datadog to give me a concrete estimate of what it will cost. We also need better built-in tools for attributing costs back to specific teams or services before the bill arrives.
  2. The Steep Learning Curve and UI Density: The UI is incredibly powerful, but it's dense. For a senior SRE who lives in the tool all day, it's fine. For a new engineer or a developer who only jumps in during an incident, it's overwhelming. We've seen people "click in circles" trying to find a simple stack trace that's buried three layers deep. Building a "perfect" dashboard is still too much of an art form.
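
To make the request for a cost calculator and per-team attribution more concrete, here is a minimal sketch of the arithmetic such a tool might perform: multiply projected usage by a unit rate per product, then roll the result up by team. The product keys, team names, and rates below are placeholders invented for illustration; they are not Datadog's actual prices or billing dimensions.

```python
from collections import defaultdict

# Placeholder unit rates (currency units per unit of usage). These are
# invented for illustration and are NOT Datadog's real prices.
EXAMPLE_RATES = {"custom_metrics": 0.5, "log_gb": 0.25, "apm_hosts": 30.0}


def estimate_cost(usage, rates=EXAMPLE_RATES):
    """Return the estimated cost of one usage record (product -> quantity)."""
    return sum(quantity * rates[product] for product, quantity in usage.items())


def attribute_by_team(records, rates=EXAMPLE_RATES):
    """Sum estimated cost per team from (team, usage) records."""
    totals = defaultdict(float)
    for team, usage in records:
        totals[team] += estimate_cost(usage, rates)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical projected usage for two made-up teams.
    projected = [
        ("payments", {"custom_metrics": 100, "log_gb": 40}),
        ("search", {"apm_hosts": 2, "log_gb": 8}),
        ("payments", {"apm_hosts": 1}),
    ]
    for team, cost in sorted(attribute_by_team(projected).items()):
        print(f"{team}: {cost:.2f}")
```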

For how long have I used the solution?

I have been using Datadog for about five years.

What do I think about the stability of the solution?

Datadog is stable.

Which solution did I use previously and why did I switch?

I did not previously use a different solution.

How was the initial setup?

I did not deal with any of the pricing, setup cost, or licensing.

What about the implementation team?

I do not know if we purchased Datadog through the AWS Marketplace.

What other advice do I have?

My advice to others looking into using Datadog is to just try using it and see how easy it is to use. I found this interview great. On a scale of 1-10, I rate Datadog a 10.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
Updated: October 2025