reviewer9637683 - PeerSpot reviewer
Software Engineer at Liberis Limited
User
Top 20
Great for logging and tracing but needs better customization
Pros and Cons
  • "Real user monitoring has made triaging any possible bugs our users might face a lot easier."
  • "They need to offer better/more customization on what logs we get and making tracing possible on Edge runtime logs is a real requirement."

What is our primary use case?

We're using the product for logging and monitoring of various services in production environments. 

It excels at providing real-time observability across a wide range of metrics, logs, and traces, making it ideal for DevOps teams and enterprises managing complex environments. 

The platform integrates seamlessly with our cloud services, but browser-side logging is lagging a little.

Dashboards are very useful for quick insights, but they can be time-consuming to create, and the learning curve is steep. Documentation is vast, but not as detailed as I'd like.

How has it helped my organization?

The solution has made logging and tracing a lot easier, and the RUM sessions are something we did not have previously. Datadog’s real-time alerting and anomaly detection help reduce downtime by allowing us to identify and address performance issues quickly. 

The platform’s intelligent alert system minimises noise, ensuring your team focuses on critical incidents. This results in faster Mean Time to Resolution (MTTR), improving service availability. 

It consolidates monitoring for infrastructure, applications, logs, and security into a single platform. This enables us to view and analyse data across the entire stack in one place, reducing the time spent jumping between tools.

What is most valuable?

Real user monitoring has made triaging any possible bugs our users might face a lot easier. RUM tracks actual user interactions, including page load times, clicks, and navigation flows. This gives our organization a clear picture of how our users are experiencing our application in real-world conditions, including slow-loading pages, errors, and other performance issues that affect user satisfaction. We can then easily prioritize these and make sure we offer our users the best possible experience.

What needs improvement?

I'm not sure if this is on Datadog, however, Vercel integration is very limited. 

They need to offer better/more customization on what logs we get and making tracing possible on Edge runtime logs is a real requirement. It is extremely difficult, if not completely impossible, to get working traces and logs displayed in Datadog with our stack of Vercel, Next.js, and Datadog. This is a very common stack in front-end development, and the difficulty of implementing it is unacceptable. Please do something about it soon. Front-end logs matter.

Buyer's Guide
Datadog
June 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: June 2025.
856,873 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for a little over a year.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Tony Martinez1 - PeerSpot reviewer
Works at VANTA INC
Real User
Top 20
Great logging, session replays, and alerting
Pros and Cons
  • "Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening."
  • "The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog."

What is our primary use case?

Our primary use cases include:

  • Alert on errors customers encounter in our product. We've set up logs that go to Slack to tell us when a certain error threshold is hit.
  • Investigate slow page load times. We have pages in our app that are loading slowly and the logs help us figure out which queries are taking the longest time.
  • Metrics. We collect metrics on product usage.
  • Session replays. We watch session replays to see what a user was doing when a page took a long time to load or hit an error. This is helpful.

How has it helped my organization?

It's helped us find bugs that customers are experiencing before they're reported to us. Sometimes, customers don't report errors, so being able to catch errors before they're reported helps us investigate before other users hit them.

Datadog has helped us investigate slow page loading times and even see the specific queries that are taking a long time to run.

Logging lets us see the context around an error. For example, see if a backend service had an error before it surfaced on the frontend.

Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening.

What is most valuable?

The most valuable aspects include: 

  • Logging. Being able to view detailed logs helps debug issues.
  • Session replays. They are helpful for seeing what a customer was doing before they saw an error or had a slow page load.
  • Alerting. This is an important part of our on-call process: alerts go to Slack when an error threshold is crossed. Alerts/monitors are easy to configure to only alert when we want them to alert.
  • Dashboards. It's helpful to pull up dashboards that show our most common errors or page performance. It's a good way to get a bird's-eye view of how the app is performing.

What needs improvement?

The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog.

The log querying syntax can be confusing. Usually, I filter by finding a facet in a log and selecting to filter by that facet, but I'm not sure how to write the filter myself.

The monitor/alert syntax is also somewhat hard to understand.

Overall, it should be easier to learn how to use the product while you're using the product. Perhaps tooltips or a link to learn more about whatever section you're using.

For how long have I used the solution?

I've used the solution for two years.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

Which other solutions did I evaluate?

We did not evaluate other options. 

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Victor Chen1 - PeerSpot reviewer
Software Engineer at Zip
Real User
Top 20
Good for log ingestion and analyzing logs with easy searchability of data
Pros and Cons
  • "The feature I've found most valuable is the log search feature."
  • "More helpful log search keywords/tips would be helpful in improving Datadog's log dashboard."

What is our primary use case?

We use Datadog as our main log ingestion source, and Datadog is one of the first places we go to for analyzing logs. 

This is especially true for debugging, monitoring, and alerting on errors and incidents, as we send traffic logs from K8s, Amazon Web Services, and many other services at our company to Datadog. In addition, many products and teams at our company have dashboards for monitoring statistics (sometimes based on these logs directly, other times on queries we set for these metrics) to alert us if there are any errors or health issues.

How has it helped my organization?

Overall, at my company, Datadog has made it easy to search for and look up logs at an impressively quick search rate over a large amount of logs. 

It lets you set up monitoring and alerting directly from log queries, which is convenient and makes for a good user experience. While there is a bit of a learning curve, given enough time a majority of my company now uses Datadog as the first place to check when there are errors or bugs.

However, the cost aspect of Datadog is tricky to gauge because it's related to usage, and thus, it is hard to tell the relative value of Datadog year to year.

What is most valuable?

The feature I've found most valuable is the log search feature. It's set up with our ingestion to be a quick one-stop shop, is reliable and quick, and seamlessly integrates into building custom monitors and alerts based on log volume and timeframes. 

As a result, it's easy to leverage this to triage bugs and errors, since we can pinpoint the logs around the time that they occur and get metadata/context around the issue. This is the main feature that I use the most in my workflow with Datadog to help debug and triage issues.

What needs improvement?

More helpful log search keywords/tips would be helpful in improving Datadog's log dashboard. I recently struggled a lot to parse text from raw log lines that didn't seem to match directly with facets. There are presumably smart searching capabilities, but it's not intuitive to learn how to leverage them, so I had to resort to a Python script to do some simple regex parsing. (I was trying to parse "file:folder/*/*" from the logs and didn't seem to be able to do this in Datadog. Maybe I'm just not familiar enough with the logs, but I couldn't easily find resources on how to do this either.)
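
For illustration only, a fallback script of the kind described above might look like the minimal sketch below. The sample log lines, the exact pattern, and the function name `extract_paths` are assumptions made for this example; the reviewer's actual script and log format are not shown in the review.

```python
import re

# Hypothetical pattern: "file:" followed by "folder/" and exactly two
# more path segments, i.e. the "file:folder/*/*" shape from the review.
FILE_PATH = re.compile(r"file:(folder/[^/\s]+/[^/\s]+)")


def extract_paths(lines):
    """Return every folder/*/* path that follows a "file:" prefix."""
    return [m.group(1) for line in lines for m in FILE_PATH.finditer(line)]


# Made-up raw log lines, not real Datadog output.
sample = [
    "2025-01-01 ERROR upload failed file:folder/a/b.txt size=10",
    "2025-01-01 INFO heartbeat ok",
]

print(extract_paths(sample))
```

Lines without a matching `file:` token are simply skipped, so the script can be pointed at an unfiltered log export.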

For how long have I used the solution?

I've used the solution for 10 months.

What's my experience with pricing, setup cost, and licensing?

Beware that the cost will fluctuate (and it often only gets more expensive very quickly).

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer2561892 - PeerSpot reviewer
Principal. Performance Engineering at Invitation Homes
User
A go-to tool for analyzing, understanding, and investigating application performance
Pros and Cons
  • "Log analytics give us a powerful mechanism for error tracking, research, and analysis."
  • "Network device and performance monitoring could be improved, as we've faced some limitations in this area."

What is our primary use case?

The solution is used for full-stack enterprise performance monitoring for our primarily cloud-based stack on AWS. We have implemented monitoring coverage using RUM for critical apps and websites and utilize APM (integrated with RUM) for full-stack traceability.

We use Datadog as our primary log repository for all apps and platforms, and the advanced log analytics enable accurate log-based monitoring/alerting and investigations. 

Additionally, we use some advanced RUM capabilities and metrics to track and optimize client-side user experience. We track SLOs for our critical apps and platforms using Datadog.

How has it helped my organization?

We now have full-stack observability, which allows us to better understand application behavior, quickly alert users about issues, and proactively manage application performance.  

We've seen value by implementing observability coordinated across multiple applications, allowing us to track things like customer shopping and orders across multiple applications and services.  

For critical application launches, we've built dashboards that can track user activity and confirm users are able to successfully utilize new features, tracking user activities in real-time in a war-room situation.  

Datadog is our go-to tool for analyzing, understanding, and investigating application performance and behavior.

What is most valuable?

APM accurately tracks our service performance across our ecosystem. RUM gives us client-side performance and user experience visibility, and the rate of new features implemented in the Digital Experience area recently has been high. Log analytics give us a powerful mechanism for error tracking, research, and analysis.  

Custom metrics that we've created allow us to track KPIs in real-time on dashboards. All of these have proven valuable in our organization.  Additionally, Datadog product support teams are responsive and have provided timely support when needed.

What needs improvement?

Agent remote configuration should be provided/improved and streamlined, allowing for config changes/upgrades to be performed via the portal instead of at the host.   

Cost tracking via the admin portal is a bit lacking, even though it has gotten better.  I'm looking for usage trends (that drive cost) across time and better visibility or notifications about on-demand charges.  

Network device and performance monitoring could be improved, as we've faced some limitations in this area.  

The Datadog usage-based cost model, while giving us better transparency, is difficult to follow at times and is constantly evolving.  

For how long have I used the solution?

I've used the solution for three years.

How are customer service and support?

Support has been responsive and helpful.  

How would you rate customer service and support?

Positive

What's my experience with pricing, setup cost, and licensing?

Pricing is straightforward. That said, it's sometimes difficult to estimate usage volumes.

Which other solutions did I evaluate?

We evaluated Datadog and New Relic in detail and chose Datadog for its straightforward and competitive pricing model, its full coverage of the monitoring features we wanted, and its easy-to-use UI.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
SecOps Engineer at Ava Labs
User
Helpful support, with centralized pipeline tracking and error logging
Pros and Cons
  • "Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most."
  • "While the documentation is very good, there are areas that need a lot of focus to pick up on the key details."

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. 

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. 

What is most valuable?

The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing is great, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

What needs improvement?

While the documentation is very good, there are areas that need a lot of focus to pick up on the key details. In some cases the screenshots don't match the text when updates are made. 

I spent longer than I should have trying to figure out how to correlate logs to traces, mostly because of environment variables.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime.

What do I think about the scalability of the solution?

It's scalable and customizable. 

How are customer service and support?

Support is helpful. They help us tune our committed costs and alert us when we start spending out of the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility.

How was the initial setup?

Setup is generally simple. .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

There has been significant time saved by the development team in terms of assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

I'd advise others to set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

We are excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Operations Manager at TodayTix
User
Good dashboards, easy troubleshooting, and integrations
Pros and Cons
  • "The dashboards are super convenient to us for a more zoomed out view of what is going on with each integration that we utilize."
  • "There could be more easily identifiable documentation on how to find different things on the platform."

What is our primary use case?

We utilize Datadog mainly to monitor our API integrations and all of the inventory that comes in from our API partners. Each event has its own ID, so we can trace all activity related to each event and troubleshoot where needed.

How has it helped my organization?

Datadog gives non-dev teams insight into everything that is happening with a particular event and flags any errors so that we can troubleshoot more efficiently.

What is most valuable?

The dashboards are super convenient to us for a more zoomed out view of what is going on with each integration that we utilize.

What needs improvement?

There could be more easily identifiable documentation on how to find different things on the platform. It can be overwhelming at first glance, and it's hard to find appropriate documentation on the site to lead you to where you need to be. 

For how long have I used the solution?

I've used the solution for about 1.5 years.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Enrique Bassallo - PeerSpot reviewer
AWS Cloud Architect Consultant at a manufacturing company with 10,001+ employees
Real User
A very solid option with flexible features for analyzing data and enhancing observability
Pros and Cons
  • "The solution allows flexibility and heightened observability for presenting data, creating indicators, and setting service-level objectives."
  • "The solution should provide alerts for cloud outages."

What is our primary use case?

Our company deploys the solution for our customers as an observability tool to define SLOs and SLIs along with logs and metrics. 

The solution includes incident, post-mortem, and root cause analysis that provides a level of truth for incidents and issues with applications. 

We have SREs and teams in operations, management, and applications who all have access to the solution and ensure proper integrations.

How has it helped my organization?

Our company is adopting SRE practices and the solution helps us to align the practices with our site reliability. We get more insights about issues at the outset which helps us to make better decisions such as continuing with agility or stopping to fix issues. 

We are at the beginning stages of using the solution but are defining it as our company standard for use by all teams. 

What is most valuable?

The solution allows flexibility and heightened observability for presenting data, creating indicators, and setting service-level objectives. There are interesting options for monitors and features that offer flexible ways to analyze data.

What needs improvement?

The solution should provide alerts for cloud outages that would allow us to report potential service impacts directly to applications or on the dashboard. Alerts are important because there is a need to determine the impact on your SLAs, SLOs, and SLIs to decide whether to move toward disaster recovery or another environment. 

I would like the ability to share dashboard screenshots via email rather than having to direct others to the dashboard because it sometimes requires permissions. 

For how long have I used the solution?

I have been using the solution for one year. 

What do I think about the stability of the solution?

Our SRE teams report that the solution is very solid, stable, and reliable. 

What do I think about the scalability of the solution?

We are using the SaaS model so do not manage scalability because the product takes care of our scaling needs with no issues. 

How are customer service and support?

I don't have direct contact with support, but our SRE teams work with them and are satisfied.

Which solution did I use previously and why did I switch?

Our company used other products in the past but wanted to move to the cloud. We found that the solution was a very good fit for us. 

How was the initial setup?

The initial setup is not that easy because there are many choices for configuration, workloads, servers, and containers. 

We utilized technical support to help us understand integration and prepare patterns for other applications. 

What about the implementation team?

We created small configurations and then utilized technical support to configure an application selected from our portfolio. 

We utilize a team approach for implementations that sometimes includes SREs. 

What's my experience with pricing, setup cost, and licensing?

The solution is fairly priced but history and log storage can get costly depending on your needs. 

I rate the cost a four out of ten. 

What other advice do I have?

The solution is appropriate for companies that are moving to the cloud and want a very solid tool for observability, logging, and everything related to SRE practices. 

I rate the solution a nine out of ten. 

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Nuno Rosa - PeerSpot reviewer
Principal Consultant at Infosys
MSP
Top 10
Easy to set up and good UI but needs better customization capabilities
Pros and Cons
  • "The many dozens of integrations that the solution brings out of the box are excellent."
  • "Deploying the agents is still very manual."

What is our primary use case?

The solution is basically used for servers and applications.

What is most valuable?

The UI, basically, is the most valuable aspect of the solution. I really like the look and feel of the solution. It's not very distinctive now since other players have caught up, however, they were the first in the market to present such an effective UI. 

The many dozens of integrations that the solution brings out of the box are excellent.

It's easy to set up.

What needs improvement?

Deploying the agents is still very manual. 

Network monitoring could be better or rolled into this solution so that you do not have to buy a different product.

Customization of the tool itself should be taken into account. At the moment, although what they provide out of the box is good, they don't offer many customization possibilities. I know it's difficult; however, it's something that they would need to look at. When customers come to us with custom requirements, we cannot meet them.

For how long have I used the solution?

I've been dealing with the solution for five years. 

What do I think about the stability of the solution?

It's quite stable. I have never had an issue in regard to reliability, so it's very stable.

What do I think about the scalability of the solution?

It's very scalable. I have not reached the limits at any time, never in the solution. I've never seen any performance degradation in large environments. I would say it's very scalable.

Each client has its own instance. We do not share instances with multiple customers. There's usually between 20 and 30, depending on the customer.

How are customer service and support?

I never use technical support, to be honest.

How was the initial setup?

The initial setup for the solution itself is quite straightforward. You just set it up and that's it. However, when it comes to, for instance, deploying the agents to the servers, or at least the target machines, it's still a manual task. They still do not have centralized management of the Datadog agents, which delays the deployment of the solution. It's very manual still.

How long it takes to deploy is difficult to pin down. It will vary based on the environment size. If it's ten servers, it will take half an hour or one hour. If it's 5,000, then besides the number of nodes, other considerations will need to be taken into account. If it's a large environment, it will take much longer. We would need to develop a solution, or an effective process, to deploy the agents and configure them in a standardized manner. This is something that the tool itself or the tool provider does not offer out of the box. You need to build it. That's a drawback.

How many people you need for the deployment and maintenance processes depends on the environment's size and geographical area. On average, I would usually require one resource per 500 nodes for implementation. Then, for overall support, I usually put one resource per 1,500 nodes.
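
As a rough worked example of the staffing rule of thumb above (one resource per 500 nodes for implementation, one per 1,500 for support), the arithmetic can be sketched as follows. The function name and the choice to round up to whole people are assumptions for illustration; the review states only the ratios.

```python
import math

# Ratios quoted in the review; the constant names are hypothetical.
NODES_PER_IMPLEMENTER = 500
NODES_PER_SUPPORT = 1500


def staffing(nodes):
    """Estimate headcount, rounding up to whole people (an assumption)."""
    return {
        "implementation": math.ceil(nodes / NODES_PER_IMPLEMENTER),
        "support": math.ceil(nodes / NODES_PER_SUPPORT),
    }


# The 5,000-node environment mentioned in the review.
print(staffing(5000))  # {'implementation': 10, 'support': 4}
```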

What was our ROI?

Before, the ROI was much higher, as Datadog did not have to compete with other tools; it was very good in its space. However, with time, other companies have caught up, and now there are other tools that provide a higher ROI. I cannot give a specific ROI percentage since I don't deploy it for our own use; we deploy it on behalf of customers. The ROI will vary depending on the deal and its size. People looking for a global monitoring solution that includes network monitoring in the same tool are always hindered, as Datadog does not provide an adequate solution for it. That decreases the ROI since you still need another tool to do the network monitoring.

What's my experience with pricing, setup cost, and licensing?

The licensing is a bit complicated. When you pay for it on a per-node basis, that's perfectly fine. However, when you put log analytics on top of it, it's based on traffic. This is actually an issue. It gets complicated.

What other advice do I have?

I'm providing Datadog. I'm a retailer.

I would recommend the solution. 

For companies whose environments are in a public cloud such as GCP, Azure, or AWS, Datadog is a very good candidate to provide an overview of the monitoring. If you are considering a hybrid setup with systems, servers, and applications, it also provides a good solution and has a lot of APM capabilities; the only drawback will be network monitoring. If you want one tool to monitor the entire environment from a single point, that is possible with Datadog; however, it has no effective tool for network monitoring.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: