Head of Software at a manufacturing company with 51-200 employees
User
Top 5
Sep 30, 2024
Great for web application log aggregation, performance tracing, and alerting
Pros and Cons
  • "Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most."
  • "I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view."

What is our primary use case?

Our primary use case is for custom and vendor-supplied web application log aggregation, performance tracing, and alerting. 

We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications. Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge.

Datadog agents are on each web host, and we have native integrations with GitHub, AWS, and Azure to get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps. Datadog ties them all together in cohesive dashboards. Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework with Windows Event Viewer and cutting-edge .NET Core with streaming logs work. The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable. 

The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. 

Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view.

I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. 

While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS hosted apps - that need a lot of focus to pick up on the key details needed. 

In some cases the screenshots don't match the text as updates are made.

Buyer's Guide
Datadog
January 2026
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: January 2026.
880,901 professionals have used our research since 2012.

For how long have I used the solution?

I've been using the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime. The agents are clean and light on resource usage.

What do I think about the scalability of the solution?

The solution scales well and is customizable. 

How are customer service and support?

Customer support is always helpful in tuning our committed costs and alerting us when we start spending beyond the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether an app runs on Linux, Windows, or containers, and whether it is cloud or on-prem hosted.

How was the initial setup?

The implementation is generally simple. .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the setup in-house. 

What was our ROI?

We've witnessed significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

We're excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Product Engineering Manager at a computer software company with 201-500 employees
User
Top 20
Sep 30, 2024
Good logging, easy to find issues, and saves time
Pros and Cons
  • "The logging in general is one of my favorite features."
  • "I'd love to have some DD guru come in and do a department training directly at our setup."

What is our primary use case?

We use the solution for APM, AWS, Lambda, logging, and infrastructure. We have many different things all over AWS, and having one place to look is great.

We have all sorts of different AWS things out there that are in C# and Node. Having a single place to log and APM into is very important to us.

Keeping track of the cloud infrastructure is also important. We have Lambda, containers, EC2, etc.

Having a super simple interface to filter the searching for APM and logging is great. It is super easy to show people how to use. This is super important to us.

How has it helped my organization?

Finding issues quickly is super important, as is being able to create dashboards and alert on issues.

Having the ability to create dashboards has really taught us how to utilize the searching part of the system. We are able to share them, and build upon them so easily. Many iterations later people are putting some solid information out there.

Alerting is also important to us. We have set up many alerts that help us spot issues in the platform before they become bigger issues. This has enabled my teams to use incidents and address the issues so they are no longer problems.

What is most valuable?

Alerting on running systems is very helpful. Finding issues is quick. We have one place for logging and searching, and we are able to save searches, reference them in the future, and build upon them.

The logging in general is one of my favorite features. The search is so straightforward and easy to use. Just being able to click on a field and add it to the search has taught me so much about the interface. It might not be as useful without a shortcut like that to teach me the system. We have Cloudflare logs in there, and sometimes I have no idea how to filter on such a buried piece of JSON. That is where the interface helps me: by clicking "add to search," I get what I need.

What needs improvement?

The PagerDuty replacement is something we are very interested in. We only really use PagerDuty to call the team when things are down.

I'd love to have some DD guru come in and do a department training directly at our setup. We would love to have someone come in and show us the things we could do better within our current setup.

Saving a bit of cash would also help, if there are things we are doing that are costing us. It's a big enough tool that it is tough to have someone dedicated to managing it.

For how long have I used the solution?

I've used the solution for a bit over a year at this point.

What do I think about the stability of the solution?

The stability seems good here too.

What do I think about the scalability of the solution?

Scalability seems good to me. I have no complaints.

How are customer service and support?

I get answers from our contact, and one team member did reach out. It went well.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used Loggly. 

We switched because we wanted an all-in-one tool.

How was the initial setup?

Some parts of our setup were tough. Some Windows container setups cost us a lot of time.

The AWS infrastructure was tough to fully turn on due to the large cost of everything being run.

What about the implementation team?

We handled the setup ourselves in-house.

What was our ROI?

This cost us more overall. ROI is hard to sell. That said, I can find issues way faster and see what is going on in my entire platform. I pay back the cost every month with productivity. 

What's my experience with pricing, setup cost, and licensing?

It is going to cost you more than you think to keep everything running. We saw value in the one-for-all solution; however, it came at a premium over what we were paying.

Which other solutions did I evaluate?

We did evaluate Dynatrace.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Sid Nigam - PeerSpot reviewer
Works at a transportation company with 51-200 employees
User
Top 20
Sep 23, 2024
Unified platform with customizable dashboards and AI-driven insights
Pros and Cons
  • "The infrastructure monitoring capabilities, especially for our Kubernetes clusters, have helped us optimize resource allocation and reduce costs."
  • "We'd like to see more advanced incident management capabilities integrated directly into the platform."

What is our primary use case?

Our primary use case for this solution is comprehensive cloud monitoring across our entire infrastructure and application stack. 

We operate in a multi-cloud environment, utilizing services from AWS, Azure, and Google Cloud Platform. 

Our applications are predominantly containerized and run on Kubernetes clusters. We have a microservices architecture with dozens of services communicating via REST APIs and message queues. 

The solution helps us monitor the performance, availability, and resource utilization of our cloud resources, databases, application servers, and front-end applications. 

It's essential for maintaining high availability, optimizing costs, and ensuring a smooth user experience for our global customer base. We particularly rely on it for real-time monitoring, alerting, and troubleshooting of production issues.

How has it helped my organization?

Datadog has significantly improved our organization by providing us with great visibility across the entire application stack. This enhanced observability has allowed us to detect and resolve issues faster, often before they impact our end-users. 

The unified platform has streamlined our monitoring processes, replacing several disparate tools we previously used. This consolidation has improved team collaboration and reduced context-switching for our DevOps engineers. 

The customizable dashboards have made it easier to share relevant metrics with different stakeholders, from developers to C-level executives. We've seen a marked decrease in our mean time to resolution (MTTR) for incidents, and the historical data has been invaluable for capacity planning and performance optimization. 

Additionally, the AI-driven insights have helped us proactively identify potential issues and optimize our infrastructure costs.

What is most valuable?

We've found the Application Performance Monitoring (APM) feature to be the most valuable, as it provides great visibility on trace-level data. This granular insight allows us to pinpoint performance bottlenecks and optimize our code more effectively. 

The distributed tracing capability has been particularly useful in our microservices environment, helping us understand the flow of requests across different services and identify latency issues. 

Additionally, the log management and analytics features have greatly improved our ability to troubleshoot issues by correlating logs with metrics and traces. 

The infrastructure monitoring capabilities, especially for our Kubernetes clusters, have helped us optimize resource allocation and reduce costs.

What needs improvement?

While Datadog is an excellent monitoring solution, it could be improved by building more features to replace alerting apps like OpsGenie and PagerDuty. Specifically, we'd like to see more advanced incident management capabilities integrated directly into the platform. This could include features like sophisticated on-call scheduling, escalation policies, and incident response workflows. 

Additionally, we'd appreciate more customizable machine learning-driven anomaly detection to help us identify unusual patterns more accurately. Improved support for serverless architectures, particularly for monitoring and tracing AWS Lambda functions, would be beneficial. 

Enhanced security monitoring and threat detection capabilities would also be valuable, potentially reducing our reliance on separate security information and event management (SIEM) tools.

For how long have I used the solution?

I've used the solution for two years.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Head of Software at a manufacturing company with 51-200 employees
User
Top 5
Sep 20, 2024
Good centralized pipeline tracking and error logging with very good performance
Pros and Cons
  • "Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most."
  • "In some cases the screenshots don't match the text as updates are made."

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. 

We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications.

Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. 

Datadog agents on each web host and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Using Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. 

Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting-edge .NET Core with streaming logs all work. 

The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable. For example, the centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. 

Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

They need an expansion of the Android and iOS apps to provide a simplified CI/CD pipeline history view.

I like the idea of monitoring on the go. That said, it seems the options are still a bit limited out of the box. 

While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS hosted apps - that need a lot of focus to pick up on the key details needed. 

In some cases the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly because of environment variables.
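The review does not say which environment variables caused the trouble. As a minimal sketch, assuming the usual suspects are Datadog's unified service tagging variables (`DD_ENV`, `DD_SERVICE`, `DD_VERSION`) and the tracer's `DD_LOGS_INJECTION` switch, a small check like the one below can list which of them are unset. The helper name `missing_correlation_vars` is hypothetical, and for IIS-hosted apps the dictionary passed in should reflect the variables the application process actually sees, not those of the shell running the script.

```python
import os

# Assumption: these are the variables relevant to log/trace correlation.
# The reviewer does not name them; adjust the tuple to match your setup.
REQUIRED = ("DD_ENV", "DD_SERVICE", "DD_VERSION", "DD_LOGS_INJECTION")


def missing_correlation_vars(env=None):
    """Return the names in REQUIRED that are unset or empty in `env`.

    `env` is any mapping of variable names to values; it defaults to
    the environment of the current process.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_correlation_vars()
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("All assumed correlation variables are set.")
```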

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution has been very scalable and very customizable.

How are customer service and support?

Support is always helpful to help us tune our committed costs and alert us when we start spending out of the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether an app runs on Linux, Windows, or containers, and whether it is cloud or on-prem hosted.

How was the initial setup?

The implementation is generally simple. That said, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

The solution was implemented in-house. 

What was our ROI?

Our ROI has been significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can impact cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

We are excited to explore the new offerings around LLM further and continue to expand our presence in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Neil Elver - PeerSpot reviewer
Application Development Team Lead at a university with 201-500 employees
User
Top 10
Sep 20, 2024
Good synthetic testing, centralized pipeline tracking and error logging
Pros and Cons
  • "Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users."
  • "I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view."

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. 

We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications.

Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. Datadog agents on each web host and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting-edge .NET Core with streaming logs all work. The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable. 

The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view. I like the idea of monitoring on the go; however, it seems the options are still a bit limited out of the box.

While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed. In some cases the screenshots don't match the text as updates are made. I feel I spent longer than I should have figuring out how to correlate logs to traces, mostly because of environment variables.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution was very scalable and very customizable.

How are customer service and support?

Sales service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether an app runs on Linux, Windows, or containers, and whether it is cloud or on-prem hosted.

How was the initial setup?

The setup is generally simple. That said, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

The solution was implemented in-house.

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

It's a good idea to set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

We are excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Michael Johnston1 - PeerSpot reviewer
Senior Software Engineer at a tech vendor with 5,001-10,000 employees
Vendor
Top 10
Sep 20, 2024
A great tool with an easy setup and helpful error logs
Pros and Cons
  • "The setup cost was minimal."
  • "We did have an issue where a synthetic test was set up before the holiday break, and we were quickly charged a great amount. Our team worked with Datadog, and they were able to help us out since it was inadvertent on our end and was a user error."

What is our primary use case?

We currently have an error monitor to monitor errors on our prod environment.  Once we hit a certain threshold, we get an alert on Slack. This helps address issues the moment they happen before our users notice. 
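The threshold logic behind this kind of monitor can be sketched as a sliding-window counter. This is an illustration only, not how Datadog implements monitors: the class name `ErrorThresholdMonitor`, the threshold, and the window length are all hypothetical, and the Slack notification is left out.

```python
from collections import deque


class ErrorThresholdMonitor:
    """Fire when `threshold` errors occur within `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_error(self, now):
        """Record an error at time `now` (in seconds).

        Returns True when the number of errors inside the window has
        reached the threshold, i.e. when an alert should be sent.
        """
        self._timestamps.append(now)
        # Drop errors that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Hypothetical settings: alert on 3 errors within 60 seconds.
monitor = ErrorThresholdMonitor(threshold=3, window_seconds=60)
for t in (0, 10, 20):
    if monitor.record_error(t):
        print(f"alert at t={t}s")
```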

We also utilize synthetic tests on many pages on our site. They're easy to set up and are great for pinpointing when a shipped bug takes down a less-visited page that we wouldn't immediately be aware of. It's a great extra check to make sure the code we ship is free of bugs.

How has it helped my organization?

The synthetic tests have been invaluable. We use them to check various pages and ensure functionality across multiple areas. Furthermore, our error monitoring alerts have been crucial in letting us know of problems the moment they pop up.  

Datadog has been a great tool, and all of our teams utilize many of its features.  We have regular mob sessions where we look at our Datadog error logs and see what we can address as a team. It's been great at providing more insight into our users and logging errors that can be fixed.

What is most valuable?

The error logs have been super helpful in breaking down issues affecting our users. Our monitors let us know once we hit a certain threshold as well, which is good for momentary blips and issues with third-party providers or rollouts that we have in the works. Just last week, we had a roll-out where various features were broken due to a change in our backend API. Our Datadog logs instantly notified us of the issues, and we could troubleshoot everything much more easily than just testing blind. This was crucial to a successful rollout.

What needs improvement?

I honestly can't think of anything that can be improved. We've started using more and more features from our Datadog account and are really grateful for all of the different ways we can track and monitor our site. 

We did have an issue where a synthetic test was set up before the holiday break, and we were quickly charged a great amount. Our team worked with Datadog, and they were able to help us out since it was inadvertent on our end and was a user error. That was greatly appreciated and something that helped start our relationship with the Datadog team.

For how long have I used the solution?

We've been using Datadog for several months. We started with the synthetic tests and now use it for error handling and in many other ways.

What do I think about the stability of the solution?

Stability has been great. We've had no issues so far.

What do I think about the scalability of the solution?

The solution is very easy to scale. We've used it on multiple clients.

How are customer service and support?

We had a dev who had set up a synthetic test that was running every five minutes in every single region over the holiday break last year. The Datadog team was great and very understanding and we were able to work this out with them.
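To see why a test like this adds up quickly, the number of executions can be worked out directly. The sketch below is arithmetic only: the region count and the length of the break are hypothetical inputs (the review gives only the five-minute interval), and no price is assumed.

```python
def synthetic_runs(interval_minutes, regions, days):
    """Total executions of one synthetic test over a period.

    Each region runs the test once per interval, around the clock.
    """
    runs_per_day = (24 * 60) // interval_minutes
    return runs_per_day * regions * days


# Hypothetical inputs: every 5 minutes, 20 regions, a 14-day break.
total = synthetic_runs(interval_minutes=5, regions=20, days=14)
print(total)  # 80640
```

Multiplying the result by whatever per-run price applies to your contract gives a rough cost for the period.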

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We didn't have any previous solution. At a previous company, I used Sentry. However, I find Datadog to be much easier, plus the inclusion of synthetic tests is awesome.

How was the initial setup?

The documentation was great and our setup was easy.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

This has had a great ROI as we've been able to address critical bugs that have been found via our Datadog tools.

What's my experience with pricing, setup cost, and licensing?

The setup cost was minimal. The documentation is great and the product is very easy to set up.

Which other solutions did I evaluate?

We also looked at other providers and settled on Datadog. It's been great to use across all our clients.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Tejaswini A - PeerSpot reviewer
Application Engineer at a financial services firm with 10,001+ employees
Real User
Top 20
Jul 3, 2024
Consolidates all our logs into a single place, making it easier to find errors
Pros and Cons
  • "The best way it has helped us is by consolidating all our logs into a single place and making it easier to find errors."
  • "Another issue that I have is with the search syntax. It could be simpler, and it feels like there are too many ways to do the same things."

What is our primary use case?

We have a tech stack including all backend services written in TS/Node (mostly) and as a full stack engineer, it is crucial to keep track of new and existing errors. Our logs have been consolidated in Datadog and are accessible for search and review, so the service has become a daily tool for my work. 

More recently, session replay has been adopted at my company, but I do not like it so much because the UI elements are not in their place, so it is very hard to see what the users on the web app are actually clicking on.

How has it helped my organization?

The best way it has helped us is by consolidating all our logs into a single place and making it easier to find errors. Previously, using AWS CloudWatch was cumbersome and time-consuming. One issue I do have with logs is how long they are retained on the platform. Some issues happen sporadically, so it would be good to keep logs for longer than one month by default or to make retention configurable.

Another issue that I have is with the search syntax. It could be simpler, and it feels like there are too many ways to do the same things.

What is most valuable?

Logs search is the most valuable feature because it has consolidated all of our backend services logs into one place. Now we can see the relationship between them as requests are going from one service to other dependencies. 

What needs improvement?

One issue I do have with logs is how long they are retained on the platform. Some issues happen sporadically, so it would be good to keep logs for longer than one month by default or to make retention configurable. I have yet to try rehydrating logs, so this might be an option I need to try. Another issue I have is with the search syntax, which could be simpler. The syntax is a bit cumbersome, and there is no intuitive way to save searches for reuse in similar investigations later.

Finally, while my company replaced a different tool for session replay with Datadog's version, I find it clunky and in need of further improvements. For example, when troubleshooting a web portal issue, it is super important to know what the user clicked, but the elements are not where they should be in the replay.

It is also hard to find details about the sessions, and metadata such as user email, account, etc. that exist on other services with replay features.

For how long have I used the solution?

I have been using Datadog for approximately five years.

What do I think about the stability of the solution?

So far we haven't had any issues with uptime and Datadog has been available when needed.

What do I think about the scalability of the solution?

It seems to scale well as we continue to add services that need monitoring.

How are customer service and support?

I haven't had to contact support.

Which solution did I use previously and why did I switch?

CloudWatch was not a great tool for what we need to do to troubleshoot issues.

What about the implementation team?

We deployed it in-house with intermediate expertise.

What was our ROI?

I am not sure how much we are paying, but I use the app often enough to feel like we are getting a good ROI.

Which other solutions did I evaluate?

As a software engineer, I was not involved in the selection process.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Caleb Parks - PeerSpot reviewer
Works at a marketing services firm with 201-500 employees
User
Top 20
Oct 1, 2024
Lots of features with a rapid log search and an easy setup process
Pros and Cons
  • "The ease of graph building is nice, and MUCH easier than Prometheus."
  • "It is far too easy to run up huge unexpected costs."

What is our primary use case?

We use the solution for logs, infrastructure metrics, and APM. We have many different teams using it across both product and data engineering.

How has it helped my organization?

The solution has improved our observability by giving us rapid log search, a correlation between hosts/logs/APM, and tons of features in one website.

What is most valuable?

I enjoy the rapid log search. It's such a pleasure to quickly find what you're looking for. The ease of graph building is also nice, and MUCH easier than Prometheus.

What needs improvement?

It is far too easy to run up huge unexpected costs. The billing model is not flexible enough to handle cases where you temporarily have thousands of nodes. It is not price effective for monitoring big data jobs. We had to switch to open-source Grafana plus Prometheus for those.

It would be cool to have an OpenTelemetry agent that automatically applies APM instrumentation to everything in the next release.

For how long have I used the solution?

I've used the solution for three years.

What do I think about the stability of the solution?

I'd rate the stability ten out of ten.

What do I think about the scalability of the solution?

I'd rate the scalability ten out of ten.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

How was the initial setup?

The setup is very straightforward. Users just install the Helm chart, and boom, you're done.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

Be careful about pricing. Make sure you understand the billing model and that there are multiple billing models available. Set up alarms to alert you of cost overruns before they get too bad.
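One simple way to frame such an alarm is to project month-end spend from spend so far and compare it with a budget. The sketch below assumes a linear run rate and uses hypothetical figures; it does not call any Datadog API, and the function names are made up for illustration.

```python
def projected_month_spend(spend_to_date, day_of_month, days_in_month):
    """Linear projection of month-end spend from the current run rate."""
    return spend_to_date / day_of_month * days_in_month


def over_budget(spend_to_date, day_of_month, days_in_month, budget):
    """True when the projected month-end spend exceeds the budget."""
    projected = projected_month_spend(spend_to_date, day_of_month, days_in_month)
    return projected > budget


# Hypothetical figures: 500 spent by day 10 of a 30-day month,
# against a monthly budget of 1200.
print(projected_month_spend(500.0, 10, 30))  # 1500.0
print(over_budget(500.0, 10, 30, 1200.0))    # True
```

A linear projection will miss short bursts, such as temporarily running thousands of nodes, so a check on daily spend alongside it is worth considering.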

Which other solutions did I evaluate?

We've never evaluated other solutions.

What other advice do I have?

It's a great product. However, you have to pay for quality.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user