The solution is used for full-stack enterprise performance monitoring of our primarily cloud-based stack on AWS. We have implemented monitoring coverage using RUM for critical apps and websites, and we use APM (integrated with RUM) for full-stack traceability.
We use Datadog as our primary log repository for all apps and platforms, and the advanced log analytics enable accurate log-based monitoring/alerting and investigations.
Additionally, we use some advanced RUM capabilities and metrics to track and optimize client-side user experience. We track SLOs for our critical apps and platforms using Datadog.
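To illustrate the kind of arithmetic behind SLO tracking, here is a minimal sketch of an error-budget calculation. It is not Datadog's implementation or API. The function name, the 99.9% target, and the request counts are all hypothetical and chosen only for illustration.

```python
def error_budget_remaining(target: float, good: int, total: int) -> float:
    """Return the fraction of the error budget still unspent.

    target: SLO target as a fraction (e.g. 0.999 for 99.9%).
    good:   number of events that met the objective.
    total:  number of events observed in the SLO window.
    A result of 1.0 means no budget used; 0.0 or below means it is exhausted.
    """
    if total == 0:
        return 1.0  # nothing observed yet, so nothing spent
    allowed_bad = (1.0 - target) * total  # failures the SLO tolerates
    actual_bad = total - good             # failures actually seen
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad


# Hypothetical numbers: 1,000,000 requests, 500 failures, 99.9% target.
remaining = error_budget_remaining(0.999, 999_500, 1_000_000)
print(f"Error budget remaining: {remaining:.1%}")
```

In this example the SLO tolerates about 1,000 failed requests, so 500 failures leave roughly half of the budget.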
We now have full-stack observability, which allows us to better understand application behavior, quickly alert users about issues, and proactively manage application performance.
We've seen value by implementing observability coordinated across multiple applications, allowing us to track things like customer shopping and orders across multiple applications and services.
For critical application launches, we've built dashboards that track user activity in real time in a war-room setting and confirm that users can successfully use new features.
Datadog is our go-to tool for analyzing, understanding, and investigating application performance and behavior.
APM accurately tracks our service performance across our ecosystem. RUM gives us visibility into client-side performance and user experience, and the pace of new features released in the Digital Experience area has recently been high. Log analytics give us a powerful mechanism for error tracking, research, and analysis.
Custom metrics that we've created allow us to track KPIs in real time on dashboards. All of these have proven valuable in our organization. Additionally, Datadog product support teams are responsive and have provided timely support when needed.
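As a rough sketch of how a KPI can be submitted as a custom metric, the example below builds a gauge datagram in the DogStatsD text format and sends it over UDP to a local Agent on the default port 8125. This is an illustration, not our actual instrumentation: the metric name, tags, and helper functions are hypothetical, and it assumes an Agent with DogStatsD enabled is listening on the host.

```python
import socket
from typing import Optional, Sequence


def format_gauge(name: str, value: float, tags: Optional[Sequence[str]] = None) -> str:
    """Build a DogStatsD gauge datagram: <name>:<value>|g|#tag1,tag2."""
    datagram = f"{name}:{value}|g"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram over UDP; assumes a local Agent accepts DogStatsD traffic."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical KPI: completed orders, tagged by environment and application.
payload = format_gauge("shop.orders.completed", 42, ["env:prod", "app:checkout"])
print(payload)
```

Calling `send_metric(payload)` would emit the gauge. UDP is fire-and-forget, so the call does not confirm that the Agent received the datagram.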
Agent remote configuration should be provided or improved and streamlined, so that config changes and upgrades can be performed via the portal instead of at the host.
Cost tracking via the admin portal is a bit lacking, even though it has gotten better. I'm looking for usage trends (the ones that drive cost) over time and better visibility into, or notifications about, on-demand charges.
Network device and performance monitoring could be improved, as we've faced some limitations in this area.
The Datadog usage-based cost model, while giving us better transparency, is difficult to follow at times and is constantly evolving.
I've used the solution for three years.
Support has been responsive and helpful.
Pricing is straightforward. That said, it's sometimes difficult to estimate usage volumes.
We evaluated Datadog and New Relic in detail and chose Datadog for its straightforward and competitive pricing model, its full coverage of the monitoring features we wanted, and its easy-to-use UI.