What is our primary use case?
Our primary use of Datadog includes:
- Keeping a close eye on our AWS resources. Monitoring our multiple RDS and ElastiCache instances plays a big role in our indicators.
- Kubernetes. We aren't using all of the available Kubernetes integrations, but the few that work out of the box add great value to our metrics.
- Monitoring and alerting. We wired our most relevant monitors and alerts to services like PagerDuty, and for the rest of them, we keep our engineers up to date with constant Slack updates.
How has it helped my organization?
Observability is something that a lot of companies are trying to achieve. Having a clear view, not only of our infrastructure but of our apps and services as well, has brought great added value to our customers.
For logging, we used to have Papertrail. It did the trick, but having a single point that manages and indexes all the logs is a BIG improvement. Also, having the option to generate metrics from logs is a game-changer that we're trying to include in our monitoring strategy.
I would like to say the same about APM, but the support for PHP seems somewhat lacking. It works, but I think this service could provide us with more information.
What is most valuable?
With respect to logs, we used to integrate various kinds of tools to achieve very basic tasks, and it always felt like a very fragile solution. I think logs are by far the most useful feature and, at the same time, the one that could be improved the most.
APM: This is hit or miss. Allow me to explain: we use various programming languages, mainly PHP and Ruby, and the generated traces don't always provide all of the information we want. For example, we get a great level of detail for the SQL queries that the app generates, but not so much for the PHP side. It's hard to track exactly where all of the bottlenecks are, so some analysis tools for APM would be a good addition.
What needs improvement?
Please add PHP profiling. You already have it for other popular programming languages such as Python and Java, which is great because we use a little of those, but our main app is powered by PHP and we don't have profiling for it yet. I guess it's only a matter of time before this is added, so in the meantime, you can consider this review a vote for PHP profiling support.
The pricing model could be simplified, as it feels a bit outdated, especially when you compare the billing model for compute instances with the one for container instances.
For how long have I used the solution?
We have been using Datadog for one year.
What do I think about the stability of the solution?
It's pretty stable for the main integrations. There was only one time when Datadog was down, and that was scary since all of our monitoring is handled by Datadog. There was a lot of uncertainty while the outage lasted.
What do I think about the scalability of the solution?
For everyday use, it's adequate, but for very specific tasks, not so much. There was a time when I had to do a big export and, as expected, the API was somewhat limited. Since it was a one-time task, it was not a big deal, but if it were a regular task, I wouldn't be happy about it.
How are customer service and technical support?
For small tasks, I think it's great. For specialized support, it feels like you're understaffed; having to wait days or weeks for a solution is a big NO-NO.
Which solution did I use previously and why did I switch?
I've used a few other products, such as New Relic and AppDynamics. The switch is usually driven by two factors: pricing and convenience.
How was the initial setup?
Getting APM metrics out of Kubernetes is always a painful task. We had support take a look at this, and it took several iterations to get it right, and then we had to do it all AGAIN the next year. This was a bad experience.
What about the implementation team?
It was all implemented in-house. The documentation is fairly up to date, for the most part.
What's my experience with pricing, setup cost, and licensing?
Pricing is fairly affordable compared to other solutions, but to really keep costs below those of other products, you need to plan your resource usage very carefully; otherwise, it can get expensive very quickly.
Which other solutions did I evaluate?
Unfortunately, it wasn't my call to bring Datadog into this company, but I'm sure glad that the Lead Architect made this decision. It brought many improvements in a short span of time.
What other advice do I have?
Please add PHP profiling soon!
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.