The primary use case is application monitoring. We also use it to set custom metrics and watch our AWS metrics, as well as data.
At my current job, I have only used it for a couple of months. However, I used it for a few years at a previous company.
It lets us react more quickly to things going wrong. Before, it might have been 30 minutes to an hour before we noticed something going on. Now we know within a minute or two if something is off, which lets us get things back up and running faster for our customers, and that means revenue.
Its most valuable feature is the monitoring, such as all the custom metrics that Datadog imports from AWS. In addition, there is the specific monitoring where you can set up an alert that goes to a bunch of different services.
Some of their newer solutions are interesting, like their logging, but they are not fleshed out. They could use more metrics or synthetics, which would be really helpful.
I would love to see support for front-end and mobile applications. Right now, it is mostly all back-end stuff. Being able to do some integration with our front-end products would be awesome.
It is very stable. Both times that I have worked with Datadog, we haven't had any issues with them going down. Or, if they did, we didn't know, which is good.
At the previous company that I worked at, we threw a lot at them all at once.
Because this is a newer integration, we are putting less stress on the tool. We are still working on integrating it into our platform.
It has scaled great. I haven't run into any problems anywhere that I've used it. They have handled everything that we have needed them to.
We are a 100-person company with 20 engineers.
The technical support is great. They respond quickly. They know what they are talking about and dig right in. If they don't know the answer, they can get it to us very quickly.
The integration and configuration through AWS was pretty smooth. It was easy to set up and start using. The documentation was clear. So, it worked really well.
We did the integration and configuration through AWS ourselves.
We haven't seen ROI at my current company. The solution is too new.
At my last company, we did see ROI, specifically around response time. We could immediately get onto mission-critical things that were down and losing revenue. So, the product paid for itself.
The pricing and licensing through AWS Marketplace has been good. It would be nice if it was cheaper, but their pricing is reasonable for what it is. Sometimes, for their newer features, they charge as if the feature is fully fleshed out, even though it is newer and may have less in it than their other items. So, if they would scale the pricing appropriately as they add more to it, that would make sense. The pricing should reflect the abilities of the features.
We looked into self-hosting something, like Prometheus. We also evaluated New Relic.
We chose Datadog for its ease of use in getting set up and what they offered us.
Take the time to explore it and see all the metrics which are available. The metrics make the reporting better. Spend the time and learn the metrics. The things that they can send and give you are good. Learn how to aggregate them and how to write more complex queries, which they do a good job of showing how to do, but I found that newer people don't do this. They just try to use the baseline set of features. Doing the more complex stuff adds significant value.
We have PagerDuty integrated with it, as well as all of AWS. Those are the big ones we have running through it. It integrates well. It essentially replaces CloudWatch, so we can just use Datadog, which is nice. The biggest thing that they provide is putting everything in one spot.
I have just used the AWS version.
We mainly use it to send metrics about CPU and memory usage, in addition to the number of file descriptors on a socket.
We work as an SMS aggregator. Therefore, we send a lot of SMS messages to customers. This product holds one of the most important dashboards for our traffic from each server or cluster on our Gateway. It gives us very good information, mainly for the operations team and the sales team, about what each account is sending, how often, etc.
Using the data, our operations team works with the dashboards to get their statistics, analytics, etc.
The ability to create dashboards and metrics with graphs. This information is useful to us.
I would like testing for data in the future. That would be really nice.
Also, I would like some additional enhancement in the visuals.
The stability is good. For every message that we send, we have a corresponding metric. We send one to two million messages per server a day.
The scalability is good.
We have over 1,000 accounts on three servers. There are other servers, which work as helper servers. However, the servers which carry the traffic and do the heavy lifting are the three main servers, currently. They are hosted on Amazon as large instances.
I have not used Datadog's technical support.
The AWS integration and configuration is pretty good. It has multiple languages and platforms.
Give it a try. It is a good tool for creating statistics and analytics with data.
Anyone who uses a large amount of data and wants insights on the analytics of their data. They can just dump it into the tool, and it will do all the heavy lifting.
We use it to monitor our infrastructure, particularly our different EC2 instances, and our containers. We also use it to capture our logs.
We have a better grasp of what is occurring during the deployment cycle. If something fails, we have an idea what has failed, where it has failed, and how it failed to better mitigate the situation.
It is a good one-stop location where we keep all the data for our infrastructure, and it is also easier to navigate between different things.
We want to reduce having to go to different screens to obtain all the information. However, they are moving in the right direction from what we have noticed.
Stability has never been an issue. We throw all of our servers and containers at it. We have now started to throw our on-premise logs at it too.
At the beginning, when we started throwing logs at it, there was a bit of a hiccup. However, this was during their beta period, so hiccups were expected.
It pretty much vacuums up any information that we throw at it. So, stability hasn't been an issue.
It scales depending on the time of year. Right now, we have about 25 to 50 instances, and in each instance there are probably five different containers, not including logging for all those containers.
We used their technical support, especially during rollout. They were really good. We worked hand in hand to try to figure out how to configure everything.
For the monitoring of different EC2 instances, you install them into Datadog.
We use Chef to install Datadog's package, then that calls out all the information from the instance.
We did evaluate other vendors.
We chose Datadog because we were looking for an all-in-one package. They also do log caching and integrate with other systems well.
Take advantage of Datadog's trial period, and really beat it up, then give them a call.
We use the web service for this product.
We use it for custom metrics of our applications and monitoring of our systems.
My current company didn't have very good monitoring in the past. We had been using basic CPU monitoring. With Datadog, we have been able to set very specific CPU and memory alerts at the base level. Then we started to pull real business value out of it, like 99th percentile response rates for our API calls.
It has turned into an operational dashboard. If you feel something is going wrong, you can immediately open up Datadog. It has been our go-to application because we know the answer will be there.
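To illustrate why a 99th percentile is more telling than an average, here is a minimal sketch in plain Python. It is not Datadog's implementation; the nearest-rank method, the `percentile` helper, and the latency values are all assumptions made up for the example.

```python
def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

# Hypothetical API response times in milliseconds: mostly fast, two slow outliers.
latencies_ms = [20] * 98 + [900, 1500]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} p50={p50_ms} p99={p99_ms}")
```

The mean and median both look healthy here, while the 99th percentile exposes the slow requests that some customers actually experience.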
Their interface is probably one of the easiest things to use because it lets non-developers and non-engineers quickly get access to metrics and pull business value out of them. We could put together dashboards and give it to people who are non-technical, then they can see the state of the world.
They have a very good ecosystem for their integrations. They have a lot of different integrations, and we use a lot of them. We have integrations with Amazon for ECS, RDS, and all of the subsystems of Amazon. We also have Docker and Splunk integrations. The integrations are great because they're definitely vetted and not third-party integrations. They're part of the Datadog ecosystem and seamless.
The way data is represented can be limiting. They have added their own little query language that you can use to manipulate things, so you can graph and relate two different metrics together. This is relatively new this year. When I first tried it out a long time ago, you could graph a metric and another metric, and they'd overlay, but you couldn't take the ratio between the two. However, it looks like this is the direction that they're going, and that's a good direction. I think they should continue adding things that way.
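As a rough illustration of what taking the ratio between two metrics means, the sketch below divides one time series by another, point by point. It is plain Python rather than Datadog's query language, and the `ratio_series` helper, the timestamps, and the counts are hypothetical.

```python
def ratio_series(numerator, denominator):
    """Divide two {timestamp: value} series point by point.

    Timestamps missing from either series, or with a zero denominator, are skipped.
    """
    result = {}
    for ts, denom in denominator.items():
        if ts in numerator and denom != 0:
            result[ts] = numerator[ts] / denom
    return result

# Hypothetical per-minute counts keyed by timestamp in seconds.
errors = {0: 2, 60: 5, 120: 0}
requests = {0: 200, 60: 250, 120: 0, 180: 300}

error_rate = ratio_series(errors, requests)
print(error_rate)
```

Overlaying the two raw series shows both lines, but only the ratio shows whether errors grew faster than traffic.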
I like being able to put the formulas in myself. I don't want the average. I want a rolling average over three minutes, not five minutes. They're getting better at letting the user customize this.
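The difference between a three-minute and a five-minute rolling average can be sketched in a few lines of Python. This is only an illustration of the idea, not how Datadog computes it; the `rolling_average` helper and the sample data are assumptions.

```python
from collections import deque

def rolling_average(samples, window_seconds):
    """Average of the samples whose timestamp falls within the trailing window.

    samples: list of (timestamp_seconds, value) pairs sorted by timestamp.
    """
    window = deque()
    total = 0.0
    averaged = []
    for ts, value in samples:
        window.append((ts, value))
        total += value
        # Drop samples that have aged out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        averaged.append((ts, total / len(window)))
    return averaged

# Hypothetical metric sampled once a minute, with a spike at minute 5.
values = [10, 10, 10, 10, 10, 100, 10, 10, 10, 10]
samples = [(minute * 60, v) for minute, v in enumerate(values)]

three_min = rolling_average(samples, 180)
five_min = rolling_average(samples, 300)
print(three_min[5], five_min[5])
```

The shorter window reacts more sharply to the spike, which is why being able to choose the window yourself matters.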
When I started using it years ago, it had stability problems. I remember, specifically, we ran everything in Docker containers. There were some problems getting it into a Docker container with very specific memory limits. We couldn't nail down exactly what the limits and the application needed. Once we did that, we were good. However, it was tricky to get the limit in the first place.
It has always scaled for us. Cost scales up too, but that is not necessarily a bad thing. It's reasonable for what they're providing. I haven't had any concerns about scaling.
We use between 100 and 500 servers at any given point in time.
For the most part, the technical support is pretty good. Every now and again, you will get stuck with a support rep who could have better training, but in general, they are very good and responsive. They're willing to talk about new features, etc.
The integration and configuration processes have been very smooth because everything is very well-documented. The documentation is phenomenal.
We can see trends a lot more easily than if we didn't have the solution. Management can see the changes which are being made, whether in performance or in the number of hosts that went down. We recently made improvements to some of our internal APIs, so we reduced the number of servers that we needed. You could see that the load on the system went down and the number of servers went down. Thus, it was easy to visualize.
Pricing and licensing are reasonable for what they give you. You get the first five hosts free, which is fun to play around with. Then it's about four dollars a month per host, which is very affordable for what you get out of it. We have a lot of hosts that we put a lot of custom metrics into, and every host gives you an allowance for the number of custom metrics. We have not had a problem with it.
My company now is pretty good at looking at alternatives. Also, I evaluated alternative solutions at my last company.
There are some other competitors. For example, I know one of them started doing metrics and their licensing is very cheap because the metric size is very small and it's per megabyte. They charge you per storage, and it's very small. However, the interface and integrations aren't there.
The other thing is granularity. Datadog gives you one second granularity for a year. Whereas, some of the competitors would roll up, so after about a week you don't have one second, you have five seconds. Then, after a month, you don't have five seconds, you have a minute. So, you start to lose the granularity, whether it be that it averages it or maxes it, you start to lose the ability to see incidents historically, which is super valuable. If we have an incident, which we think we've seen this before, and want to look back historically, we can zoom right in and see in the database where it peaked.
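The effect of rolling up fine-grained data can be shown with a small Python sketch. It does not reproduce any vendor's retention scheme; the `rollup` helper, the bucket sizes, and the sample values are assumptions chosen for the example.

```python
def rollup(values, bucket_size, aggregate):
    """Collapse consecutive groups of bucket_size points into one point each."""
    return [
        aggregate(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]

def mean(bucket):
    return sum(bucket) / len(bucket)

# Hypothetical one-second samples for one minute, with a single one-second spike.
per_second = [20] * 60
per_second[42] = 95

one_minute_avg = rollup(per_second, 60, mean)
one_minute_max = rollup(per_second, 60, max)
five_second_max = rollup(per_second, 5, max)

print(one_minute_avg, one_minute_max, len(five_second_max))
```

Averaging hides the spike almost completely, and taking the maximum keeps its height but loses the exact second it happened, so either way the historical detail is reduced.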
Give Datadog a try. It's the leader in this space.
I have only used the AWS version of the product.
They have a thing for the color purple, but it is all good.
We use it to store editorial content.
We started out on the on-premise version, then moved to the AWS version.
I don't have to worry about upgrades with the AWS version.
The on-premise version is very difficult to upgrade.
We run the agent in AWS.
It has empowered all our platform engineers with a very powerful and easy to use monitoring system. Most of our platform organization is now involved in monitoring. Previously, only a handful of platform engineers were involved, because Graphite and Sensu were so cumbersome to use.
It is incredibly easy to do common monitoring actions:
Very rarely. Maybe only once or twice that we noticed. It is very reliable.
No.
It is excellent. The web app has a real-time support chat window in which a support engineer is chatting with you within a minute. That is the "right" way to do support.
We previously ran Graphite and Sensu ourselves. By moving to Datadog, we did not need to manage our own monitoring infrastructure anymore. Graphite was somewhat complex to run.
Initial setup is easy. Install the agent and send it metrics. There are StatsD/Datadog libraries available for most languages.
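For a sense of what sending a metric to the agent involves, here is a minimal sketch that builds a DogStatsD-style datagram by hand using only the Python standard library. In practice you would more likely use one of the client libraries mentioned above. The metric names, tags, and the agent address (UDP port 8125 on localhost) are assumptions for the example.

```python
import socket

def format_dogstatsd(name, value, metric_type="g", tags=None):
    """Build a datagram of the form <name>:<value>|<type>|#<tag1>,<tag2>."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload

def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a locally running agent (assumed address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))

# Hypothetical gauge and counter; "g" is a gauge and "c" is a counter.
gauge = format_dogstatsd("app.memory.used_mb", 512, "g", ["env:staging", "service:api"])
counter = format_dogstatsd("app.requests", 1, "c")

print(gauge)
print(counter)
# With an agent listening locally, send_metric(gauge) would submit the gauge.
```

Because the transport is UDP, the application does not block or fail if the agent is unavailable, which is part of why instrumenting code this way is low risk.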
Pricing seems reasonable. It depends on the size of your organization, the size of your infrastructure, and what portion of your overall business costs go toward infrastructure. It is hard to say without looking at all of this.
We looked at several competitors at the time (Summer 2016). There did not seem to be any compelling alternatives. Once we did the PoC with Datadog, we loved it and decided to move forward.
Try it out and see if you like it.
We can build dashboards as fast as we roll out new systems, which can be fast.
We use standard and custom metrics for every new system we roll out for 360 degree visibility into our systems.
The most valuable features have been shareable dashboards, TimeBoards, the dogstatsd API, Slack integration, the event logging API, CloudTrail events, tags, alerts, anomaly detection, and EBS Volume Snapshot Age, which they added upon request. We used the PagerDuty integration for a while as well.
More granular control over dashboard sharing. Timeboard sharing.
There are infrequent hiccups, which have been decreasing over the time we have used it.
No.
Customer Service:
Never seen better. Questions are usually answered almost immediately, even on weekends, and in-stream with your event stream.
Technical Support:
High.
Overall they have always had an amazing team, and quality has been maintained as the company has grown.
Complementary to other tools we used.
Setup is generally easy. They provide a large number of integrations. Some are more complex than others, which is to be expected.
In house implementation.
We didn’t calculate explicitly, but as we used the product to track down underutilized instances, it more than paid for itself in the first month.
Pricing overall in this segment has standardized in the last several years.
A few, including Zabbix and Icinga.
One of the fastest and most flexible tools we have used in this area.
