Lead Architect at a computer software company with 11-50 employees
Real User
Great search and filtering with useful troubleshooting capabilities
Pros and Cons
  • "We have found that we're able to get in and out of troubleshooting issues much more rapidly, which in turn, of course, enables us to spend more time on our products."
  • "I've found that the documentation is lacking in certain regards."

What is our primary use case?

We primarily use the solution for log management and application performance monitoring. We have begun using more Datadog products, such as runbooks, monitors, and dashboards.

Another area we've been investing some time in is database monitoring. We've been able to onboard some relatively new employees into the tool, and they've been able to create meaningful dashboards and reports with very little hand-holding.

We plan on exploring the synthetics solution as well.

How has it helped my organization?

We are still working through fully rolling the service out to our employees. Those that have so far begun using it have found that it decreases the time required to investigate and troubleshoot production issues. 

We have found that we're able to get in and out of troubleshooting issues much more rapidly, which in turn, of course, enables us to spend more time on our products. We are still investigating other areas where additional Datadog services could be integrated into our workflows.

What is most valuable?

Correlation between logs and APM has been the most important feature that we've found in Datadog to date. Our previous log collection and APM instrumentation tools were cumbersome to connect: we needed separate, unconnected solutions for each, which required complex queries and a lot of time from key employees.

The search and filtering capabilities are also helpful. The aggregation of all currently available properties has been great, and it's excellent that the list of available options narrows as filters are refined. This allows for a nuanced view of the available data.

We intend to explore other Datadog products, so this list may expand.

What needs improvement?

I've found that the documentation is lacking in certain regards. During sessions on certain services, presenters expressed opinions on best practices that are not covered by documented examples.

When we took these ideas to the "experts," both we and those working the table needed to do further research to arrive at a solution that met our needs. More documentation on best practices would make this easier to manage.

Buyer's Guide
Datadog
April 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,740 professionals have used our research since 2012.

For how long have I used the solution?

I've been using the solution for ten years. 

What do I think about the stability of the solution?

The solution overall seems rather stable.

What do I think about the scalability of the solution?

The solution seems scalable. We just need to keep an eye on the costs as it scales.

How are customer service and support?

Customer support has been okay, but not great. Some of our tickets have taken weeks to resolve.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We previously used Scalyr for logs and switched due to APM linkage.

How was the initial setup?

The initial setup was straightforward.

What about the implementation team?

We handled the setup in-house.

What was our ROI?

We've saved many developer hours by using Datadog. We plan on expanding our investment in this solution (and thus our return).

What's my experience with pricing, setup cost, and licensing?

Pricing can be a hard sell internally. We've found it to be worth it, though.

Which other solutions did I evaluate?

We previously used other solutions.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Lead Software Engineer at a retailer with 51-200 employees
Real User
Great APM and interesting log management but the UI is daunting
Pros and Cons
  • "The most useful feature is the APM."
  • "As a new customer, the Datadog user interface is a bit daunting."

What is our primary use case?

We are trying to get a handle on observability. Currently, the overall health of the stack is very anecdotal. Users are reporting issues, and Kubernetes pods are going down. We need to be more scientific and be able to catch problems early and fix them faster.

Because we are a new company, our user base is relatively small, yet growing very fast. We need to predict usage growth better and identify problematic implementations that could cause bottlenecks. Our relatively small size has allowed us to be somewhat complacent about performance monitoring. However, we now need that visibility.

How has it helped my organization?

We are still taking baby steps with Datadog, so it's hard to come up with quantifiable information. The most immediate benefit is aggregating performance metrics together with log information. A better understanding of observability will help my team focus on the business problems they are trying to solve and write code that is conducive to being monitored, instead of reinventing the wheel and relying on their own logic to produce metrics that are out of context.

What is most valuable?

The most useful feature is the APM. Being able to quickly view which requests are time-consuming and which calls have failed is invaluable. Being able to click in the UI and be pointed to the exact source of the problem is like magic.

I'm also very intrigued by log management, although I haven't yet had a chance to use it very effectively. In particular, the trace and span IDs don't quite seem to work for me. However, I'm very keen on getting this to work, as it will also help my developers be more diligent and considerate when creating log data.
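Log-to-trace correlation of this kind generally works by stamping each log record with the IDs of the active trace and span, so the backend can join log lines to APM traces. The sketch below illustrates only that general idea with Python's standard `logging` and `contextvars` modules. It is not Datadog's implementation; Datadog's tracing libraries handle this themselves. The `current_trace` variable, the `TraceContextFilter` and `JsonFormatter` classes, and the `trace_id`/`span_id` field names are hypothetical, chosen for illustration.

```python
import contextvars
import io
import json
import logging

# Hypothetical holder for the active (trace_id, span_id); in practice a
# tracing library would manage this context for you.
current_trace: contextvars.ContextVar = contextvars.ContextVar(
    "current_trace", default=(None, None)
)


class TraceContextFilter(logging.Filter):
    """Attach the active trace and span IDs to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id, record.span_id = current_trace.get()
        return True  # never drop records; only enrich them


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so a log pipeline can index the IDs."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "message": record.getMessage(),
                "level": record.levelname,
                "trace_id": record.trace_id,
                "span_id": record.span_id,
            }
        )


def build_logger(stream) -> logging.Logger:
    """Create a logger whose handler injects trace context and writes JSON."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = build_logger(buf)
    # Pretend a request span is active while this line is logged.
    token = current_trace.set(("1234567890", "42"))
    log.info("payment failed")
    current_trace.reset(token)
    print(buf.getvalue().strip())
```

If the IDs show up as `null` in the emitted JSON, no trace context was active when the line was logged. That is one common reason logs fail to link to traces.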

What needs improvement?

As a new customer, the Datadog user interface is a bit daunting. It gets easier once one has had a chance to get acquainted with it, yet at first, it is somewhat overwhelming. Maybe having a "lite" interface with basic features would make it easier to climb the learning curve.

The feature may already exist, but I'm not sure how to keep dashboard designs and synthetic tests in source control. For example, we may replace a UI feature and rebuild a test accordingly in a pre-production environment; once the code is promoted to production, the updated test would also need to be promoted.

For how long have I used the solution?

We have just started using the solution and have only used it for about two months.

What do I think about the stability of the solution?

We're new at this. That said, so far, there haven't been any issues to report.

What do I think about the scalability of the solution?

I have not had the opportunity to evaluate the scalability.

How are customer service and support?

Customer support is full of great folks! We're beginning our Datadog journey, so I haven't had that much experience. The little I have had has been great.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

This is all new. 

We used to work with New Relic. New Relic has an amazing APM solution. However, it also became cost-prohibitive.

How was the initial setup?

Since we are largely greenfield, setting up the product was relatively painless.

What about the implementation team?

Our in-house DevOps team did the implementation.

What was our ROI?

I don't know what the ROI is at this stage.

What's my experience with pricing, setup cost, and licensing?

I'm not sure what the exact pricing is. 

What other advice do I have?

So far, it's been great!

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Engineering Manager at Indeed.com
User
Transparent, easy to use, and integrates well with Slack
Pros and Cons
  • "Datadog's seamless integration with Slack and PagerDuty helped us to receive alerts right to the most common notification methods we use (our mobile devices and Slack)."
  • "I would like better navigability across pages."

What is our primary use case?

I primarily use the solution to learn about, watch, and monitor business and engineering metrics in my team's production and QA environments.

We create monitors on key business metrics and observe regressions and anomalies.

Less often, I use Datadog's events feature to get notified about significant activities in my teams' deployments.

We learn about Datadog monitor alerts through Slack and often attempt to create SLOs using Terraform.

We use APM for observability.

Most recently, I learned about Watchdog alerts, which I plan to look into closely.

How has it helped my organization?

Datadog made it easy to watch, and add monitors on, any metric emitted by any team in my organization.

Datadog APM immensely improved our ability to understand the reasons behind production issues. Being able to navigate seamlessly across services to see the time spent at each critical stage of a production request is helpful. Combined with Datadog's ability to show historical business metrics alongside, this helped us get more powerful insights much more quickly.

Datadog's seamless integration with Slack and PagerDuty helped us to receive alerts right to the most common notification methods we use (our mobile devices and Slack).

What is most valuable?

The most valuable aspects include:

  • The ability to monitor any team's metric in my company (transparency)
  • The ability to create/clone dashboards for myself (ease of use)
  • Its integration with Slack (it is very powerful)
  • The ability to add monitors on any metric emitted by any team at my organization
  • (Through Datadog APM) the ability to understand the reasons behind production issues. Navigating seamlessly across services to see the time spent at each critical stage of a production request is key. Combined with Datadog's ability to show historical business metrics alongside, this helped me get more powerful insights much more quickly.
  • (Through integrations like Slack and PagerDuty) the ability to receive alerts right to the most common notification method we use (our mobile devices and Slack), which saves a lot of time and helps us maintain focus. 

What needs improvement?

I would like better navigability across pages. The UI/UX is powerful but not very intuitive. I often click through various buttons and pages and then can't remember how to get back to a particular view that was more insightful.

Particularly as Datadog offers more platform capabilities, such as APM, Watchdog, shift-left initiatives (instrumentation, continuous testing, Intelligent Test Runner), and Synthetic and Real User Monitoring, the UI can become increasingly clunky, which makes for a very frustrating user experience.

For how long have I used the solution?

I've used the solution for five to six years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Sr Platform Engineer at a pharma/biotech company with 11-50 employees
Real User
Good logging with lots of great integrations and an interesting dashboard
Pros and Cons
  • "Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate."
  • "Some of the interface is still confusing to use."

What is our primary use case?

We mostly use it to collect log messages from our Kubernetes and EC2 instances, for example, system messages and errors. We also want log messages from our firewalls and other network infrastructure in case of network issues. We intend to use it for application logging, et cetera, to get insight into internal problems in the applications running in Kubernetes pods. We also want to use it for monitoring, so that it notifies us of system problems and hardware failures.

How has it helped my organization?

It's good to have a single location for all the logs. When logs come from many different sources, it is hard to find where a problem lies.

We used to spend a lot of time logging into various systems and perusing a billion different log files, looking for anything that stood out as a possible cause of the issue. That takes a lot of time and doesn't give much visibility into the possible interactions between systems.

What is most valuable?

Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate. 

It lets you build sophisticated, deep searches using regular expressions and graph the results in useful and interesting dashboards.

The plethora of built-in and downloadable integrations makes it much easier to set up for our platforms. Otherwise, we'd have to parse the log files ourselves, which would take a great deal of effort. We had to do that before, when we used an ELK stack for logging, and it was painful.

What needs improvement?

Some of the interface is still confusing to use. It has many features, and it takes a lot of effort to figure out what they all mean. Maybe having tooltips or something would be helpful. Also, some of the integrations are better than others.

For how long have I used the solution?

I've used the solution for a month.

What do I think about the stability of the solution?

The solution seems very stable.

Which solution did I use previously and why did I switch?

We used an ELK stack before. However, it took a lot of effort to maintain, and parsing the logs was difficult.

How was the initial setup?

We implemented the solution in-house.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Ian Schell - PeerSpot reviewer
Senior Site Reliability Architect at a tech vendor with 1,001-5,000 employees
Real User
Reduces debugging time, with good distributed tracing and useful RUM
Pros and Cons
  • "We have hundreds of microservices, and knowing how top-level requests weave throughout all of them is invaluable."
  • "There is occasional UI slowness and bugs."

What is our primary use case?

We use Datadog for general observability into our infrastructure, as well as running analytics queries for our SLI/SLO platform. This helps all of our teams be informed of how well their products are actually performing in production, and aim their efforts at the thing that will provide the highest ROI. 

We also use it for general monitoring and alerting during load tests and service releases to detect any issues related to the deployments. This helps us maintain our high contractual uptime promises to our clients.

How has it helped my organization?

It has drastically reduced the amount of time we spend on debugging issues and tracking down the root causes of incidents. What might have taken days or hours with separate vendors in the past (or even single vendors with terrible UI) is now quick and easy. 

We've often gone from detecting an incident to identifying the needed fix within ten minutes or less, covering multiple domains such as APM, logs, and database performance monitoring in just a few clicks. This is extremely powerful.

What is most valuable?

Distributed tracing is the most valuable feature. We have hundreds of microservices, and knowing how top-level requests weave throughout all of them is invaluable. 

At one glance, we can clearly see which service is slow and then switch over to the infrastructure view or container view to debug why the slowness is happening. This is true of all their other integrated products as well; the more you add, the more insights you get when looking at traces.

We also use RUM extensively. This helps us cover the last mile of application performance. Without it, we wouldn't know if our browser applications were functioning slowly for our users.

What needs improvement?

There is occasional UI slowness and bugs. While the Datadog UI is generally miles above its competitors, there are a few cases where it falls short or has started to slow down over time. They also occasionally make poor UI redesign choices. They should continue focusing on this area to maintain the high standard they started out with.

For how long have I used the solution?

I've used the solution for five years.

What do I think about the stability of the solution?

We've never had major stability issues.

What do I think about the scalability of the solution?

Scalability has never been an issue, although there is occasionally UI slowness.

How are customer service and support?

Support via tickets is absolutely terrible. It's the one obvious bad spot for Datadog. If we didn't have direct relationships with many of their product managers, our experience would be much worse.

How would you rate customer service and support?

Negative

Which solution did I use previously and why did I switch?

We previously used New Relic. It had a terrible UI and the integration between products was not great. Datadog is miles ahead of them and is continuing to increase that distance.

How was the initial setup?

The initial setup is straightforward, and the docs are done well.

What about the implementation team?

We managed the implementation in-house.

What was our ROI?

Our ROI is high.

What's my experience with pricing, setup cost, and licensing?

I'd advise users to negotiate rates. Datadog's off-the-shelf rates are pretty high.

Which other solutions did I evaluate?

We have only used and looked into New Relic.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Software Engineer at Enable Medicine
User
Top 20
Centralizes logs and provides high-level views but is quite expensive
Pros and Cons
  • "Datadog has made it much easier to have a central place for people to look for logs and made it much easier to notify them of any elevated error rates or failures."
  • "The product is quite complex, and there are so many features that I either didn't know about or wasn't sure how to use."

What is our primary use case?

We mostly use it to handle log aggregation, monitor our web application, and alert us on data pipeline failures. 

Our system is fully on AWS, so we pipe all of our CloudWatch logs into Datadog to have a central place to index and search them.

Our web app is built on an Elastic Beanstalk backend, and we use the Datadog agent to keep track of all of the requests that hit our backend and all of their components. 

We also use the prebuilt AWS pipeline dashboards to monitor our batch jobs and lambdas.

How has it helped my organization?

Datadog has made it much easier to have a central place for people to look for logs and made it much easier to notify them of any elevated error rates or failures. 

It is also easier to get high-level views of platform health, whereas looking directly at AWS tends to provide very specific insight into particular surface areas or products. 

By onboarding the whole team onto Datadog, we also have a single source of truth that everyone can use when triaging and resolving incidents that occur across any surface area.

What is most valuable?

The ease of setting up metrics and alerting and integrating with Slack has significantly reduced the friction of keeping the team up to date on the platform's health. Before, creating custom CloudWatch metrics was never very intuitive, and it was non-trivial to set up integrations with other services we use, especially Slack.

It also provides a good way to gain the context needed when trying to fix issues, since it's a central place to look through logs, requests, AWS metrics, and more, which overall contributes to the health of our platform.

What needs improvement?

The product is quite complex, and there are so many features that I either didn't know about or wasn't sure how to use.

One thing that could be improved is somehow surfacing interesting or relevant products that might be applicable given our infrastructure. 

Additionally, the billing can sometimes be confusing and opaque, especially because it does not make clear what the cost implications of adding different AWS integrations can be. This has caused some unexpected costs in the past because engineers did not understand how Datadog pricing works.

For how long have I used the solution?

We've used the solution for around two years.

Which solution did I use previously and why did I switch?

This was the first solution we tried.

What's my experience with pricing, setup cost, and licensing?

It is quite expensive, especially if you don't know how the pricing works.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
IT Test Manager at a transportation company with 10,001+ employees
Real User
Very good documentation provided along with regular new features
Pros and Cons
  • "Datadog is constantly adding new features."
  • "Lacks some flexibility in the customization."

What is our primary use case?

Our primary use case is log management, and we also use the solution to monitor the application and underlying infrastructure. I'm an IT test manager.

What is most valuable?

I appreciate that they are constantly adding new features, some of which we haven't yet had a chance to implement. 

What needs improvement?

I'd like to see more flexibility in customization. There are a few settings that need to be changed, but we are unable to make those changes either as users or as the administrator. The tagging needed to interconnect the different parts of the monitoring is a bit tricky and takes time to work out.

For how long have I used the solution?

I've been using this solution for 18 months. 

What do I think about the stability of the solution?

The stability is good. 

What do I think about the scalability of the solution?

I would say that the amount that we are monitoring is not that large and we've never had any scalability issues. We have around 50 users in our department. 

How are customer service and support?

The availability or accessibility to customer service is not always good, although they generally provide solutions once you do manage to get hold of them. 

Which solution did I use previously and why did I switch?

We previously used different tools for different parts of the monitoring. We changed to AWS when we moved to the cloud. We also found that maintaining Grafana and Prometheus and keeping them up to date was taking too much time.

How was the initial setup?

The initial setup was straightforward. We used a service provider, which also maintains our operations in general.

What's my experience with pricing, setup cost, and licensing?

We have a four-year contract with Datadog, and the solution is pay-as-you-use. 

What other advice do I have?

I would suggest using the documentation, which is quite good. It's best to start with existing integrations, and then do the customization step-by-step.

I rate this solution eight out of 10. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1479957 - PeerSpot reviewer
Senior Director of DevOps at Housecall Pro
Real User
Good graphing and dashboards, and it improves visibility for developers
Pros and Cons
  • "Having a wealth of information has helped us investigate outages, and having historical data helps us tune our system."
  • "Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion."

What is our primary use case?

We primarily use Datadog to monitor EC2 instances and ECS containers running mostly Rails applications that host a SaaS product. We also monitor Elasticsearch and RDS, and we are working on adding their Application Performance Monitoring solution to monitor our applications directly.

We use Datadog to create dashboards, graphs, and alerts based on interesting metrics. Datadog is the first place we look to gauge the performance of our system.

We also use their logging platform, and it works well. It is especially useful that the logs and metrics are tightly integrated, so you can jump between them easily.

How has it helped my organization?

Developers are able to see how code is running in production, whereas before we implemented Datadog this was mostly opaque. We are able to emit custom metrics that are specific to our business, and the built-in metrics have also proven useful. Having a wealth of information has helped us investigate outages, and having historical data helps us tune our system.

DevOps engineers are able to put sensors around our system to proactively detect problems, whereas before, our engineers heard about problems from customers. Logs are easier to find for developers.

What is most valuable?

Metric graphing and Dashboards are the most valuable features because they give us good observability into our system and work well to alert us when interesting things happen. We use this functionality daily.

We value the monitoring capability because it pushes alerts to us, so we don't have to watch graphs continually. The integrations with Slack and PagerDuty let us be interrupted appropriately and keep a running tab on the system without bothering us unnecessarily.

The online process monitoring has been extremely helpful, as it gives engineers the ability to see the live status of all the processes running our systems without them having to log in.

What needs improvement?

Their logging solution is expensive for our use case. They do have the capability to rehydrate old or incomplete logs, and it works, but I would rather not have to think about that operation.

Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion. On the positive side, there is plenty of documentation; it just needs better curation.

Their APM solution still needs some work, but they are actively developing it. I would also like to see more database-specific application monitoring.

For how long have I used the solution?

I have been using Datadog for five years across two companies.

What do I think about the stability of the solution?

Any issues are addressed and communicated very quickly. I have not had any issues with uptime.

What do I think about the scalability of the solution?

If you do not need 100% of your data, such as logs and APM traces, it scales well. It does not scale as well if you want 100% of your logs indexed. You should understand the usage-based billing for any part of their service before using it, as it is very easy to run up a large bill.

The performance of the system scales very well, and host monitoring and APM are relatively cheap.

How are customer service and technical support?

Account support is excellent.

Customer support is good if you get them to go beyond pointing out the right documentation.

Which solution did I use previously and why did I switch?

Previously, I used homebuilt solutions with Nagios and Cacti but found that there was far too much work to understand them and keep them up and fed compared to the value that I got. They also did not integrate well with existing data sources without a lot of effort.

I also previously used StackDriver and found it too opinionated. I like that Datadog gives you tools to work with certain types of data and make your own graphs, monitors, etc., whereas with StackDriver, I felt there were a limited number of ways to accomplish goals.

How was the initial setup?

The basic setup is easy. A more advanced setup can be tricky because the documentation assumes you know how the system works already. Support is somewhat helpful, but mostly points out the documentation you should already have found.

What about the implementation team?

We implemented in-house.

What's my experience with pricing, setup cost, and licensing?

My advice is to understand what number of hosts and data you want to commit to. Beware that usage-based billing is both a blessing and a curse. It is easy to run up a large bill, so become familiar with the cost of each piece of your bill and use the metrics they supply to estimate and monitor your bill.

I have had good luck with their support team helping us to figure out the correct commit levels. Their account support is excellent in this regard. I have heard their sales team can be aggressive, but I have not experienced it personally.

Which other solutions did I evaluate?

I originally chose Datadog because of my previous experience. We recently considered moving over to New Relic because we liked their APM solution better. However, the pricing of New Relic and our familiarity with Datadog won over. New Relic is a good product but it didn't fit our overall needs as well as Datadog.

Which deployment model are you using for this solution?

Public Cloud


Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user