Software Engineering Manager at a healthcare company with 501-1,000 employees
Real User
Top 20
Great CI visibility, logging, and monitoring
Pros and Cons
  • "Datadog helps us detect issues early on and helps in troubleshooting."
  • "We would really like to see more from the Service Catalog."

What is our primary use case?

We mainly use the product to monitor our infrastructure and apps. It is the go-to tool when we want to check that things are running properly. We use Datadog synthetic monitors to ensure our app works across different locations in the United States. 

We also have set up Datadog monitors to send alerts if things stop working as expected. 

We use Continuous Integration Pipeline visibility to make sure our developers are not being blocked by infrastructure and other things that might be out of their control.

How has it helped my organization?

Datadog helps us detect issues early on and helps in troubleshooting. Creating Service Level Objectives and defining monitors is helping us to stay on top of potential issues that might affect our users. 

We take advantage of Application Performance Monitoring to ensure our applications are working as expected, and our users can get the healthcare they need at a price they can afford. 

Synthetic monitoring also helps us in testing our application in different browsers.

What is most valuable?

The most valuable aspects of the solution include: 

CI visibility, which helps us in making sure our CI systems are running efficiently and are not blocking our developers from releasing new software and fixing bugs.

Logs, which help us debug issues: we can search the logs and narrow them down to the ones relevant to the issue we are looking at.

APM, which can help us to stay on top of our applications by giving us the confidence that our apps are running.

Monitoring. We use monitoring a lot to ensure we know about potential issues and fix them before they affect our customers.

What needs improvement?

Overall, we really like the quality and relevance of all of the Datadog products that are currently being used. 

The documentation is very well organized and is the go-to place for us to find answers to our questions. 

We would really like to see more from the Service Catalog. It is something that we are interested in. However, some might think it lacks some key features at this time. We will definitely keep our eye out for this and adopt it when all the features are implemented. 

We're really looking forward to all the great things DD will do.

Buyer's Guide
Datadog
April 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
767,995 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for three years.

What do I think about the stability of the solution?

The stability is great.

What do I think about the scalability of the solution?

The scalability is great.

How are customer service and support?

Technical support is great.

What about the implementation team?

We handled the initial setup in-house.

What's my experience with pricing, setup cost, and licensing?

I don't have any insights into pricing.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Software Engineer at a transportation company with 51-200 employees
Real User
Good dashboard, excellent monitoring, and easy to expand
Pros and Cons
  • "Datadog has helped us a ton by allowing us to set up a multitude of easily configurable alarms across our tech stack and infrastructure."
  • "I found the documentation can sometimes be confusing."

What is our primary use case?

We primarily use Datadog for alerts. If we're running out of database connections or CPU credits, we want to find out in Slack. Datadog provides nice features for that.

Secondarily, we use Datadog for analyzing historical trends and forecasting potential issues.

I'm trying to learn how to add Continuous Profiler to our primary backend servers and set up Synthetic Tests for monitoring our front end.

Everything is mostly on AWS, and the Datadog integrations help a ton.

How has it helped my organization?

Datadog has helped us a ton by allowing us to set up a multitude of easily configurable alarms across our tech stack and infrastructure. It doesn't matter if it's in AWS Lambda or a Docker container in AWS EC2, Datadog's intuitive interface makes alarms incredibly easy to configure, reducing our resolution time for incidents.

A lot of the value comes from how frictionless the integrations are. Adding in a Datadog agent or flipping a switch on the Datadog UI to start streaming Lambda data makes the product so incredibly appealing for my company.

What is most valuable?

The monitoring feature has been the most valuable.

I really like the dashboard. Monitoring has a straightforward tie-in to business value at my company (e.g., declaring incidents). Things like having a dashboard and APM make my job easier. That said, DevX is a little bit of a harder sell to executives in my company.

The dashboard feature makes it so easy to inspect multiple metrics at once across services. It's truly been a lifesaver when I'm personally trying to understand why performance degradation is happening.

What needs improvement?

I found the documentation can sometimes be confusing. I tried configuring APM for some of our Python containers, and I had to cross-reference multiple blog posts and the official documentation to figure out which Datadog Agent to use, whether I needed a ddtrace trace, which environment variables I should set, and so on.

Furthermore, when generating my own traces, I wasn't aware that ddtrace applies its own "monkey patching," which led to headaches when configuring the service for RabbitMQ.

A more unified and up-to-date documentation suite would be greatly appreciated.
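
For readers unfamiliar with the term, here is a minimal sketch of what "monkey patching" means in general: a tracing library swaps a function on an existing class for a wrapper at runtime, so calls are recorded without the caller changing any code. This is not ddtrace's actual implementation; `QueueClient`, `patch_publish`, and `recorded_spans` are hypothetical names invented for illustration.

```python
import functools


# Hypothetical stand-in for a third-party messaging client; not a real library.
class QueueClient:
    def publish(self, queue, body):
        return f"sent {body!r} to {queue}"


recorded_spans = []  # Hypothetical in-memory sink standing in for a trace backend.


def patch_publish():
    """Replace QueueClient.publish with a wrapper that records a span."""
    original = QueueClient.publish
    if getattr(original, "_is_patched", False):
        return  # Already patched; avoid wrapping twice.

    @functools.wraps(original)
    def traced_publish(self, queue, body):
        # Record what happened, then defer to the untouched original method.
        recorded_spans.append({"operation": "publish", "queue": queue})
        return original(self, queue, body)

    traced_publish._is_patched = True
    QueueClient.publish = traced_publish


patch_publish()
client = QueueClient()
result = client.publish("orders", "hello")
```

Because the replacement happens on the class itself, every caller of `publish` is affected, including code that never imported the tracing helper. That side effect is the kind of surprise described above.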

For how long have I used the solution?

I've used the solution for about two years.

What do I think about the stability of the solution?

I don't recall seeing an incident from Datadog in the past couple of years and that's been wonderful.

What do I think about the scalability of the solution?

The solution is incredibly scalable! To be fair, our data throughput to Datadog isn't super huge. However, we have never seen issues as it scaled to handle more of our data.

Which solution did I use previously and why did I switch?

We used to use AWS CloudWatch for a lot of our monitoring needs. That said, the interface felt clunky, confusing, and limited.

What was our ROI?

We don't have hard numbers on ROI. That said, overall, it has been a wonderful addition to our tooling suite.

Which other solutions did I evaluate?

We also looked at Honeycomb and are currently using both in production.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Cloud Engineer at a comms service provider with 10,001+ employees
Real User
Good platform monitoring and great cost and performance optimization
Pros and Cons
  • "The observability pipelines are the most valuable aspect of the solution."
  • "Geo-data is also something very critical that we hope to see in the future."

What is our primary use case?

We use the solution primarily for platform monitoring for the services that are deployed in AWS. It gives a better way to monitor the services, including pods, cost, high availability, etc. This way, observability is ensured and also customer services are uninterrupted. 

Also, we host the data pipelines between the cloud and the on-prem for which Datadog is used to ensure better services. We report issues based on the metrics reported over it. 

How has it helped my organization?

Cost and performance optimization were the major enhancements for our organization. It gives us platform monitoring for the services that are deployed in AWS, which is a better way to monitor the services (pods, cost, high availability, etc.). With this product, we ensure observability and keep customer services uninterrupted. We host the data pipelines between the cloud and on-prem, and Datadog helps to ensure better services. We find we can report issues based on the metrics reported over it.

What is most valuable?

The observability pipelines are the most valuable aspect of the solution. 

Platform monitoring for the services that are deployed in AWS is helpful. It gives a better way to monitor the services. With Datadog, we ensure observability and maintain uninterrupted customer service. 

We can host the data pipelines between the cloud and the on-prem. Issues are easily reported.

The data streams are good. Data lineage is something that really helped in ensuring tracking of the data and metrics and also the volumes processed.

What needs improvement?

We'd like to see better transformers.

Live chat would be the best way to support us. 

Also, the features that we saw getting launched recently were something we expected and we're glad to see them coming.  

Geo-data is also something very critical that we hope to see in the future.

For how long have I used the solution?

I've used the solution for two or more years. 

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
LuWang - PeerSpot reviewer
DevOps Engineer at Screencastify
Real User
Customizable and helpful for isolating and filtering environments
Pros and Cons
  • "We have way more observability than what we had before - on the application and the overall system."
  • "Auto instrumentation on tracing has not been very easy to find in the documentation."

What is our primary use case?

We use Datadog for observability and system/application health, mainly for product support, triaging, debugging, and incident responses.

We use a lot of the logging and the Datadog agent to collect logs, metrics, and traces from our GKE workloads. We use APM and continuous profiling for latency and performance measurement. We use RUM to observe frontend user events, such as tracing on request and what actions they take before errors occur. We also use error tracking and source maps to debug production failures.

We are still relatively new to the product, and we are planning to use more of the notebook functionality and power packs to record run books and break knowledge silos. We also need to utilize dashboards and continuous profiling more for performance measurement and integrate Datadog alerts for incident response.

How has it helped my organization?

We have way more observability than what we had before - on the application and the overall system. That includes the GKE cluster, nodes, and pods. It's helped with our cloud-run instances, databases, and data storage.

We also started observability in the CI pipeline to measure our CI performance, as it was a pain point for us. We are aiming to do incremental deployments and releases, and the bottleneck so far has been our CI performance. The visibility on which actions or functions take the most time allows us to pinpoint and focus on improving configurations on these.

What is most valuable?

We use structured logging a lot to triage production issues. The querying, attribute and tag manipulation, and customization have been very helpful in isolating and filtering environments. The integration with the Winston logger has also been a breeze.

First and foremost, structured logging, tags, and attributes have not only allowed us to narrow down a problem quickly in production; they have also let us create dashboards from these logs to understand more user behaviors, such as how many users stop and leave our application before an upload has completed. That helps us understand how important processing time is to a user.
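
As a rough illustration of structured logging with attributes, the sketch below emits each log record as a single JSON object whose fields a log backend could filter on. The review's own stack uses Winston; this example uses Python's standard `logging` module instead, and the service name, attribute names, and values are all hypothetical.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so a log backend can index its fields."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Attributes passed via extra={"attrs": {...}} become top-level keys.
        payload.update(getattr(record, "attrs", {}))
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # Stands in for stdout or a file tailed by a log agent.
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("upload-service")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info(
    "upload abandoned",
    extra={"attrs": {"env": "production", "user_id": "u-123", "elapsed_ms": 8400}},
)

# Parse the emitted line back to show that the attributes are machine-readable.
event = json.loads(stream.getvalue())
```

With fields such as `env` and `elapsed_ms` emitted as keys rather than buried in a message string, filtering by environment or charting abandoned uploads becomes a query on attributes instead of a text search.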

We also intend to use distributed tracing more to understand where the error has occurred in a particular request.

What needs improvement?

Definitely, documentation could use improvement. As I navigated and tried to find instrumentation and implementation details, I discovered inconsistencies among the SDKs for different languages.

There are also places where highlighting can be improved. I once created an issue on GitHub, and it was resolved right away by an engineer. He pointed out that it was actually in the documentation. I looked again and found it was not very obvious. We were stuck on the problem for days.

Auto instrumentation on tracing has not been very easy to find in the documentation. We ended up using OpenTelemetry, yet the conversion between tracing contexts has been difficult.

For how long have I used the solution?

We've used the solution for between six months and a year.

How are customer service and support?

Customer service and support are generally very fast. I did have one ticket, which involved changing the log index retention period, that never received a response. Any support tickets related to technical issues were resolved pretty fast.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used to use GCP Stackdriver for logging and monitoring since our infrastructure is all GCP based. It was lacking a lot, particularly on tracing and structured logging. We often had a lot of trouble triaging and diagnosing a production problem. Datadog's specialty is observability. Since we started using the product, we were able to create dashboards, and utilize APM, continuous profiling, RUM, and distributed tracing for production support and user trends.

Datadog also offers labs and workshops for its products, which is very helpful.

What about the implementation team?

We implemented the product ourselves.

What was our ROI?

I'm not sure what our ROI would be.

What's my experience with pricing, setup cost, and licensing?

We started with on-demand pricing as we were re-writing our product, and we weren't sure about the total usage. After we went into production and released the product, we experienced a price surge. Fortunately, our Datadog account manager reached out to us and suggested a monthly subscription, which is what we'll be switching to.

I'd advise keeping an eye on the usage and possibly setting up some monitoring on price. We didn't have much of a setup cost; we started with a free trial and continued with on-demand after the trial ended.

Which other solutions did I evaluate?

We didn't evaluate many of the other options. However, we do also use OpenTelemetry, which is vendor agnostic and integrates with Datadog.

What other advice do I have?

We always keep the Datadog agent on the latest version.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Software Engineer at Spring Health
User
Great dashboards and custom metrics with the ability to parse logs
Pros and Cons
  • "The dashboards are great."
  • "We need more advanced querying against logs."

What is our primary use case?

We share dashboards, set up alerts, and monitor everything that happens in our system. We use it in staging, features, production, and our load test environment. It is exceptionally helpful for making our engineering more data-driven. 

I came from a company that believes we should focus on being telemetry driven. Instilling this in a smaller, less mature engineering organization has been challenging. However, it is much easier while using Datadog.

What is most valuable?

The dashboards are great. They are an easy way to share visibility into what we need to watch with others who are not SMEs.

I enjoy the custom metrics. With this, we can take things that were once logs and then retain them longer.

We are able to parse logs. To be honest, this was only useful due to the fact that we had not yet set up the Datadog agent properly in PHP. Once we did this, the Datadog log parsing was no longer needed.

The ability to pin to a date and time is very helpful. This allows us to pinpoint exactly what was happening.

What needs improvement?

We need more advanced querying against logs. While most issues I have had here can be alleviated by way of sending better-formatted logs, it would be cool to do SQL-type queries against our data.

We need a way to see dashboard metadata. We launched a huge customer, and we saw more people using Datadog than ever across the entire organization, yet had no way to tell.

It would be ideal if we had some way to compare arbitrary date times more easily. We would love to use the Diff Graph command against some hard-coded value, for instance, against some known event.

For how long have I used the solution?

I've used the solution for eight months.

What do I think about the scalability of the solution?

The scalability is great!

Which solution did I use previously and why did I switch?

We previously used New Relic. I was not part of the decision-making team that made the switch.

What was our ROI?

The ROI is the speed at which we can debug live sites. It has been excellent. It's amazing how many incidents we can capture before customers notice.

Which other solutions did I evaluate?

We looked into New Relic and a home-brewed solution as potential other options.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Infrastructure engineer at an insurance company with 10,001+ employees
Real User
Good infrastructure, helpful logs, and useful alerts
Pros and Cons
  • "It has a high-level insight into the infrastructure model of the application and provides important detailed data on the host and metrics, which is the main concern of our customers."
  • "I sometimes log in and see items changed, either in the UI or a feature enabled. To see it for the first time without proper communication can sometimes come as a shock."

What is our primary use case?

Our use case is to provide cloud organization application monitoring. I use it for insight into what host in what region has activity or what market is using Datadog to its fullest potential and utilizing that for cost. This may also help determine who is using monitoring and setting alerts or just setting up monitoring and not doing anything about it. The use case can also be to check when the host or applications are down, or if the usage of CPU, memory, etc, is too high.

How has it helped my organization?

The solution has improved our organization from a market perspective. We have multiple departments and need some time to gather that data from a grouping point of view. Grouping that data via tag or seeing the separation is easy. In addition, it provides metrics and insights for senior leadership to have a high level of usage and cost. Application teams have better insight into their application, outages, when to plan for patches, updates, etc. Also, they have a better understanding of where the data gaps may be.

What is most valuable?

The infrastructure is the most valuable. It has a high-level insight into the infrastructure model of the application and provides important detailed data on the host and metrics, which is the main concern of our customers. It provides confirmation that the layer where the application is running is monitored and will be alerted when it is down and not functional. The customers can have peace of mind knowing their metrics are accurately being measured. The value of data provided, including service name, logs, and all other pertinent details tied to the host, makes it a valuable source of data.

What needs improvement?

The solution can be improved via open communication to the broader audience on what has changed and what has not changed. I sometimes log in and see items changed, either in the UI or a feature enabled. To see it for the first time without proper communication can sometimes come as a shock.

For how long have I used the solution?

I have been using the solution for three years.

What do I think about the stability of the solution?

The stability is great.

How are customer service and support?

Technical support is great. Datadog has the resources and knowledge to tackle questions.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I did not previously use a different solution.

How was the initial setup?

The initial setup is straightforward.

What about the implementation team?

The initial setup was handled in-house.

Which other solutions did I evaluate?

I did not evaluate any other solutions.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Software Engineer at a tech vendor with 1,001-5,000 employees
Real User
Great profiling and tracing but storage is expensive
Pros and Cons
  • "Anything I've wanted to do, I found a way to get it done through Datadog."
  • "When it comes to storing the logs with Datadog, I'm not sure why it costs so much to store gigabytes or terabytes of information when it's a fraction of the cost to do so myself."

What is our primary use case?

We use the solution for application hosting and a little bit of everything when it comes to supporting a worldwide logistics tracking service. It's used as a central service for collecting telemetrics and logs. We find it does the same work as all of our old tools combined, including Prometheus, Kibana, Google Logs, and more; putting all of this information in a single platform makes it easy to corroborate information and associate a request with the data, which might be lost when it is saved as logs.

How has it helped my organization?

At my organization, we have plenty of microservices written in different languages. Different teams prefer one or the other framework or library within those languages.

With Datadog, we can get in a single line and march in the same direction; our logs and metrics are collected in the same fashion, making it easy to find bugs or integration problems across services and understand how they interact with other systems.

What is most valuable?

I primarily prefer to utilize the profiling and tracing feature. It can potentially be used as a more-informed alternative to logs.

Beyond that, anything I've wanted to do, I found a way to get it done through Datadog. It allows for testing, logging, hardware monitoring, system performance, memory consumption, advanced observability, AI assistance, cross-team collaboration, and business analytics. Datadog helps some of the world’s biggest brands transform faster with the help of true AIOps, AI-assisted answers, UX and business analytics, cloud observability, and smart AI assistance.

It's all supporting my desire to build a great application, and in a centralized SaaS application, it's hard to say anything can beat it.

What needs improvement?

The storage of logs is a little bit unexpected; most services generate gigabytes of logs, and their size is not excessive. When it comes to storing the logs with Datadog, I'm not sure why it costs so much to store gigabytes or terabytes of information when it's a fraction of the cost to do so myself.

For how long have I used the solution?

I've used the solution for one year.

What do I think about the stability of the solution?

We have no concerns with stability.

What do I think about the scalability of the solution?

It appears that there are no issues with scaling.

How are customer service and support?

Technical support is slow. It takes forever to get responses from the support team.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I've previously used Kibana and Prometheus. We are still using these.

How was the initial setup?

Setting up through the environment variables made it unbelievably easy to get started.

What about the implementation team?

We've implemented the solution in-house.

What was our ROI?

I do not have this number off-hand, as I am not the finance guy. I just like the product.

What's my experience with pricing, setup cost, and licensing?

I'd advise new users not to start off by sending logs.

Which other solutions did I evaluate?

We did not really look at other options.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Security Engineering Manager at a financial services firm with 201-500 employees
Real User
Democratizes observability, great log searchability, and intuitive UI
Pros and Cons
  • "I find the greatest feature is being able to search across logs from various microservices."
  • "One area where I was really looking for improvement was the CSPM product line. I had really wanted to have team-level visibility for findings, since the team managing the resources has much more context and ability to resolve the issue, as the service owner. However, this was announced as added in a recent keynote."

What is our primary use case?

I use the solution to manage security-related logs and metrics, as well as create detection rules for security events. I am a security engineer, so one area of interest is the CSPM product, giving us the ability to look at findings across the cloud environment. 

The great part about the Datadog security products is that they incorporate the context of the resources/hosts where the security event is found. This allows us to see exactly what is running on a host that we see as a security alert.

How has it helped my organization?

The greatest impact it has had is on the ability to democratize observability and put monitoring into the hands of the people. Teams can quickly get the information they need, without needing a bunch of training, since the UI is super intuitive and easy for beginners. This helps reduce time to resolution during incidents and gives context to developers quickly and easily. Context is really important since seconds matter when the ship is down, and you don't know why.

What is most valuable?

I find the greatest feature is being able to search across logs from various microservices. As a member of the security team, I find that I often need visibility into other teams' services in order to get a good picture of our security posture.

I also am a fan of the ability to easily create monitors and get alerts into Slack quickly, without too much overhead. For example, I often need to create monitors where I am not too sure where the baseline lies. Having the ability to create anomaly monitors makes this process much more straightforward. Anomaly monitors are great for a security team.
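
To make the idea of an anomaly monitor concrete, here is a minimal sketch of one simple approach: flag a data point when it sits more than a few standard deviations away from the trailing window's mean, so no fixed baseline has to be chosen up front. This is an assumption-laden illustration, not Datadog's anomaly detection algorithm; `find_anomalies`, its parameters, and the sample data are hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing window's mean
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical failed-login counts per minute.
failed_logins = [4, 5, 6, 5, 4, 5, 6, 40, 5, 4]
anomalous_indices = find_anomalies(failed_logins)
```

On this sample data the function flags only index 7, the spike to 40. The values that follow are not flagged because the spike itself widens the trailing window's spread.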

What needs improvement?

One area where I was really looking for improvement was the CSPM product line. I had really wanted to have team-level visibility for findings, since the team managing the resources has much more context and ability to resolve the issue, as the service owner. However, this was announced as added in a recent keynote.

For how long have I used the solution?

Personally, I've used it my entire time employed here, more than three years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
Updated: April 2024