Senior Software Engineer at a tech vendor with 501-1,000 employees
Real User
Great logging and tracing features with good insights into performance
Pros and Cons
  • "I have found the logging and tracing features the most valuable."
  • "The documentation could be improved regarding setting up the agent properly and debugging."

What is our primary use case?

We use this solution to monitor our Kubernetes clusters, nodes, deployments, daemon sets, replica sets, and pods.

How has it helped my organization?

I've been using this solution for a few months at my current company. As members of the Kubernetes team, we use Datadog to provide monitoring and telemetry for our team and our customers.

This solution has improved our organization by giving us deeper insight into what's running in our clusters and how it performs.

What is most valuable?

I have found the logging and tracing features the most valuable.

What needs improvement?

The documentation could be improved regarding setting up the agent properly and debugging.

Buyer's Guide
Datadog
December 2023
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: December 2023.
746,635 professionals have used our research since 2012.

For how long have I used the solution?

I've been using this solution for a few months at my current company.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Lead Support Engineer at a tech vendor with 11-50 employees
Real User
Good centralization of data with good integration but can be overwhelming at first
Pros and Cons
  • "The integration into AWS is key as well, as our software is currently bound to AWS."
  • "The ability to find what you are looking for when starting out could be improved."

What is our primary use case?

Our use case is mainly deploying into our applications for monitoring/logging observability. We currently have our microservices feed into an actuator that exists in each instance of our application that extends to a local and central Grafana for client and internal visibility. The application we use is Grafana.

Logging captures application and system logs that are ported to each application instance for querying.

Whenever anything occurs that is considered unhealthy from a range of health checks, we have notification rules configured internally and externally for a prompt response time.

How has it helped my organization?

We have been able to be a more confident, knowledgeable, and capable team now that everything is being ported into a centralized format. Beforehand, knowledge was isolated to individuals, and uncertainty about what information represented and where it was led to a lack of confidence. Having everything in one place rules out that confusion and allows us to respond better to issues.

It also allows for personal growth as our team is learning the application from the ground up, and each person is enhancing their own skills.

What is most valuable?

The valuable features include the following: 

  • We are currently utilizing a decentralized distributed framework for our deployment, including our monitoring/logging observability capabilities. Centralizing them, if our company privacy guidelines allow it, will be a big help in tracking and responding to issues that come up and in having the means to understand their origin with the log management tools that were demonstrated.
  • The ability to fiddle around and manipulate how logs are outputted.
  • The ability to track AWS Lambda functions, CloudFormation, and CloudWatch allows someone who is not savvy to dip their toe into understanding their own product.
  • The integration into AWS is key as well, as our software is currently bound to AWS.

What needs improvement?

The ability to find what you are looking for when starting out could be improved. It was a bit overwhelming trying to figure out what is the best solution. It led to many prototypes or time spent just perusing documentation. If we were able to select bundles or template use cases, we would hit the ground running quicker.

For how long have I used the solution?

I've used the solution for one year.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Devops Engineer II at a comms service provider with 11-50 employees
Real User
Great CPU profiler and lots of features but can be overwhelming
Pros and Cons
  • "Even if we don't end up using Datadog, it revealed problems and optimizations to us that weren't obvious before."
  • "The sheer amount of products that are included can be overwhelming."

What is our primary use case?

We use the solution for monitoring our logs across distributed clusters. Right now, we have an Elasticsearch solution that is tied to each platform (our product is a PaaS solution). 

We are looking at moving to a single-pane-of-glass solution, which Datadog would be good for (plus, we could wrap up other tools like Prometheus, Grafana, PagerDuty, Pingdom, and more). We want to be able to have Datadog running on one single cluster and ingesting and processing logs from all our distributed clusters.

How has it helped my organization?

So far, we are just in the evaluation stages, so it's hard to say how it has improved our organization. However, one positive impact is that it has shown us an example of how to build in observability, metrics, tracing, etc., in a better way.

Even if we don't end up using Datadog, it revealed problems and optimizations to us that weren't obvious before. One potential reason why it may not help us is that we have strict rules around log parsing and may not be able to send logs to an external organization for ingestion/processing.

What is most valuable?

The CPU profiler has been interesting even though it isn't our core use case. 

We are finding that Datadog has way more offerings than originally expected, so we are constantly finding new parts of it that would be convincing to use. 

The logging and ingestion are very similar to our current Elasticsearch setup. We find the tracing and overall integration/ecosystem to be the most valuable part. Basically, the CPU profiler is a good example of a value add for a problem we knew we had, yet it was low priority and had hacky workarounds. The value proposition is in the ecosystem as a whole.

What needs improvement?

The sheer amount of products that are included can be overwhelming. 

The solution requires a better overarching UI, which would make things clearer. Even though I generally dislike the AWS UI, it makes the different services very clear, and it also makes it clear where you are at any given point.

The sidebar for all the different services is a bit much. 

I also found the tagging of logging pipelines to be a bit tedious. It would be great if, once marked up, it would automatically be a first-class citizen in Datadog.

For how long have I used the solution?

We are still in the evaluation stage and have used it for less than one month.

What do I think about the stability of the solution?

The stability looks good so far.

What do I think about the scalability of the solution?

It seems easier to scale and build app functionality across multiple teams than with other solutions.

Which solution did I use previously and why did I switch?

We have used Elasticsearch, Grafana, and Prometheus. We are still evaluating Datadog.

What was our ROI?

The product has provided good ROI by saving development time as well as time spent setting up and managing ES.

What's my experience with pricing, setup cost, and licensing?

It is somewhat expensive compared to open-source options.

Which other solutions did I evaluate?

We evaluated Elasticsearch, Grafana, and Prometheus. We are still evaluating Datadog.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: evaluator
PeerSpot user
Software Engineer at a comms service provider with 11-50 employees
Real User
Industry-standard with good profiling and helpful alerts
Pros and Cons
  • "The biggest thing I liked was the combination of all the things - monitoring, log aggregation, and profiling."
  • "It can be overwhelming for new people as it has a lot of features."

What is our primary use case?

We use different tools for log collection and monitoring. Using Datadog will combine different use cases into one product that will be easier to manage. 

The tools we use are open-source, so there is no commercial support. Having customer support would be ideal since we're a small team. 

Profiling would be another great feature to have. Currently, it's manual. Having Datadog would give us a standard, and we don't have to do much manual work.

How has it helped my organization?

It will solve a lot of our problems. We have different tools for each of them in our organization; they are open-source and therefore not very well maintained, and there is no customer support.

Having an industry-standard product such as Datadog would be ideal for us as we are short on manpower. Since this is a managed all-in-one product with readily available support, we will be able to focus on application logic rather than figuring out why a tool isn't working.

What is most valuable?

The biggest thing I liked was the combination of all the things - monitoring, log aggregation, and profiling. We have different tools for each of them in our organization and all of them are open-source. These are not very well maintained and there is no customer support. 

Having an industry-standard product is ideal for us as we are short on manpower. Profiling is another amazing feature. Currently, we rely on some open-source solutions, and it's all done locally. Having it done on Kubernetes would give us more insights and help with performance. Alerting is again a nightmare for us. Datadog solves all of these issues.

What needs improvement?

It can be overwhelming for new people as it has a lot of features. The UI could certainly be improved. Having less information with better organization could help newcomers. I haven't seen the documentation; however, well-organized documentation would invite many varied users.

For how long have I used the solution?

I've been using the solution for three years.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Cloud Operations Engineer at a tech vendor with 10,001+ employees
Real User
Great log, SQL, and network monitors
Pros and Cons
  • "Datadog documentation on web pages has improved a lot and is pretty easy to follow and find."
  • "Alerting timing should be improved to be more fine-tuned and exact."

What is our primary use case?

We primarily use the solution for monitoring applications and informing customers via PagerDuty and Statuspage. The monitoring and alerts can be personalized internally, and we are able to find problems and issues. The response time monitor has been great, and it has been validating upgrades. We can check in to see which step fails.

How has it helped my organization?

Previously, we had monitors scattered across different places and products, making troubleshooting harder and slower. Also, logs and monitors were on different platforms, making it harder to put the infrastructure puzzle together.

Datadog documentation on web pages has improved a lot and is pretty easy to follow and find.

Additionally, integrations with, for example, GCP, network, component, and software providers are much easier as everything is now centralized.

API and notification integrations are also a great benefit for our organization.

Datadog actively listens to customer feedback and develops improvements for us effectively.

What is most valuable?

The most valuable features of the solution include the APM, log monitor, SQL monitors, network monitors, and integrations.

What needs improvement?

Alerting timing should be improved to be more fine-tuned and exact. The current problem is that monitoring is integrated with the Statuspage and the SLA.

Also, browser support for browsers other than Chrome should be added. Browser test recording is another problem, as it does not always work in normal mode. One needs to use incognito mode or a pop-up.

For how long have I used the solution?

I've been using the solution for three years.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Sr Platform Engineer at a pharma/biotech company with 11-50 employees
Real User
Good logging with lots of great integrations and an interesting dashboard
Pros and Cons
  • "Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate."
  • "Some of the interface is still confusing to use."

What is our primary use case?

We use it mostly for collecting log messages from our Kubernetes and EC2 instances, for example, system messages and errors. Also, we want log messages from our firewalls and other network infrastructure in case of network issues. We intend to use it for application logging, et cetera, to get insight into internal problems in the applications in Kubernetes pods. We want to use it for monitoring in case of system problems and hardware failures so that it can notify us.

How has it helped my organization?

It's good to have a single location for all the logs. If you have logs coming from a whole lot of sources, it makes it hard to find where the problem lies. 

We had to spend a lot of time logging into various systems and perusing a billion different log files looking for something that stands out as a possible cause of the issue. That can take a lot of time and doesn't give much visibility into the possible interactions between systems.

What is most valuable?

Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate. 

It has a lot of ability to make fancy and deep searches using regular expressions and to graph them into useful and interesting dashboard graphs. 

The plethora of built-in/downloadable integrations makes it much easier to set up for our platforms. Otherwise, we'd have to parse the log files ourselves, which would take a great deal of effort. We had to do that before when we used an ELK stack for logging, which was painful.

What needs improvement?

Some of the interface is still confusing to use. It has many features, and it takes a lot of effort to figure out what they all mean. Maybe having tooltips or something would be helpful. Also, some of the integrations are better than others.

For how long have I used the solution?

I've used the solution for a month.

What do I think about the stability of the solution?

The solution seems very stable.

Which solution did I use previously and why did I switch?

I have used an ELK stack before. However, it took a lot of effort to maintain, and parsing the logs was difficult.

How was the initial setup?

We implemented the solution in-house.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
SRE at a computer software company with 51-200 employees
Vendor
Great for log aggregation, searching, and system monitoring
Pros and Cons
  • "The ability to easily drill down into log queries quickly and efficiently has helped us to resolve several critical incidents."
  • "Datadog could always lower the price!"

What is our primary use case?

We are using Datadog for server metrics, log aggregation and searching, system monitoring, alerting the team about errors, and dashboards for our developers. It's used by the Site Reliability Engineering team and Management of all levels. 

It's assisting us in proving SOC 2 compliance.

We're looking to improve our usage of Datadog's RUM and APM components to get better and more performance insights on our production environments. 

We're also looking to leverage more synthetic monitors and runbooks for anyone responding to incidents.

How has it helped my organization?

The ability to easily drill down into log queries quickly and efficiently has helped us to resolve several critical incidents so far this year, and we heavily rely on a series of dashboards showing us various queues and load on CPU and memory for servers. 

We also have a view of the information required when we begin the patch and/or upgrade processes. 

I've also set up several monitors to alert the Site Reliability Engineering team when various metrics show a server might be reaching capacity. We use it to send an email suggesting we increase the size of the cloud instance.

What is most valuable?

The ability to easily drill down into log queries quickly and efficiently has helped us to resolve several critical incidents. We heavily rely on dashboards that are showing us various queues and load on CPU and memory for servers. 

We also have a view of the information required when we begin the patch and/or upgrade processes. 

I've arranged several monitors to alert the Site Reliability Engineering team when various metrics show a server that might be reaching capacity. We use it to send an email suggesting we increase the size of the cloud instance.

What needs improvement?

Datadog could always lower the price! In general, more demos online and maybe more free hands-on tutorials for basic functionality would be good for less technical users. 

I would also prefer the chance to amend the contract more than twice a year. As a smaller but growing company, it can be difficult to adequately predict demand.

For how long have I used the solution?

I've used the solution for more than three years.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Cloud Engineer at a tech services company with 10,001+ employees
Real User
Intuitive with high availability and good integrations
Pros and Cons
  • "The network map is crucial in identifying bottlenecks and determining what needs more attention."
  • "To be very fair, I haven't had enough experience with Datadog to pick out improvements."

What is our primary use case?

We are using the solution for scaling up the website for market data applications. EC2 and Datadog have enabled high-level monitoring of underlying infra and services.

The Datadog profiler comes in handy to pinpoint issues with resource utilization during peak hours, and traces/log management helps narrow down the root cause.

The network map is crucial in identifying bottlenecks and determining what needs more attention.

The host map helps identify problematic hardware and devise ways to counter issues that arise when scaling and deploying solutions on the cloud.

How has it helped my organization?

While my team is relatively new to Datadog, I already see immense value in switching over to Datadog as the primary APM and NPM tool.

The arsenal of features it offers is bound to come in clutch when facing production issues, when finding out what went wrong is crucial.

The network map has helped to figure out the golden signals and optimize the infrastructure.

The synthetics have helped ensure that the high-availability architecture functions as intended.

What is most valuable?

The network map is useful. The ability to see the data flow along the entire network path across all the applications is highly valuable, as the data from this service helps identify network bottlenecks, non-performant applications, and bad endpoints.

This is especially crucial for a high-availability website aimed at market data applications where low latency is crucial.

The host map gives a clear picture of the entire infrastructure, and the ability to switch between logs, metrics, and traces is very handy when it comes to debugging issues on the fly.

I love the ability to install the integrations and agents quickly. This is a well-made product.

What needs improvement?

To be very fair, I haven't had enough experience with Datadog to pick out improvements.

My involvement with Datadog has largely been positive. I love the simplicity and intuitiveness it offers - even for nontechnical folks who just might be starting out with developing technical chops in their domain.

For how long have I used the solution?

I've used the solution for three years.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
Updated: December 2023