Datadog Primary Use Case

AF
Senior Software Engineer at an insurance company with 10,001+ employees

I have been using Datadog products and capabilities increasingly over the last 4 years, from POC to widespread adoption. 

The capabilities we use are unique for each use case and can be combined in various ways to provide the full observability coverage needed to maintain stable operations and shift from being reactive to being proactive.

Our organization uses it for site/service reliability across our range of backend and frontend services, for custom monitoring, and for dashboards that are dynamic and can be reused by multiple teams.

BH
Architect at SEI Investments

We primarily use Datadog for:

  • Native memory
  • Logging
  • APM
  • Context switching
  • RUM
  • Synthetic
  • Databases
  • Java
  • JVM settings
  • File i/o
  • Socket i/o
  • Linux
  • Kubernetes
  • Kafka
  • Pods
  • Sizing

We are testing Datadog as a way to reduce our operational time to fix things (mean time to repair). This is step one. We hope to use Datadog as a way to be proactive instead of reactive (mean time to failure).

So far, Datadog has shown very good options to work on all of our operational and development issues. We are also trying to use Datadog to shift left, and fix things before they break (MTTF increase).
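For readers unfamiliar with the two measures mentioned in this review, here is a minimal sketch of how MTTR and MTTF could be computed from an incident log. The incident timestamps and function names are hypothetical and are not taken from the reviewer's environment or from any Datadog API.

```python
from datetime import datetime, timedelta

def mean_time_to_repair(incidents):
    """Average of (resolved - started) over all incidents."""
    durations = [resolved - started for started, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)

def mean_time_to_failure(incidents):
    """Average uptime between one incident's resolution and the next one's start."""
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)

# Hypothetical incident log: (started, resolved) pairs.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 3, 11, 0), datetime(2024, 1, 3, 11, 30)),
    (datetime(2024, 1, 6, 11, 30), datetime(2024, 1, 6, 12, 0)),
]

mttr = mean_time_to_repair(incidents)   # lower is better
mttf = mean_time_to_failure(incidents)  # higher is better
print("MTTR:", mttr, "MTTF:", mttf)
```

Reducing MTTR is the reactive goal (fix faster); increasing MTTF is the proactive one (fail less often).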

BH
Principal Enterprise Systems Engineer at a healthcare company with 10,001+ employees

We deploy agents on-premises to collect data on on-premises VM instances. We don't use Datadog in our cloud network. We do have some cloud apps that we have it on, and we also have containers. We have it at their headquarters; the main software for them is on their own cloud.

We're building out the process now and learning to use it better. We plan to use Datadog for root cause analysis of any kind of issue we have with software: applications going down, latency issues, connection issues, etc. Eventually, we're going to use Datadog for application performance monitoring and management, so that we can be proactive around thresholds, alerts, bottlenecks, etc.

Our developers and QA teams use this solution. They use it to analyze network traffic, load, CPU load, and CPU usage, and then tracing, NPM, and API calls for their applications. There are roughly 100 users right now. Maybe there are 200 in total, but on a given day, maybe 13 people are using this solution.

Felix Flores - PeerSpot reviewer
Staff Engineer at a tech services company with 1,001-5,000 employees

We are using a mixture of on-prem and cloud solutions to bridge the gap with healthcare entities in the service of providing patients with the medication they need to live healthy lives.

Since we're a heavily regulated company, a lot of our solutions grew from on-premises monoliths. However, as we scaled out, it became harder and harder to move forward with that architecture. Today, we're investing heavily in transforming our systems from monoliths into distributed systems.

With this change in mind, the ability for us to connect the dots using Datadog has been invaluable.

Enrique Bassallo - PeerSpot reviewer
AWS Cloud Architect Consultant at a manufacturing company with 10,001+ employees

Our company deploys the solution for our customers as an observability tool to define SLOs and SLIs along with logs and metrics. 

The solution includes incident, post-mortem, and root cause analysis that provides a level of truth for incidents and issues with applications. 

We have SREs and teams in operations, management, and applications who all have access to the solution and ensure proper integrations.

SE
Director of IT at a consumer goods company with 201-500 employees

I used Datadog typically for monitoring website statistics and some of the cloud networking equipment.

JulianLewis - PeerSpot reviewer
Senior Engineer at an educational organization with 5,001-10,000 employees

Datadog is a SaaS solution we tried for URL and synthetic monitoring. You record a transaction going into a website and replay that transaction from various locations. Datadog is mainly used by the admin, but three or four other guys had access to the reports and notifications, so it's five altogether.  

We probably tried no more than 8 percent of what Datadog can do. There are so many other bits and modules. I've only gone into about half of what APM can do in the Datadog stack.

Jaswinder Kumar - PeerSpot reviewer
Senior Manager - Cloud & DevOps at Publicis Sapient

My customers were using Datadog for monitoring purposes. They were using it only because the solution runs on AWS and is microservices-based. They were using an application called Dynatrace for their logs.

RB
Senior Cloud Engineer, Vice President of Monitoring at a financial services firm with 10,001+ employees

We are using the solution for migrating out of the data center. Old apps need to be re-architected. We are planning on moving to multi-cloud for disaster recovery and to avoid vendor lock-in.

The migration is a mix between an MSP (Infosys) and in-house developers. The hard part is ensuring these apps run the same in the cloud as they do on-premises. Then we also need to ensure that we improve performance when possible. With deadlines approaching quickly, it's important not to cut corners, which is why we needed observability.

SB
Delivery Manager, DBA Services at a manufacturing company with 10,001+ employees

We use Datadog for monitoring to get the traces and logs of all our applications. Datadog provides dashboard and alert capabilities that let our various teams identify if something is wrong. More than 200 users, mostly software engineers, work with Datadog.

RG
Architect at a comms service provider with 10,001+ employees

We use the solution primarily for distributed tracing, service insight and observability, metrics, and monitoring. We create custom metrics from outbound service calls to trace the availability of back-office systems. 
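As an illustration of the custom-metrics idea above, here is a minimal sketch that builds DogStatsD-style datagrams by hand for an outbound call. The metric names, tag names, and the `billing` back-office system are hypothetical; the reviewer's actual instrumentation is not described, and in practice an official client library would normally do this formatting.

```python
import socket

def dogstatsd_datagram(name, value, metric_type, tags=None):
    """Build a DogStatsD datagram of the form <name>:<value>|<type>|#<tag>,<tag>."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return datagram

def send_datagram(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP (8125 is the usual DogStatsD port; adjust as needed)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))

# Hypothetical metrics for one outbound call to a back-office system.
count = dogstatsd_datagram(
    "backoffice.call.count", 1, "c", {"system": "billing", "status": "ok"}
)
latency = dogstatsd_datagram("backoffice.call.latency_ms", 42.5, "h")
print(count)
print(latency)
```

Counting calls per `status` tag is one simple way to chart the availability of a downstream system over time.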

We use the flame graph to get insights into our GraphQL implementation. It helps highlight how resolvers work. 

However, it's lacking in tracing which GraphQL queries are run, and we use custom spans for that.

GS
Software Engineering Manager at a hospitality company with 1,001-5,000 employees

We primarily use the solution for application monitoring (APM, logs, metrics, alerts).

It's useful for active monitoring (static monitors, threshold monitors). We get a lot of value out of anomaly detection as well. SLOs and monitoring of SLOs have been another value add.

In terms of metrics, the out-of-the-box infrastructure metrics that come with the Datadog agent installation are great. We have made use of both the custom metrics implementation as well as the log-based metrics which are extremely convenient.

We also leverage Datadog for use of RUM and want to explore session replay.

Ramon Snir - PeerSpot reviewer
CTO at a tech vendor with 1-10 employees

We use Datadog for three main use cases, including:

  • Infrastructure and application monitoring. This ensures that our services are available and performant at all times, and allows us to proactively address incidents and outages without customers contacting us. It includes monitoring of cloud resources (databases, load balancers, CPU usage, etc.), high-level application monitoring (response times, failure rates, etc.), and low-level application monitoring (business-oriented metrics and functional exceptions to customer experience).
  • Analyzing application behavior, especially around performance. We often use Datadog's application performance monitoring on non-production environments to evaluate the impact of newly introduced features and gain confidence in changes.
  • End-to-end regression testing for APIs and browser-based experiences. Datadog's synthetic testing periodically checks that the system behaves exactly as expected. This is often used as a canary to detect issues even before users reach them organically.
MS
Staff Cloud Engineer at an energy/utilities company with 51-200 employees

We are using the solution for migrating out of the data center. Old apps need to be re-architected. We plan to move to multi-cloud for disaster recovery and to avoid vendor lock-in. The migration is a mix between an MSP (Infosys) and in-house devs. The hard part is ensuring these apps run the same in the cloud as they do on-prem. Then we also need to ensure that we improve performance when possible. With deadlines approaching quickly, it is important not to cut corners, which is why we needed observability.

AN
Software Engineer at a financial services firm with 10,001+ employees

We use it to monitor and alert on our ECS instances as well as other AWS services, including DynamoDB, API Gateway, etc.

We have it connected to PagerDuty for alerting on all our cloud applications.

We also use custom RUM monitoring and synthetic tests for both our internal and public-facing websites. 

For our cloud applications, we can use Datadog to define our SLOs and SLIs and to generate dashboards that are used to monitor SLOs and report them to our senior leadership.
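To make the SLO/SLI reporting above concrete, here is a minimal sketch of the arithmetic behind an availability SLO and its error budget. The 99.9% target and the request counts are hypothetical, and this is plain arithmetic rather than a Datadog API call.

```python
def sli(good_events, total_events):
    """Service level indicator: fraction of events that were good."""
    return good_events / total_events

def error_budget_remaining(slo_target, good_events, total_events):
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad

# Hypothetical month: 1,000,000 requests, 500 of them failed, target 99.9%.
measured = sli(999_500, 1_000_000)
remaining = error_budget_remaining(0.999, 999_500, 1_000_000)
print(f"SLI: {measured:.4%}, error budget remaining: {remaining:.0%}")
```

With a 99.9% target, 1,000 failures are allowed per million requests, so 500 failures leave roughly half of the budget.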

Jon Schwartz - PeerSpot reviewer
Senior Software Engineer at LeafLink

We use Datadog to view and aggregate logs and monitor all of our services. We have a lot of running infrastructure and it is very convenient to have logs and metrics all aggregated somewhere we can view and chart them. 

I use Datadog to create dashboards and runbooks, and sharable graphs, which really help out my whole team. We mostly use logs and APM, yet have been starting to use other products. I would like to use more synthetic monitors.

BW
Software Engineer at Enable Medicine

We primarily use the solution for log monitoring across our entire cloud infra (EB, EC2, Batch, and Lambda).

This is in addition to RStudio Workbench, which has its own logs that would not be picked up via CloudWatch (https://docs.rstudio.com/ide/server-pro/server_management/logging.html#default-log-file-locations).

We own several dozen of these servers, and we used to manage instance logs by tailing logs when incidents occurred. Datadog allows for much better visibility across our entire fleet and has saved us countless hours.

Rawat Singhsatit - PeerSpot reviewer
Solutions Consultant Manager at MFEC

We use this solution for our customer's IP and to support their cloud infrastructure.

FO
SRE at a financial services firm with 10,001+ employees

We deploy various services for our main platform on AWS across multiple regions. We have a development environment, a staging environment, a QA environment, and a production environment. We deploy our many services across hundreds of instances. 

We have many server farms, all responsible for various services on our market intelligence platform. The deployment of each server farm or even of individual instances varies depending on what was stood up. We have instances built in three different ways, with two different pipelines, and some even built from user data scripts.

FC
API Developer at a tech services company with 501-1,000 employees

We use the solution for monitoring, logging, and alerts. 

Thanks to Datadog, we report errors using the logger integrated into our services, which is crucial since we only do unit tests. The infrastructure team handles the monitoring part, so I can't give more insights about that. I am an API developer, so I use Datadog mainly for logging.

The alerts are connected to Microsoft Teams in a specific channel, and we pay a lot of attention to it, and we usually create tickets based on these alerts.

DG
VP, Application support at a financial services firm with 10,001+ employees

We primarily use the solution for the service catalog.

We use this type of offering for our Microservices applications, and it gives a good view of flow. It is a must when we have different developers working on different services.

Having the trace and log features is useful for helping the on-call person locate the microservice.

We would like to see some more useful applications for health monitoring where we can customize the cases based on data from the database.

It needs to have the facility to monitor data inside tables and the status of the UI.

RC
Senior Director with 10,001+ employees

We used Datadog to capture the salvatory of our AWS fleet of around 1,200 servers.

RA
Senior IT Manager at a financial services firm with 1,001-5,000 employees

The main use cases are to provide visibility to costs for each product in the company as well as to consolidate all the observability in one tool. We are moving the team from being an operational team that needs to keep the tool up and running (applying patches and resolving problems) to a team that is focused on providing meaningful visibility of the systems, applications, and services of the company. We want to add value where the developers and the systems administrators are not able to focus.

MC
VP at a financial services firm with 10,001+ employees

The product is used for APM solutions for the metrics and traces for the REST API requests and service maps to understand the upstream and downstream services.

We are creating dashboards and widgets to monitor the status. We are creating alerts and monitors as well. We integrated the alerts and ticketing system in our organization with SNOW and Netcool.

We are using Kubernetes, AWS, and infrastructure metrics. We are using Kafka and Aurora Postgres logs as well, and we are using HTTP status codes to identify the error types.
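A minimal sketch of the last point, grouping HTTP status codes into error types so they can be counted or tagged. The category names are hypothetical labels chosen for illustration; the reviewer's actual grouping is not described.

```python
from collections import Counter

def classify_status(code):
    """Map an HTTP status code to a coarse outcome category."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client_error"
    if 500 <= code < 600:
        return "server_error"
    return "other"

# Hypothetical sample of response codes from a REST API.
observed = [200, 201, 302, 404, 500, 503]
counts = Counter(classify_status(code) for code in observed)
print(dict(counts))
```

Separating client errors (4xx) from server errors (5xx) helps decide whether a spike is caused by callers or by the service itself.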

CY
Senior Manager at a manufacturing company with 10,001+ employees

This solution is for physical device monitoring across breweries, including PLCs, HMI Cameras, RFID panels, scales, etc. We want to gain visibility into these devices to influence predictive maintenance and unscheduled downtime. We want to monitor physical devices across the zone from a control tower perspective for end users and support teams alike. Understanding more about the performance of the devices and mechanical components will allow us to schedule downtime to fix imminent catastrophic failures and prevent unplanned downtime and lost revenue.

CF
ITOPS and SRE Manager at Ticket

We primarily use the solution for observability.

Nuno Rosa - PeerSpot reviewer
Principal Consultant at Infosys

The solution is basically used for servers and applications.

MP
Sr. Manager - DevOps at an aerospace/defense firm with 10,001+ employees

We primarily use the solution for logging and APM, and for real user metrics.

James Baird - PeerSpot reviewer
Infrastructure Engineer at a tech services company with 11-50 employees

We currently use it for log aggregation and SIEM. We send logs from our AWS account (particularly our CloudTrail and S3 logs) and use them to give us security signals.

This has helped with our SOC2 certification process and has given us a window into our processes and the security holes in our system. 

We are also considering using the APM features to help with our development effort. We want to be able to profile all of our code and see what is going on with it.

VM
Software Engineering Manager at a healthcare company with 501-1,000 employees

We mainly use the product to monitor our infrastructure and apps. It is the go-to tool when we want to check that things are running properly. We use Datadog synthetic monitors to ensure our app works across different locations in the United States. 

We also have set up Datadog monitors to send alerts if things stop working as expected. 

We use Continuous Integration Pipeline visibility to make sure our developers are not being blocked by infrastructure and other things that might be out of their control.

ER
Senior Software Engineer at a transportation company with 51-200 employees

We primarily use Datadog for alerts. If we're running out of database connections or CPU credits we want to find out in Slack. Datadog provides nice features for that.
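The kind of threshold alert described here can be sketched as follows. This is only an illustration of the logic, not Datadog's monitor implementation; the metric name, thresholds, and state labels are hypothetical, and the payload is just the simple `{"text": ...}` JSON body that a chat webhook typically accepts.

```python
import json

def evaluate_threshold(value, warning, critical):
    """Return a state label for a metric where higher values are worse."""
    if value >= critical:
        return "ALERT"
    if value >= warning:
        return "WARN"
    return "OK"

def chat_payload(metric, value, state):
    """Build a JSON message body for a chat webhook (not sent anywhere here)."""
    return json.dumps({"text": f"[{state}] {metric} is at {value}"})

# Hypothetical reading: 95% of the database connection pool is in use.
state = evaluate_threshold(95, warning=80, critical=90)
payload = chat_payload("db.connections.used_pct", 95, state)
print(payload)
```

For a metric that runs down instead of up, such as remaining CPU credits, the comparisons would be reversed.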

Secondarily, we use Datadog for analyzing historical trends and forecasting potential issues.

I'm trying to learn how to add in Continuous Profiler in our primary backend servers and set up Synthetic Tests for monitoring our front end.

Everything is mostly on AWS, and the Datadog integrations help a ton.

VM
Senior Cloud Engineer at a comms service provider with 10,001+ employees

We use the solution primarily for platform monitoring for the services that are deployed in AWS. It gives a better way to monitor the services, including pods, cost, high availability, etc. This way, observability is ensured and also customer services are uninterrupted. 

Also, we host the data pipelines between the cloud and the on-prem for which Datadog is used to ensure better services. We report issues based on the metrics reported over it. 

LuWang - PeerSpot reviewer
DevOps Engineer at Screencastify

We use Datadog for observability and system/application health, mainly for product support, triaging, debugging, and incident responses.

We use a lot of the logging and the Datadog agent to collect logs, metrics, and traces from our GKE workloads. We use APM and continuous profiling for latency and performance measurement. We use RUM to observe frontend user events, such as tracing on request and what actions they take before errors occur. We also use error tracking and source maps to debug production failures.

We are still relatively new to the product, and we are planning to use more of the notebook functionality and power packs to record run books and break knowledge silos. We also need to utilize dashboards and continuous profiling more for performance measurement and integrate Datadog alerts for incident response.

RD
Software Engineer at Spring Health

We share dashboards, set up alerts, and monitor everything that happens in our system. We use it in staging, features, production, and our load test environment. It is exceptionally helpful for making our engineering more data-driven. 

I came from a company that believes we should focus on being telemetry driven. Instilling this in a smaller, less mature engineering organization has been challenging. However, it is much easier while using Datadog.

TC
Infrastructure engineer at an insurance company with 10,001+ employees

Our use case is to provide cloud organization application monitoring. I use it for insight into what host in what region has activity or what market is using Datadog to its fullest potential and utilizing that for cost. This may also help determine who is using monitoring and setting alerts or just setting up monitoring and not doing anything about it. The use case can also be to check when the host or applications are down, or if the usage of CPU, memory, etc, is too high.

WZ
Software Engineer at a tech vendor with 1,001-5,000 employees

We use the solution for application hosting and a little bit of everything when it comes to supporting a worldwide logistics tracking service. It's used as a central service for collecting telemetrics and logs. We find it does the same work as all of our old tools combined, including Prometheus, Kibana, Google Logs, and more; putting all of this information in a single platform makes it easy to corroborate information and associate a request with the data, which might be lost when it is saved as logs.

AM
Security Engineering Manager at a financial services firm with 201-500 employees

I use the solution to manage security-related logs and metrics, as well as create detection rules for security events. I am a security engineer, so one area of interest is the CSPM product, giving us the ability to look at findings across the cloud environment. 

The great part about the Datadog security products is that they incorporate the context of the resources/hosts where the security event is found. This allows us to see exactly what is running on a host that we see as a security alert.

Plinio Moreira - PeerSpot reviewer
Sales Engineer at Delfia

I'm a Datadog partner in Brazil, and I monitor all my applications with Datadog too. I would like to enable all features in my DPN portal and get access to custom demos. We resell Datadog and a full stack of pre-sales, sales, and post-sales services. We have customers for all sectors, including governmental, financial services, services in general, telecom, et cetera. Today, we are the biggest Datadog partner in Brazil, and we are searching for an expansion in our MSP environment.

RD
Director of Software Engineering at Code Climate

We primarily use the solution for charting application metrics.

We use it for all our application metrics, host metrics, and monitors with a PagerDuty integration. 

We integrate our application logs. It is great to be able to tie our metrics and our traces together.

We use the APM module with traces. It is great to be able to link APM, logs, and metrics in one go, as it shortens our troubleshooting and RCA dramatically.

We are loving the tool; it is great to have all those insights in one place. 

We hope that they keep making my life and our engineers' life easier.

BS
Senior Site Reliability Engineer at a comms service provider with 501-1,000 employees

We primarily use the solution for centralized dashboarding and telemetry viewing for teams across the organization. 

We're focused on ensuring that both development teams and leadership can reasonably gain insights into the status of various systems. 

At the end of the day, managing various dashboards and metrics aggregators like Prometheus, Kubernetes server, AWS CloudWatch, and Grafana has led to some confusion, and we've had issues with teams not knowing where their data exists and where they can view their system metrics.

Datadog has proven to be easy to set up and legible for both development and operational teams.

SA
Cloud Engineer at a tech services company with 10,001+ employees

We are using the solution for scaling up the website for market data applications. EC2 and Datadog have enabled high-level monitoring of underlying infra and services.

The Datadog profiler comes in handy to pinpoint issues with resource utilization during peak hours, and traces/log management helps narrow down the root cause.

The network map is crucial in identifying bottlenecks and determining what needs more attention.

Host map helps identify problematic hardware and devise ways to counter issues that arise during scaling, and deploying solutions on the cloud.

AL
Associate at a financial services firm with 10,001+ employees

We use the product for recording loggers on our various services across different teams. For example, we use logs to keep track of info logs for events and error logs to catch exceptions. 

When users ask us to investigate a situation, we use logs to keep track of events and where the user's code traveled to. We also use synthetic testing and monitoring features to keep track of our many alerts in the production and QA environments.
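The split between info logs for events and error logs for exceptions could look like the following minimal sketch, which uses only Python's standard `logging` module and emits one JSON object per line so a log pipeline can parse it. The logger name `orders` and the messages are hypothetical, and the in-memory stream stands in for whatever output the reviewer's services actually use.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["error"] = self.formatException(record.exc_info)
        return json.dumps(payload)

stream = io.StringIO()  # stand-in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("orders")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("order received")          # info log for an event
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("order failed")   # error log with the traceback attached

records = [json.loads(line) for line in stream.getvalue().splitlines()]
print([r["level"] for r in records])
```

Keeping the traceback inside the same JSON record makes it easier to follow where a request went when investigating a user report.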

MF
Software Developer at a pharma/biotech company with 51-200 employees

We’re currently using logging, monitoring, metrics, APM, etc.

We've started to use SLOs; however, it takes a bit of time to work through those.

RUM has been very useful. I have used this in the past to debug problems in production, which has been great.

We also want to start using synthetics and tracing more. 

Our application currently runs in many different environments based on our customers' requirements. This allows us to see everything in one place and filter by environment as required, which is extremely useful.

CS
Product SRE at a computer software company with 51-200 employees

We use Datadog for application logs, error tracking, performance tracking, alerting, and overall production state surveillance. 

It helps us improve observability and ease of maintenance through better information for our support teams and their issue qualification. 

We also use dashboards to keep all the information at the ready and easy to access. We use SLOs, notably for our uptime but also for our feature usage. Datadog also feeds alerting for our on-call SREs into PagerDuty by launching alerts when specific parameters are exceeded.

TV
Lead Architect at a computer software company with 11-50 employees

We primarily use the solution for log management and application performance monitoring. We have been getting into using more solutions on Datadog, such as runbooks, monitoring, and dashboards. 

Another area that we've been investing some time in is the database monitoring. We've been able to get some relatively new employees onboarded into the tool, and they've been able to create some meaningful dashboards and reports without too much hand-holding at all. 

We plan on exploring the synthetics solution as well.

EL
Lead Software Engineer at a retailer with 51-200 employees

We are trying to get a handle on observability. Currently, the overall health of the stack is very anecdotal. Users are reporting issues, and Kubernetes pods are going down. We need to be more scientific and be able to catch problems early and fix them faster.

Given the fact that we are a new company, our user base is relatively small, yet growing very fast. We need to predict usage growth better and identify problem implementations that could cause a bottleneck. Our relatively small size has allowed us to be somewhat complacent with performance monitoring. However, we need to have that visibility.

SM
Engineering Manager at Indeed.com

I primarily use the solution to learn, watch and monitor business and engineering metrics in the production and QA environments of my team. 

We create monitors on key business metrics and observe regressions and anomalies.

Less often, I leverage the events ability in Datadog to get notified about significant activities happening in my teams' deployments.

We learn about Datadog monitor alerts through Slack and often attempt to create SLOs using Terraform.

We use APM for observability.

Most recently, I learned about WatchDog Alerts that I will be heavily looking into.

HL
Sr Platform Engineer at a pharma/biotech company with 11-50 employees

We use it mostly for logging log messages from our Kubernetes and EC2 instances, for example, system messages and errors. Also, we want log messages from our firewalls and other network infrastructure in case of network issues. We intend to use it for application logging, et cetera, to get insight into internal problems in the applications in Kubernetes pods. We want to use it for monitoring in case of system problems and hardware failures so that it can notify us.

Ian Schell - PeerSpot reviewer
Senior Site Reliability Architect at a tech vendor with 1,001-5,000 employees

We use Datadog for general observability into our infrastructure, as well as running analytics queries for our SLI/SLO platform. This helps all of our teams be informed of how well their products are actually performing in production, and aim their efforts at the thing that will provide the highest ROI. 

We also use it for general monitoring and alerting during load tests and service releases to detect any issues related to the deployments. This helps us maintain our high contractual uptime promises to our clients.

JC
Software Engineer at Enable Medicine

We mostly use it to handle log aggregation, monitor our web application, and alert us on data pipeline failures. 

Our system is fully on AWS, so we pipe all of our CloudWatch logs into Datadog to have a central place to index and search logs.

Our web app is built on an Elastic Beanstalk backend, and we use the Datadog agent to keep track of all of the requests that hit our backend and all of their components. 

We also use the prebuilt AWS pipeline dashboards to monitor our batch jobs and lambdas.

JK
IT Test Manager at a transportation company with 10,001+ employees

Our primary use case is log management and we also use the solution for monitoring the application and underlying infrastructure. I'm an IT test manager. 

reviewer1479957 - PeerSpot reviewer
Senior Director of DevOps at Housecall Pro

We primarily use Datadog for the monitoring of EC2 and ECS containers running mostly Rails applications that host a SaaS product. We also monitor ElasticSearch and RDS, and we are working on adding their Application Performance Monitoring solution to monitor our applications directly.

We use Datadog to create dashboards, graphs, and alerts based on interesting metrics. Datadog is the first place we look to check the performance of our system.

We also use their logging platform and it works well. Especially useful is that the logs and metrics are tightly integrated so you can jump between them easily.

MI
Site Reliability Engineer at a computer software company with 201-500 employees

We use it for custom metrics of our applications and monitoring of our systems.

YS
Works

Our primary use case would be using the dashboards and getting proper insights based on the dashboards.

The monitoring, SLO, and SLA have been better and easier since we started using the Terraform infrastructure. APM has been easier as we had to enable it through the CronJob directly.

Profiling has been made easier. We are able to get many insights into the code. Profiling provides really good insights right now. 

Logs are the most valuable and the best solution so far. Datadog can help solve any slow queries or database-related errors. 

DM
Atlassian Expert at a tech consulting company with 51-200 employees

We are providing managed services to our customers across multiple industries. Datadog is key to delivering these services by bringing the observability, monitoring, and alerting capabilities we need to operate at scale. 

We operate custom cloud native workloads as well as ISV products such as Atlassian Jira or Confluence. 

Integrating Synthetics, infrastructure, and application performance monitoring, as well as piping all logs through Datadog, allows us to do more with less, with good alerting at the right time.

DK
Senior Director at a tech vendor with 1,001-5,000 employees

We primarily use the solution for the RUM, security monitoring, and streams.

We need to monitor users and what they access. We also need to identify security loopholes and attack patterns and identify and quickly respond to issues.

We can identify pushbacks, and get insight into application components that stack up with each other. We can understand which components, libraries, and code to alert teams.

Using Datadog, we can raise incidents, track incidents to completion, and gather data for reporting and post-mortems.

The solution allows us to track fixes and their test coverage. With it, we gain confidence in the fix/improvement phase and are able to provide a response.

View full review »
AA
Technical Lead at a wholesaler/distributor with 1,001-5,000 employees

We use Datadog for observability and monitoring primarily. Various cross-functional teams have built various dashboards, including Developers, QA, DevOps, and SRE. 

There are also some dashboards created for senior leadership to keep tabs on day-to-day activities like cost, scale, issues, etc. 

Also, we've set up monitors and alarms that kick off when any metrics go beyond the threshold. With Slack and PagerDuty integration, correct team members get alerted and react to solve the issue based on various runbooks.

View full review »
MF
Software Developer at a pharma/biotech company with 51-200 employees

We’re currently using logging, monitoring, metrics, APM, etc.  

We've started to use SLOs. However, it takes a bit of time to work through those.

RUM (Real User Monitoring) has been very useful. I have used this in the past to debug problems in production, which has been great.

We also want to start using synthetics and tracing more. 

Our application currently runs in many different environments based on our customers' requirements. This allows us to see everything in one place and still filter by the environment as required, which is extremely useful. 

View full review »
reviewer1494894 - PeerSpot reviewer
Senior Manager, Site Reliability Engineering at Extra Space Storage

We primarily use Datadog for logs, APM, infrastructure monitoring, and lambda visibility.

We have built a number of critical dashboards that we display within our office for engineers to have a good understanding of the application performance, as well as business partners to understand at a high level the traffic flowing through the app.

We started with logging, as our primary monitor, and have shifted to APM to get a deeper understanding of what our system is doing, and how the changes we are making impact the apps.

View full review »
reviewer1476039 - PeerSpot reviewer
Network Engineer / AWS Cloud Engineer / Network Management Specialist at CareFirst

We were in need of a cloud monitoring tool that was operationally focused on the AWS Platform. We wanted to be able to responsibly and effectively monitor, troubleshoot, and operate the AWS platform, including Server, Network, and key AWS Services.

We wanted tooling that highlights and detects problems and anomalies and provides best-practice recommendations, as well as tooling that expedites root-cause analysis and performance troubleshooting.

Datadog provided us the ability to monitor our cloud infrastructure (network, servers, storage), platform/middleware (database, web/applications servers, business process automation), and business applications across our cloud providers.

View full review »
MG
Software Engineer at a tech services company with 501-1,000 employees

We use actual user monitoring and have set up thresholds for alerts to PagerDuty, Sentry, Slack, and so on. We also have dashboards set up for tracking latency and error rates. 

As an individual contributor, I also try to set up dashboards for the individual feature projects I work on. I'd like to learn more ways to use this, though, especially when it comes to more proactive approaches to issues. A starter pack of common-use types would be nice.

View full review »
ME
Devops Engineer II at a comms service provider with 11-50 employees

We use the solution for monitoring our logs across distributed clusters. Right now, we have an Elasticsearch solution that is tied to each platform (our product is a PaaS solution). 

We are looking at moving to a single-pane-of-glass solution, which Datadog would be good for (plus, we could wrap up other tools like Prometheus, Grafana, PagerDuty, Pingdom, and more). We want to be able to have Datadog running on one single cluster and ingesting and processing logs from all our distributed clusters.

View full review »
AM
Software Engineer at a comms service provider with 11-50 employees

We use different tools for log collection and monitoring. Using Datadog will combine different use cases into one product that will be easier to manage. 

The tools we use are open-source, so there is no commercial support. Having customer support would be ideal since we're a small team. 

Profiling would be another great feature to have. Currently, it's manual. Having Datadog would give us a standard, and we don't have to do much manual work.

View full review »
RH
SRE at a computer software company with 51-200 employees

We are using Datadog for server metrics, log aggregation and searching, system monitoring, alerting the team about errors, and dashboards for our developers. It's used by the Site Reliability Engineering team and Management of all levels. 

It's assisting us in proving SOC II compliance. 

We're looking to improve our usage of Datadog's RUM and APM components to get better and more performance insights on our production environments. 

We're also looking to leverage more synthetic monitors and runbooks for anyone responding to incidents.

View full review »
KO
support Eng

We use the application for our application monitoring, data security monitoring, and log management. What we like about the application is that it helps us to track issues more proactively instead of reactively.

There are other improvements we would like to see.

1. Being able to restrict users from seeing or viewing specific dashboards once they log in

2. They can cut down the prices for Cloud SIEM. It seems very useful, however, the prices are high. Some organizations are finding it difficult to make decisions in terms of getting the tool.

View full review »
PL
SRE at a tech vendor with 201-500 employees

We use an enterprise version of a CMS platform which is enabling businesses to transmit content to their customers. The tool is fully customizable to the end user, including out-of-the-box integrations as well as APIs for custom plugin support. 

Our systems fully manage content using AWS as the back-end cloud provider. Assets are kept in secure buckets and utilize the Kubernetes infrastructure to deliver our product to end users and internal authors. Using the CMS allows for business people to manage content without needing development efforts.

View full review »
NY
SRE at a financial services firm with 10,001+ employees

We primarily use the solution for observability, metrics, logs, tracing, and end-to-end user flow monitoring. 

We are looking to implement this as a company-wide standard for cloud solutions.

We're currently in a POC, and we're interested in using either a Datadog agent or the OTel agent with a Datadog exporter. We have dashboards with panels that correlate metrics and allow you to link through to traces, as well as flame graphs that show latency across services and the various spans. 

While we are not security minded, we still require it and are interested in more. It's used for monitoring critical systems.

View full review »
NK
Lead Blockchain and Back-End Developer at Torum Technology

We are using the solution from a monitoring and management perspective. We use it for alerts.

View full review »
EY
Software Engineer at Sony Corporation of America

If our app is up and running, we use it to monitor how many credits the app is using up on each node. We also monitor services by how long each call is taking with the help of EC2s off of application.

View full review »
CC
Senior Site Reliability Engineer at a tech vendor with 10,001+ employees

Datadog provides us with a solution for data ingesting for all of our application metrics, resource metrics, APM/tracing data etc. 

We use it for use in dashboards, monitoring/alerting, SLO targets, incident response etc. 

We have a lot of applications across multiple languages/frameworks etc., and have deployed in Kubernetes across multiple regions in AWS, along with underlying managed resources such as SQS, Aurora, etc. 

Datadog makes understanding the state of these seamless. We are a company with millions of daily active users, and this level of detail is excellent.

View full review »
BK
Senior Engineering Manager, Mobile Wireless Engineering at a comms service provider with 10,001+ employees

The product is primarily used for the DevOps team. 

View full review »
WF
Cloud Specialist at a financial services firm with 501-1,000 employees

We collect all data logs from all operating systems, such as Windows, Linux, VMware, and bare metal data centers. We also automate the installation of the agent on servers. 

Now we are starting a POC to analyze the APM module. In the future, the next step is to do a POC of the security modules. 

The end goal is to have a single portal for observability. This will make troubleshooting easier, including for levels 1 and 2. 

View full review »
TW
Test Engineer at a tech services company with 1,001-5,000 employees

We're moving towards the cloud yet still have several active data center contracts. As we move to the cloud, we are interested in knowing more about our services, and DataDog APM/logs should give us this perspective. 

We currently use the infrastructure monitoring part of DataDog. Still, I've really seen the advantage of moving more data into the cloud for comparison and being able to have one place where we can view all related pieces of information regarding a possible incident or potential issue.

View full review »
SM
Senior Manager, Cyber Digital Transformation at a security firm with 1,001-5,000 employees

We have used this solution primarily for application performance monitoring. To do this, we needed to make sure we had the right data in the system so that people could be able to monitor their applications end-to-end.

View full review »
CC
DevOps Engineer at a printing company with 51-200 employees

Log aggregation was a key component for us since we have a fairly old-school app running on VMs on bare metal. We previously didn't have much insight into our logs unless we manually tunneled into each server.

The solution is reducing manual labor in troubleshooting problems in our environments server by server.

We also needed to monitor our Java app and MySQL database to understand their problems so that we could take action and resolve them.

Our use cases have since expanded to encompass all aspects of monitoring.

View full review »
AS
Production engineer at a consultancy with 51-200 employees

We have deep integration with Datadog for observability and monitoring.

We use everything from APM, logs, and RUM to monitors and dashboards for tracking system health. 

We are trying to move from many different solutions for error tracking/observability to a single platform (Datadog).

We are currently in the process of setting up logging in Datadog in order to maintain our logs better. We are looking to create more insights into the real user flows by using real user monitoring (RUM) too.

View full review »
GH
Cloud Engineer at a financial services firm with 51-200 employees

After a security incident, we needed to find and migrate to a different cloud provider, and after evaluating different competitors and the skill set of the team, we decided to move to AWS. AWS also enables the team to have finer control over how our apps are deployed and how security and access are managed. By leveraging AWS's functionality, we have increased our application's security and sped up the deployment process. We've even been able to handle higher workloads due to AWS's auto-scaling functionality.

View full review »
MZ
Software engineer at a construction company with 1,001-5,000 employees

We use the solution for monitoring time spent on views and events triggered. For example, for one of our products, we have created a custom dashboard that lets us track all the custom events and multiple entry points in the same part of the application.

Knowing the entry point helps us choose which part of the program should be improved next. It also helps us with collecting important data about the overall usage of each module within our application. 

View full review »
RC
Director Of Software Development at Major League Baseball

We primarily use the solution for monitoring and telemetry.

We use lots of log collections, log-based metrics, and dashboard visualization.  The logging, metrics, and APM are vital.  

View full review »
AM
Senior Cyber Security Expert at a security firm with 11-50 employees

We implement these solutions for our clients. We have implemented Datadog as an SIEM solution.

View full review »
AB
Director of Cloud Operations at a tech services company with 11-50 employees

Our clients use it for monitoring applications. Its deployment depends on our customer's use case. 

It is 100% cloud. We have got a multi-tenant environment, so we segment it out.

View full review »
PA
Co-Founder at Exeo IT

We're in the process of doing a Proof of Concept with the solution right now.

View full review »
reviewer1480866 - PeerSpot reviewer
Director of DevOps at Digital Media Solutions Group

We primarily use this product for availability and performance monitoring, log aggregation.

View full review »
JL
Lead Application Developer at a retailer with 10,001+ employees

We primarily use the solution for monitoring and log analysis.

View full review »
SB
Cloud Architect at a tech services company

We are a solution provider and Datadog is one of the products that I was working on with one of my clients. They are currently evaluating it for use in cloud monitoring.

Specifically, Datadog is used for monitoring cloud applications in terms of performance. The logs come into this solution from AWS and it provides dashboards for various environments.

View full review »
CG
Cloud Engineer at a retailer with 51-200 employees

I am using the solution for monitoring metrics, logs, traces, etc. It's mainly for making dashboards as well as monitoring our services. 

We also use Datadog to help centralize our incident management to show the logs, where issues spiked, and some metrics. 

We use Datadog to do troubleshooting in Kubernetes, specifically in our Azure Kubernetes service. Beyond that, we are looking to use open telemetry in tandem with Datadog to further our log-tracing efforts. In the future, this may be expanded.

View full review »
EX
Lead Support Engineer at a tech vendor with 11-50 employees

Our use case is mainly deploying into our applications for monitoring/logging observability. We currently have our microservices feed into an actuator that exists in each instance of our application that extends to a local and central Grafana for client and internal visibility. The application we use is Grafana.

Logging captures application and system logs that are ported to each application instance for querying.

Whenever anything occurs that is considered unhealthy from a range of health checks, we have notification rules configured internally and externally for a prompt response time.

View full review »
JL
Software engineer at a marketing services firm with 501-1,000 employees

We use metrics to track the metrics of our application. We use logging to log any errors or erroneous application behavior as well as successful behavior. We use events to log successful steps in our pipeline or failed steps in our deployment. We use a combination of all these features to diagnose bugs. 

It makes it much more efficient to look at all the data in one place. This speeds up our development speed so that we can be agile.

View full review »
LM
Principal Solutions Architect at a security firm with 51-200 employees

One of the things we use it for is the same thing that we use FullStory for, which is to replay customer interactions with our platform. However, it also does the monitoring. It's like monitoring cloud tools. We're really mostly monitoring our own software to make sure that everything is functioning properly. We can check a bunch of things, and we can even play back customer sessions. It’s basically monitoring our application.

View full review »
EB
AWS Cloud Architect Consultant at a transportation company with 10,001+ employees

We are evaluating Datadog for observability and monitoring requirements that we have in our company. In our use case, our intention is to provide some kind of framework for multiple app teams to use the tool for our cyber ability and engineering practices.

View full review »
DJ
Software Engineer at a media company with 51-200 employees

We run the agent in AWS. 

View full review »
MW
Principal Software Engineer at an insurance company with 10,001+ employees

We use the solution for testing all of our application's endpoints. It is making sure that they work on a consistent basis.

View full review »
SR
Data Engineer II at a comms service provider with 10,001+ employees

We ingest data from various sources to monitor the system's log metrics and enable an alert mechanism that notifies teammates if something goes wrong.

More specifically, having Datadog agents as integration to different services provides easy access and management.

View full review »
BS
Sr. Director of Software Engineering at a tech consulting company with 1,001-5,000 employees

The RUM is implemented for customer support session replays to quickly route, triage, and troubleshoot support issues which can be sent to our engineering teams directly. 

Customer Support will log in directly after receiving a customer request and work on the issue. Engineers will utilize the replay along with RUM to pinpoint the issue combined with APM and Infra trace to be able to look for signals to find the direct cause of the customer impact. 

Incident management will be utilized to open a Jira ticket for engineering, and it integrates with ITSM systems and on-call as needed.

View full review »
KW
Senior Software Engineer at Grata

We primarily use the dashboard/metric with many tags.

View full review »
AP
Technology Competency and Solution Head at LearningMate

We use Datadog for application monitoring, to help identify errors. It is also used to monitor application performance.

It helps organizations understand the user experience through user behaviour patterns.

View full review »
reviewer1486134 - PeerSpot reviewer
Infrastructure Engineer at DATACAMP, INC

We use Datadog as a monitoring platform to achieve visibility into our container environments.

Almost all of our workloads are containerized and with DataDog, we are able to get metrics, logs, alerts, and events about all the containers that we are running. Our developers also extensively use APM to find and diagnose performance issues that might appear.

We use Terraform to automatically create all of the necessary monitors and dashboards that our developers need to make sure that our level of service is sufficient.

View full review »
reviewer1477686 - PeerSpot reviewer
Senior DevOps Engineer at DigitalOnUs

Our primary use of Datadog includes: 

  • Keeping a close eye on our AWS resources. Monitoring our multiple RDS and ElastiCache instances plays a big role in our indicators.
  • Kubernetes. We aren't using all of the available Kubernetes integrations, but the few of them that work out of the box add great value to our metrics.
  • Monitoring and alerting. We wired our most relevant monitoring and alerts to services like PagerDuty, and for the rest of them, we keep our engineers up to date with constant Slack updates. 
View full review »
PK
Product Manager, Delivery Engineering at a media company with 1,001-5,000 employees

The main use case is observability and reliability as part of a platform/delivery engineering solution. We use the product to assist tenants and clients within the company to get more ramped up on SRE/DevOps.

View full review »
BA
Performance Testing Manager at a tech services company with 10,001+ employees

I'm not sure which version we're using, although I believe it to be the latest. 

We essentially use the solution in advance of performance testing, performance monitoring, and troubleshooting.

View full review »
RG
Software Engineering Manager at a tech vendor with 51-200 employees

I am using Datadog for error reporting.

View full review »
reviewer1493811 - PeerSpot reviewer
Sr. Architect - SaaS Ops at CommVault

We primarily use DataDog for performance and log monitoring of cloud environments, which include VMs and Azure Services like Azure compute, storage, network, firewall, and app services via event hubs.  

Alerting based on monitors via Teams and PagerDuty.

Logs collection for Azure services like Azure database, Azure Application Gateway, Azure AKS, and other Azure services.

Custom metrics using a Python script to collect metrics for components not natively supported by Datadog.

Synthetic testing to ensure uptime and browser tests via CI/CD pipeline.

View full review »
BB
Software Engineer at Lovepop

The primary use case is application monitoring. We also use it to set custom metrics and watch our AWS metrics, as well as data.

At my current job, I have only used it for a couple of months. However, I used it for a few years at a previous company.

View full review »
JL
Software Engineer at a financial services firm with 501-1,000 employees

We use Datadog to monitor our Kubernetes clusters. 

We have 3 different clusters for different parts of the SDLC. We run the Datadog agent DaemonSet as well as the Datadog cluster agent. Our services have the APM installed by default. 

To create monitors, we use Terraform. This is provided out-of-the-box for our service owner. 

We run EKS on top of K8s, therefore, we also make use of some of the AWS monitoring capabilities that can be integrated into Datadog. 

We are hugely reliant on Datadog for all aspects of our system.

View full review »
JL
Chief Strategy Officer (CSO) at a computer software company with 11-50 employees

We use Datadog to monitor our product on the cloud.

View full review »
FC
Tech Lead & Solutions Architect at DXC

I implement this solution for clients.

View full review »
EB
manager at a financial services firm with 501-1,000 employees

We use the solution for logs from all our applications. In Datadog, for monitoring logs, our team creates an automation for implementing massive logging in all our systems. Now, we are deploying it in our core systems.

View full review »
AG
Site Reliability Engineer at a financial services firm with 1-10 employees

Our company is transitioning to using the solution for monitoring and analytic services we provide to customers. Once fully rolled out, there will be 80-100 users companywide. 

View full review »
PG
Software Engineer at a comms service provider with 5,001-10,000 employees

We primarily use the product for tracing, metrics, and alarms in various deployment environments.

View full review »
NP
Sales Engineer at a tech services company with 201-500 employees

The solution is primarily used for better understanding the health of applications, modern environments, and many other solutions, which are the main focus of Datadog and many other monitoring tools.

With Datadog specifically, I can look at the health of the technology stack and services, and also integrate multiple metric sources, security, business data, and much more. This makes it a real software solution for centralizing data and unifying monitoring silos in one place. Datadog is like a hub - not just a monitoring software.

View full review »
PN
Head of Digital & Cognitive Services at a tech company with 11-50 employees

We use it for monitoring and instrumentation of security. We secure our databases and servers. It is typically for the security of apps, services, and systems. We are using its latest version.

View full review »
DD
Senior Solutions Architect at a tech services company with 11-50 employees

We are using the infrastructure and app monitoring side, such as process monitoring. We are using it in a very traditional way. We are not using the APM capabilities. When it comes to something like containers, we will generally use it on the host but not inside the container itself. 

We are using it with our customers and in-house day-to-day.

View full review »
SM
Sr. Software Engineer at a tech vendor with 51-200 employees

Observability is a key use case, as is security.

View full review »
SH
Senior Cloud Security Engineer at a financial services firm with 201-500 employees

I'm a senior cloud security engineer and we are customers of Datadog. 

View full review »
AS
DevOps Engineer at Spark New Zealand

We use it for notifications, alerting, and capturing most of the information from Amazon, such as EC2 instances.

View full review »
IA
Cloud Operations Engineer at a tech vendor with 10,001+ employees

We primarily use the solution for monitoring applications and informing customers via PagerDuty and Statuspage. The monitoring and alerts can be personalized internally, and we are able to find problems and issues. The response time monitor has been great, and it has been validating upgrades. We can check in to see which step fails.

View full review »
ER
Security Engineer at a financial services firm with 201-500 employees

We primarily use the solution for security monitoring and anomaly detection.

View full review »
GC
Security Analyst at a tech services company with 11-50 employees

We are currently testing it. If the testing goes well, we'll purchase the full version, and it will probably be our main monitoring tool. We plan to use it for monitoring our activities and the attacks on our systems or network.

View full review »
BB
Project Director at a tech services company with 501-1,000 employees

We use it for our infrastructure network and servers.

View full review »
FD
Production Engineering at a construction company with 51-200 employees

We use Datadog incident management for our incident tooling. Whenever we run into an incident, we try to use it. It allows us to create a separate Slack channel for it.

View full review »
JC
System Ninja at a philanthropy with 51-200 employees

We use it to monitor our infrastructure, particularly our different EC2 instances, and our containers. We also use it to capture our logs.

View full review »
UD
Director at a media company with 11-50 employees

We have a web infrastructure that uses Amazon Web Services containers with everything included, and we use Datadog to monitor them all.

View full review »
DT
Director of Engineering at a tech vendor with 201-500 employees
  • Monitoring
  • Analytics
  • Tracing
  • APM
View full review »
AA
Principal Engineer at a comms service provider with 51-200 employees

We mainly use it to send metrics about CPU and memory usage, in addition to the number of file descriptors on a socket.

View full review »
VG
Senior Software Engineer at a tech vendor with 501-1,000 employees

We use this solution to monitor our Kubernetes clusters, nodes, deployments, daemon sets, replica sets, and pods.

View full review »
YS
Software Developer at AhnLab, Inc.

We use it to store editorial content.

We started out on the on-premise version, then moved to the AWS version.

View full review »