Our clients use it for monitoring applications. Its deployment depends on our customer's use case.
It is 100% cloud. We have got a multi-tenant environment, so we segment it out.
It helps us be more proactive. We can help customers with their e-commerce applications on any networking issues, and we can also help them from a development standpoint, for example, in a non-production environment where they're testing various functionalities. It helps them be more successful with their deployments.
The visibility that it provides is valuable. It helps us be proactive around incident management and gives us more visibility into our customers' applications so that we can assist them at the application layer. We also provide them the infrastructure from an AWS standpoint. We are able to make sure that our customers are aware of critical analytical findings in either the network or the application, and we're able to call customers before they even know about an issue. From there, we can start putting together change management processes and help them further.
It could have a more modern pricing mechanism. We're actually working with them to figure out how to become more modular and modernize the pricing. The issue with Datadog is that you have to buy the whole suite of products and can end up utilizing only 40% of it. Most organizations today are split between application development, networking, and security, so there should be a way to break the suite down into separate modules for app dev, infosec, networking, etc. Customers have various needs across their business lines, and sometimes they're just not willing to pay for tools that they're not using 100%. AppDynamics is probably a little bit better in terms of being modular.
I have been using this solution for almost four years.
We haven't lost any customers over Datadog, so it must be stable.
As long as you're willing to pay for 100% but utilize only 40%, it can scale and do anything you want. In an organization, its users are usually the app group, the security group, and the network group.
We're certified in Datadog, and we have our own internal engineers to support the customers. We handle level-two and level-three support.
It is usually pretty complex.
It has a module-based pricing model.
I would advise others to review the overall functionality. If you're looking for an APM tool that covers all aspects of your environment and your application, from security to infrastructure, then Datadog is a good choice. If you're not, there are other tools out there that you could utilize for each one of those areas.
We do a lot of proof of concepts in helping our customers understand the micro and macro pieces of deployment. We're able to be a true advocate and value-add for our customers in utilizing the tool.
I would rate Datadog a seven out of ten. This is a very competitive space, and a lot of organizations are trying to figure out how to get better across the full life cycle of a deployment. There will be a lot of changes for different companies going forward.
We are trying to get a handle on observability. Currently, the overall health of the stack is very anecdotal. Users are reporting issues, and Kubernetes pods are going down. We need to be more scientific and be able to catch problems early and fix them faster.
Given the fact that we are a new company, our user base is relatively small, yet growing very fast. We need to predict usage growth better and identify problem implementations that could cause a bottleneck. Our relatively small size has allowed us to be somewhat complacent with performance monitoring. However, we need to have that visibility.
We are still taking baby steps with Datadog, so it's hard to come up with quantifiable information. The most immediate benefit is aggregating performance metrics together with log information. Having a better understanding of observability will help my team focus on the business problems they are trying to solve and write code that is conducive to being monitored, instead of reinventing the wheel and relying on their own logic to produce metrics that are out of context.
The most useful feature is the APM. Being able to quickly view which requests are time-consuming, and which calls have failed is invaluable. Being able to click on a UI and be pointed to the exact source of the problem is like magic.
I'm also very intrigued by log management, although I haven't quite had a chance to use it effectively. In particular, the trace and span IDs don't quite seem to work for me. However, I'm very keen on getting this to work. This will also help my developers be more diligent and considerate when creating log data.
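What I'm aiming for is roughly the following. This is only a sketch, assuming a Node.js service instrumented with dd-trace and a JSON logger like winston (the log call and values are illustrative), where the injected trace and span IDs ride along on every log line:

```ts
// 'dd-trace/init' must be the very first import so the tracer can patch the
// logger before it loads; configuration comes from environment variables such
// as DD_SERVICE, DD_ENV, and DD_LOGS_INJECTION=true.
import 'dd-trace/init';
import winston from 'winston';

// JSON output keeps the injected dd.trace_id / dd.span_id fields as structured
// attributes, so each log line links back to the APM trace it belongs to.
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [new winston.transports.Console()],
});

logger.info('order submitted', { orderId: '12345' }); // illustrative log call
```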
As a new customer, the Datadog user interface is a bit daunting. It gets easier once one has had a chance to get acquainted with it, yet at first, it is somewhat overwhelming. Maybe having a "lite" interface with basic features would make it easier to climb the learning curve.
Maybe the feature already exists. However, I'm not sure how to keep dashboard designs and synthetic tests in source control. For example, we may replace a UI feature, and rebuild a test accordingly in a pre-production environment, yet once the code is promoted to production, the updated test would also need to be promoted.
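One approach I've been considering, purely as a sketch, is to export each dashboard's JSON definition through Datadog's public API client and commit it to the repository; the dashboard ID and output path below are placeholders:

```ts
// A sketch with placeholder IDs and paths; assumes DD_API_KEY / DD_APP_KEY are
// set so the client can authenticate.
import { client, v1 } from '@datadog/datadog-api-client';
import { writeFileSync } from 'node:fs';

const configuration = client.createConfiguration();
const dashboards = new v1.DashboardsApi(configuration);

// Pull a dashboard's full JSON definition so it can be committed to git and
// reviewed/promoted like any other change.
async function exportDashboard(dashboardId: string, outFile: string) {
  const definition = await dashboards.getDashboard({ dashboardId });
  writeFileSync(outFile, JSON.stringify(definition, null, 2));
}

exportDashboard('abc-123-xyz', 'dashboards/checkout-overview.json').catch(
  (err) => console.error(err),
);
```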
We have just started using the solution and have only used it for about two months.
We're new at this. That said, so far, there haven't been any issues to report.
I have not had the opportunity to evaluate the scalability.
Customer support is full of great folks! We're beginning our Datadog journey, so I haven't had that much experience. The little I have had has been great.
Positive
This is all new.
We used to work with New Relic. New Relic has an amazing APM solution. However, it also became cost-prohibitive.
Since we are relatively greenfield, it was relatively painless to set up the product.
Our in-house DevOps team did the implementation.
I don't know what the ROI is at this stage.
I'm not sure what the exact pricing is.
So far, it's been great!
The RUM is implemented for customer support session replays to quickly route, triage, and troubleshoot support issues, which can then be sent directly to our engineering teams.
Customer support will log in directly after receiving a customer request and work on the issue. Engineers will use the replay along with RUM to pinpoint the issue, combined with APM and infrastructure traces, to look for signals that point to the direct cause of the customer impact.
Incident management will be utilized to open a Jira ticket for engineering, and it integrates with ITSM systems and on-call as needed.
The RUM solution has improved our ability to triage faster and has put more capabilities in the hands of our customer support team.
The RUM is implemented for customer support. It can quickly route, triage, and troubleshoot support issues that are sent to our engineering teams.
Customer support can log in and start troubleshooting after receiving a customer request. The replay and RUM help pinpoint the issue. This functionality is combined with APM and Infra trace to be able to look for the cause of the issue. Incident management is leveraged to open a Jira ticket for engineering, and it can integrate with ITSM systems and on-call as needed.
RUM with session replay, combined with a future use case to support synthetics, will help us identify issues earlier in our process. We have not rolled this out yet but plan for it as a future use case for our customer support process. This, combined with integrated automation for incident management, will drive down our MTTR and the time spent working through tickets. Overall, we are hoping to use this to look at our data and deflection rate over time in a BI-like way and reduce our customer support headcount by saving on time spent.
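For context, the browser-side setup behind this workflow is fairly small. The following is only a sketch with placeholder IDs, names, and sample rates, not our actual configuration, showing RUM being initialized with Session Replay enabled:

```ts
// Placeholders throughout; not our actual IDs or sample rates.
import { datadogRum } from '@datadog/browser-rum';

datadogRum.init({
  applicationId: '<RUM_APPLICATION_ID>',
  clientToken: '<CLIENT_TOKEN>',
  site: 'datadoghq.com',
  service: 'support-portal',      // hypothetical service name
  env: 'production',
  sessionSampleRate: 100,         // collect RUM data for every session
  sessionReplaySampleRate: 20,    // record replays for a sample of sessions
  trackUserInteractions: true,    // capture clicks so replays show what users did
  defaultPrivacyLevel: 'mask-user-input',
});

// Replays are only captured once recording is explicitly started.
datadogRum.startSessionReplayRecording();
```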
I would like to see retention options greater than 30 days for session replay. I'd also like forwarding options for retaining data in custom solutions, and a greater ability to export event data from the tooling to BI/DW solutions for long-term reporting and trend analysis.
I've used the solution for about nine months.
So far, stability has been great.
I'd like to see more bells and whistles added over time. Widgets are coming soon to help with RUM.
Support is very good. They are responsive and give us the help we need.
Positive
We have utilized New Relic, though not for RUM. We went with Datadog to potentially switch the entire platform to an all-in-one solution that makes sense for a company of our size.
We started on the beta, and the documentation was lagging behind. We also needed direct instructions and links from our customer support/account representative that were not immediately available by searching online.
We implemented the solution ourselves.
Ideally, this will inform our strategy to not increase our customer support headcount as significantly into 2023 and beyond.
The pricing is a bit confusing. However, the RUM session replay, in general, is very inexpensive compared to whole solutions.
We looked into LogRocket and New Relic.
I'd advise other users to try it out.
We use it to monitor and alert on our ECS instances as well as other AWS services, including DynamoDB, API Gateway, etc.
We have it connected to PagerDuty for alerting on all our cloud applications.
We also use custom RUM monitoring and synthetic tests for both our internal and public-facing websites.
For our cloud applications, we can use Datadog to define our SLOs and SLIs and generate dashboards that are used to monitor the SLOs and report them to our senior leadership.
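To illustrate how an alert like this can be kept alongside our code, the sketch below creates a metric monitor through Datadog's API client and routes it to PagerDuty; the metric name, threshold, and notification handle are invented, and it assumes the PagerDuty integration is already configured:

```ts
// Sketch only: metric, threshold, and PagerDuty handle are invented, and
// API/app keys are expected in DD_API_KEY / DD_APP_KEY environment variables.
import { client, v1 } from '@datadog/datadog-api-client';

const configuration = client.createConfiguration();
const monitors = new v1.MonitorsApi(configuration);

async function createLatencyMonitor() {
  return monitors.createMonitor({
    body: {
      name: 'Checkout request latency is high',
      type: 'query alert',
      // Hypothetical custom metric; in practice this would be one of the AWS
      // integration metrics or an APM-generated metric.
      query: 'avg(last_5m):avg:shopfront.request.latency{env:production} > 2',
      message:
        'Average request latency has been above 2s for 5 minutes. @pagerduty-Platform-OnCall',
      tags: ['team:platform', 'service:shopfront'],
      options: { thresholds: { critical: 2 } },
    },
  });
}

createLatencyMonitor().catch((err) => console.error(err));
```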
Datadog has improved our cloud-native monitoring significantly, as CloudWatch doesn't have enough features to create robust, sustainable dashboards that present all the information in an aggregated manner, in one place, for a combination of applications, databases, and other services, including our UI applications.
RUM monitoring is also something we didn't have before Datadog. We had Splunk, which was a lot harder to set up than Datadog's custom RUM metrics and its dashboards.
I really enjoy the RUM monitoring features of Datadog. It allows us to monitor user behavior in a way we couldn't before.
It's useful to be able to obfuscate sensitive information by setting up custom RUM actions and blocking the default ones with too much data.
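As a small illustration of the knobs involved (example values only, not our configuration), automatic action collection can be switched off, replays masked, and a custom action reported with just the attributes we choose:

```ts
// Example values only; the point is which options control data capture.
import { datadogRum } from '@datadog/browser-rum';

datadogRum.init({
  applicationId: '<RUM_APPLICATION_ID>',
  clientToken: '<CLIENT_TOKEN>',
  site: 'datadoghq.com',
  trackUserInteractions: false, // block the default auto-collected actions
  defaultPrivacyLevel: 'mask',  // mask text and input content in replays
});

// Report a hand-picked custom action with only the attributes we are
// comfortable storing (no names, addresses, or payment data).
datadogRum.addAction('checkout_submitted', {
  cartSize: 3,
  paymentMethod: 'card',
});
```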
I also like being able to generate custom metrics and monitors by adding facets to existing logging. Datadog can parse logs well for that purpose. The primary method of error detection for our external website is synthetic tests. This is extremely valuable for us as we have a large user base.
At times, it can be hard to generate metrics out of logs. I've seen some of those break over time and produce flaky data.
Creating a monitor out of the metric and using it in a dashboard to generate our SLIs and SLOs has been hard, especially in cases where the data comes from nested logging facets.
I've used the solution for two years.
The stability is pretty good.
The solution is pretty scalable! However, it's hard to set up all the infrastructure (Terraform code) required to connect Datadog over PrivateLink to all of our different AWS accounts.
They offer good support, and solutions are provided by the team when needed. For example, we had to delete all our RUM metrics when we accidentally logged sensitive data, and the CTO of Datadog stepped in to help out and prioritize it at the time.
Positive
We previously used Splunk and some internal tools. We switched due to the fact that some cloud applications don't integrate well with pre-existing solutions.
The initial setup for connecting our different AWS accounts via Datadog PrivateLink wasn't great. There was a lot of duplicate Terraform that had to be written. The dashboard setup is way easier.
We installed it with the help of a vendor team.
Our return on investment is great; it is so much better than CloudWatch, and we can easily integrate with PagerDuty for alerting.
Our company set up the product for us, so the engineers didn't need to be involved with pricing.
The pricing structure isn't very clear to engineers.
We looked into Splunk and some internal tools.
One of the things we use it for is the same thing that we use FullStory for, which is to replay customer interactions with our platform. However, it also does monitoring; it's like a cloud monitoring tool. We're mostly monitoring our own software to make sure that everything is functioning properly. We can check a bunch of things, and we can even play back customer sessions. It's basically monitoring our application.
It really provides a lot of visibility in terms of how our software is working. If there are any problems, it surfaces them right away. We get alerts in Slack. It's really an essential tool for a company that provides software as a service.
I really like the replay feature, the ability to replay sessions. I'm in sales engineering, so I sometimes need to know what my prospects are doing during a proof of value. I can actually see all the mouse movement and the clicking on buttons, so I can tell what they've been doing. There's a lot of other monitoring functionality as well. The development team uses it for monitoring and finds it very helpful.
It's been central to many different things. The dashboards and the performance of the software have been great.
I haven't really noticed anything that they could improve upon. Maybe they could add in some features to go both ways, to maybe make some configuration changes, etc. That's a little bit outside of what Datadog does, though. It's really very full-featured, so I don't really have any complaints.
I haven't really fully looked at the documentation, as I know where I need to go to look at things. The user experience could probably be a little better. There are so many functions that navigating your way around is sometimes a little hard. They have a really nice menu system; however, there's so much there. It's possible that I skipped a guided tour when I started.
It’s not intuitive to everyone. There are a lot of technical features.
I’ve been using the solution for the last five months. However, the company may have used it for a year and a half.
The solution has been stable and reliable.
We haven’t had a problem with scalability. It’s been good.
We have 25 to 30 users on it currently. Our entire organization is under 60 people. Although not everyone is on it, a lot of our staff are. The sales, engineering, and customer success teams are all on it.
We may increase usage. No doubt that will come naturally with time. We’re hiring more people, and likely new hires will use it.
I have not had occasion yet to reach out to support.
We’ve also been using FullStory.
I wasn’t part of the implementation. The one thing I will say is that when they added the functionality to review sessions, it made our use of another product, FullStory, almost obsolete. I'll have to see if we will continue using FullStory or if we can rely completely on Datadog.
I am a customer and end-user.
We’re on the most recent version and keep it updated.
I’d rate it nine out of ten. The user experience could be slightly better.
We are using the solution for migrating out of the data center. Old apps need to be re-architected. We are planning on moving to multi-cloud for disaster recovery and to avoid vendor lockouts.
The migration is a mix between an MSP (Infosys) and in-house developers. The hard part is ensuring these apps run the same in the cloud as they do on-premises. Then we also need to ensure that we improve performance when possible. With deadlines approaching quickly, it's important not to cut corners, which is why we needed observability.
Using the product has caused a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams wouldn't be able to create monitors on the fly and would have to wait on us to contact the ServiceNow team to create a custom lookup. Now, in real-time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.
For use, the most valuable features we have are infrastructure and APM metrics.
The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze.
We rely heavily on the API crawlers Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without also having to make them add the tags at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams.
With the ServiceNow integration, we can also assign tickets based on the environment. Now our top teams are using the APM/profiler to find bottlenecks and improve the speed of our apps.
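As a rough illustration of those monitor conditionals, the notification message can branch on tags; the handles and team names below are invented and assume the relevant integrations and monitor grouping are in place:

```ts
// An invented example of the `message` field of a monitor definition. The
// {{#is_match}} blocks route notifications by tag; the handles assume the
// ServiceNow, PagerDuty, and Slack integrations are configured.
export const monitorMessage = `
{{#is_match "env.name" "production"}}
  @servicenow-prod-instance @pagerduty-Platform-OnCall
{{/is_match}}
{{#is_match "team.name" "payments"}}
  @slack-payments-alerts
{{/is_match}}
CPU is above 90% on {{host.name}}.
`;
```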
The real issue with this product is cost control. For example, when logs first came out they didn't have any index cuts. This caused runaway logs and exploding costs.
It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there is no way to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes, that would save us 5X on our bill.
I've used the solution for about three years.
The solution is very stable. There are not too many outages, and they fix them fast.
It is easy to scale. That is why we adopted it.
Before premium support, I would avoid using them as it was so bad.
Neutral
We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.
The initial setup was not difficult. We just had to teach teams the concept of tags.
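To give a sense of what that tagging amounts to in practice, here is an illustrative sketch, assuming a Node.js service using dd-trace, where env/service/version and a team tag are set once on the tracer:

```ts
// Illustrative only: env/service/version plus a team tag are set once on the
// tracer, so every span the service reports carries them, and monitors,
// dashboards, and ticket routing can key off those tags.
import tracer from 'dd-trace';

tracer.init({
  env: 'production',
  service: 'inventory-service',   // hypothetical service name
  version: '1.4.2',
  tags: { team: 'supply-chain' }, // hypothetical team tag
});

export default tracer;
```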
We did the implementation in-house. It was me. I am the SME for Datadog at the company.
The solution has saved months of time and reduced blindspots for all app teams.
I'd advise users to be careful with logs and the APM as those are the ones that can get expensive fast.
We looked into Dynatrace. However, we found the cost to be high.
We are using the solution from a monitoring and management perspective. We use it for alerts.
Real user monitoring and session analytics, as well as the APM, are the solution's most valuable aspects.
We could easily identify the production box, and we could handle alerts.
The initial setup is very straightforward.
It's stable.
They have Slack integrations, mail integrations, and other options as well.
We haven't done a deep analysis at this point to identify the disadvantages.
We have noticed that Session Replays are unavailable on the mobile app. We'd like mobile app integrations. They could have better log reporting. That said, almost everything that we require is there.
We've only used the solution for two months. We just wanted to understand the complexities and the benefits that we might get from this product.
The solution is stable and reliable. The performance is good. There are no bugs or glitches. It doesn't crash or freeze.
It may be scalable. However, we haven't actually tested anything.
We have six or seven people on the solution at any given time. They're there to handle fixes or do an analysis of logs or monitor the infrastructure.
I'm not sure if we plan to increase usage. We are not growing very rapidly. We're just preparing our environment for the upcoming year.
We've dealt with support. They are helpful and responsive.
Positive
We have used AWS CloudWatch and the monitoring features built into AWS; however, we thought it would be good to have a separate application to do all these kinds of monitoring and to handle the alerting mechanisms.
We were looking for the best available platform on the market, one that would also give us better flexibility while having everything come from one place. We found that Datadog is good enough for now, and we are still exploring what it can do and what our options are. We are not completely all-in with Datadog; we've been exploring it for only four or five months.
It's an easy product to set up. The integration was very smooth and they have multiple options. It was pretty good integration-wise.
The setup itself only took a day or two. It doesn't take long.
We handled the setup ourselves in-house. We did not need any outside assistance.
Some features, such as APM, RUM, and Session Replay, were quite expensive. We are using it in a very limited way, not for every microservice, and on a minimal budget. As we are at an exploratory phase, we are doing pay-as-you-go. Once we feel confident, we will likely get a yearly license.
We are customers.
We're working on the latest version; I can't recall the exact version number.
I'd rate the solution six out of ten. I need to evaluate the solution further.