Buyer's Guide
Application Performance Management (APM)
December 2022

Read reviews of New Relic APM alternatives and competitors

DevOps Leader at a legal firm with 501-1,000 employees
Real User
Good executive-level dashboards with powerful automation and AI capabilities, but the management interface could be more intuitive
Pros and Cons
    • "The user interface for the management functions is not particularly intuitive for even the most common features."

    What is our primary use case?

    Our primary use case is the consolidation of observability platforms.

    How has it helped my organization?

    Looking at Dynatrace's automation and AI capabilities, automation is generally a great place to start. In products where there has been no observability or a very limited amount, the automation can give a great deal of insight, telling people things that they didn't know that they needed to know.

    Davis will do its best to provide root cause analysis, but you, as a human, are still responsible for connecting as many of the dots as possible in order to build the biggest possible picture. As long as you accept that you still have to do some work, you'll get a good result.

    I have not used Dynatrace for dynamic microservices within a Kubernetes environment at this company, but I have had an AWS microservice cluster in the past. Its ability to cope with ephemeral instances, as Kubernetes workloads usually are, was very good. The fact that we didn't have to manually scale out to match any autoscaling rules on the Kubernetes clusters was very useful. Its representation of them at the time wasn't the best; other products, Datadog, for example, had a better representation in the actual portal of the SaaS platform. That was about three years ago, and Dynatrace has changed, but I haven't yet reused the Kubernetes monitoring to see if it has improved in that regard.

    Given that Dynatrace is a single platform, as opposed to needing multiple tools, the ease of management is good because there is only one place to go in order to manage things. You deal with all of the management in one place.

    The unified platform has allowed our teams to better collaborate. In particular, because of the platform consolidation, using Dynatrace has made the way we work generally more efficient. We don't have to hop between seven different monitoring tools. Instead, there's just one place to go. It's increased the level of observability throughout the business, where we now have development looking at their own metrics through APM, rather than waiting until there's a problem or an issue and then getting a bug report and then trying to recreate it.

    It has increased visibility for the executives and senior management, who now get to see dashboards about what's happening right now across the business or across their products, which didn't exist before. There is also the rate at which we can monitor new infrastructure, applications, or custom devices. We had a rollout this week, which started two days ago, and by yesterday afternoon I was able to provide dashboards giving feedback on the very infrastructure and applications that had been set up for monitoring the day before.

    As we've only been using Dynatrace in production for the past month in this company, the estimate as to the measurement of impact isn't ready yet. We need more time, more data, and more real use cases as opposed to the synthetic outages we've been creating. In my experience, Dynatrace is generally quite accurate for assessing the level of severity. Even in scenarios where you simply rely on the automation without any custom thresholds or anything like that, it does a good job of providing business awareness as to what is happening in your product.

    Dynatrace has a single agent that we need to install for automated deployment and discovery. It uses up to four processes and we found it especially useful in dealing with things like old Linux distros. For example, Gentoo Linux couldn't handle TLS 1.2 for transport and thus, could not download the agent directly. We only had to move the one agent over SSH to the Gentoo server and install it, which was much easier than if we'd had to repeat that two or three times.

    The automated discovery and analysis features have helped us to proactively troubleshoot products and pinpoint the underlying root cause. There was one particular product that benefited during the proof of concept period, where a product owner convened a war room and it took about nine hours of group time to try and reason out what might be the problem by looking at the codebase and other components. Then, when we did the same exercise for a different issue but with Dynatrace and the war room was convened, we had a likely root cause to work from in about 30 minutes.

    In previous companies, where the deployment was more mature, Dynatrace definitely allowed DevOps to concentrate on shipping quality, whereas where I am now, we are still deploying Dynatrace. The biggest change in that organization was the use of APM and the insights it gave developers.

    Concurrent with the deployment of Dynatrace, we adopted a different development methodology using Scrum and Agile. By following the Scrum pattern of meetings, we were able to observe the time estimates given for various tasks in planning sessions. Those estimates started to come down once the output of the APM had been considered. Ultimately, Dynatrace APM provided the insight that allowed the developers to complete tasks faster.

    What is most valuable?

    The most valuable features for us right now are the auto-instrumentation, the automatic threshold creation, and the Davis AI-based root cause analysis, along with the dashboarding for executives and product owners.

    These features are important because of the improved time it takes for deployment. There is a relatively small team deploying to a relatively large number of products, and therefore infrastructure types and technology stacks. If I had to manually instrument this, like how it is accomplished using Nagios or Zabbix, for example, it would take an extremely long time, perhaps years, to complete on my own. But with Dynatrace, I can install the agent, and as long as there is a properly formed connection between the agent and the SaaS platform, then I know that there is something to begin working with immediately and I can move on to the next and then review it so that the time to deployment is much shorter. It can be completed in months or less.

    We employ real user monitoring, session replay, and synthetic monitoring functionalities. We have quite a few web applications and they generally have little to no observability beyond the infrastructure on which the applications run. The real user monitoring has been quite valuable in demonstrating to product owners and managers how the round-trips, or the key user actions, or expensive queries, for example, have been impacting the user experience.

    By combining that with session replay and actually watching through a problematic session for a user, they get to experience the context as well as the raw data. For a developer, for example, it's helpful that you can tell them that a particular action is slow, or it has a low Apdex score, for example, but if you can show them what the customer is experiencing and they can see state changes in the page coupled with the slowness, then that gives a much richer diagnostic experience.

    We use the synthetics in conjunction either with the real user monitoring or as standalone events for sites that either aren't public-facing, such as internal administration sites, or for APIs where we want to measure things in a timely manner. Rather than waiting for seasonal activity from a user as they go to work, go home, et cetera, we want it at a constant rate. Synthetics are very useful for that.

    The benefit of Dynatrace's visualization capabilities has been more apparent for those that haven't used Dynatrace before or not for very long. When I show a product owner a dashboard highlighting the infrastructure health and any problems, or the general state of the infrastructure with Data Explorer graphs on it, that's normally a very exciting moment for them because they're getting to see things that they could only imagine before.

    In terms of triaging, it has been useful for the sysadmins and the platform engineering team, as they normally had to rely on multiple tools up until now. We have had a consolidation of observability tools, originally starting with seven different monitoring platforms. It was very difficult for our sysadmins as they watched a data center running VMware with so many tools. Consolidating that into Dynatrace has been the biggest help, especially with Davis backing you up with RCAs.

    The Smartscape topology has also been useful, although it is more for systems administrators than for product owners. Sysadmins have reveled in being able to see the interconnectedness of various infrastructures, even in the way that Dynatrace can discover things to which it isn't directly instrumented. When you have an agent on a server surrounded by other servers, but they do not have an agent installed, it will still allow a degree of discovery which can be represented in the Smartscape topology and help you plan where you need to move next or just highlight things that you hadn't even realized were connected.

    What needs improvement?

    The user interface for the management functions is not particularly intuitive for even the most common features. For example, you can't share dashboards en masse. You have to open each dashboard, go into settings, change the sharing options, go back to dashboards, et cetera. It's quite laborious. Datadog, in the same scenario of being a single platform, does a better job of making these options accessible.

    User and group management in the account settings for user permissions could be improved.

    The way that Dynatrace deals with time zones across multiple geographies is quite a bone of contention, because Dynatrace only displays the browser's local time. This is a problem when I'm talking with people in Canada, which I do every day. They either have to recalculate times in their heads on the fly to work out what a given time means in their own zone, or I have to spin up a VM with the browser's time zone set to theirs in order to make it obvious without any mental arithmetic on their part.
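    Since the product only renders the browser's local time, the recalculation described here can at least be scripted rather than done mentally. A minimal sketch using Python's standard zoneinfo module (the timestamp and zones are illustrative, not taken from any Dynatrace API):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

# A problem time as shown in a UK reviewer's browser (Europe/London).
incident_uk = datetime(2022, 12, 1, 14, 30, tzinfo=ZoneInfo("Europe/London"))

# Re-express it in a Canadian colleague's local zone (America/Toronto).
incident_toronto = incident_uk.astimezone(ZoneInfo("America/Toronto"))

print(incident_toronto.strftime("%Y-%m-%d %H:%M %Z"))  # 2022-12-01 09:30 EST
```

This is, of course, a workaround for the reader rather than a fix; the request stands for Dynatrace to support a per-user display time zone.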

    For how long have I used the solution?

    Personally, I have been using Dynatrace since November of 2018. At the company I am at, we have been using it for approximately four months. It was used as a PoC for the first three months, and it has been in production for the past month.

    What do I think about the stability of the solution?

    The SaaS product hasn't had any downtime while I've been at my current company. I've experienced downtime in the past, but it's minimal.

    What do I think about the scalability of the solution?

    To this point, I've not had any problems with the scalability, aside from ensuring that you have provisioned enough units. However, that is another point that is related to pricing.

    Essentially, its ability to scale and continue to work is fine. On the other hand, its ability to predict the required scalability in order to purchase the correct number of various units is much harder.

    How are customer service and support?

    Talking about Dynatrace as a company, the people I've spoken to have always been responsive. The support is always available, partly because of our support package. As a whole, Dynatrace has always been a very responsive entity, whether I've been dealing with them in North America or in the UK.

    Which solution did I use previously and why did I switch?

    We have used several other solutions including Grafana, Prometheus, Nagios, Zabbix, New Relic, AWS CloudWatch, Azure App Insights, and AppDynamics. We switched to Dynatrace in order to consolidate all of our observability platforms.

    Aside from differences that I discuss in response to other questions, other differences would come from the product support rather than the product itself. Examples of this are Dynatrace University, the DT One support team, the post-sales goal-setting sessions, and training.

    We have yet to receive our main body of training, but we're currently scheduled to train on about 35 modules. Last year, when I rolled out Datadog, the training wasn't handled in the same way; it was far more on-request for specific features. This, by contrast, is an actual curriculum designed to familiarize end users with the product.

    How was the initial setup?

    In my experience, the initial setup has been straightforward, but I've done it a few times. When I compare it to tools like Nagios, Zabbix, Grafana, and Prometheus, it is very straightforward. This is largely for two reasons.

    First, they're not SaaS applications, whereas Dynatrace is, and second, the amount of backend configuration you have to do in preparation for those tools is much higher. That said, if we were to switch to Dynatrace Managed rather than Dynatrace SaaS, I imagine that the level of complexity for Dynatrace would rise significantly. As such, my answer is biased towards Dynatrace SaaS.

    What was our ROI?

    In my previous company, it allowed a very small team to manage what was a very fast-moving tech stack. In my current company, it is still very early.

    The consolidation of tools due to implementing Dynatrace has saved us money, although it's tricky to measure the impact. The list price of Dynatrace was more than the previous list price spend on monitoring tools because the various platforms had been provided as open-source tools, were provided through hosting companies, or had been acquired as part of acquisitions of other companies.

    The open-source applications that we used included Grafana, Prometheus, Nagios, and Zabbix. New Relic, as an example, was provided through a hosting company, Carbon60 in Canada. AppDynamics came to us through the acquisition of a Canadian company, so it sat in that company's previous budget rather than our own.

    The hope was that Dynatrace, through consolidation, would relieve us of the material cost of the administrative overhead of tools like Prometheus and Grafana, and of the cost of hosting infrastructure for solutions like Nagios, Zabbix, Prometheus, Grafana, et cetera. This means it is more of an upstream cost saving, where we would be saving human effort and hosting costs by consolidating into a SaaS platform that is pretty much all-in-one.

    What's my experience with pricing, setup cost, and licensing?

    Dynatrace's pricing for their consumption units is rather arcane compared to some of the other tools, thus making forward-looking calculations based on capacity planning quite hard. This is because you have to do your capacity planning, work out what that would mean in real terms, then translate that into Dynatrace terms and try to ensure you have enough Davis units, synthetics units, DEM units, and host units.

    Estimating those and making sure you've got them all right for anything up to a year in advance is quite hard. This means that its ability to scale and continue to work is fine, but predicting the correct number of various units to purchase is much harder.
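    The unit arithmetic described above can be sketched roughly. All conversion rates below are made-up placeholders, not Dynatrace's actual pricing model; the point is only that a capacity plan has to be translated into several unit types at once:

```python
# Hypothetical capacity plan -> consumption-unit translation.
# Every conversion rate here is an illustrative placeholder; real
# Dynatrace unit definitions come from your contract, not this sketch.

hosts = 120                          # full-stack monitored hosts
host_memory_gib = 16                 # average memory per host
synthetic_runs_per_month = 50_000
monitored_sessions_per_month = 2_000_000

# Placeholder conversion rates (assumptions, not vendor figures).
HOST_UNITS_PER_16_GIB = 1.0
SYNTHETIC_RUNS_PER_UNIT = 1_000
SESSIONS_PER_DEM_UNIT = 10_000

host_units = hosts * (host_memory_gib / 16) * HOST_UNITS_PER_16_GIB
synthetic_units = synthetic_runs_per_month / SYNTHETIC_RUNS_PER_UNIT
dem_units = monitored_sessions_per_month / SESSIONS_PER_DEM_UNIT

print(f"host units: {host_units:.0f}")            # 120
print(f"synthetic units: {synthetic_units:.0f}")  # 50
print(f"DEM units: {dem_units:.0f}")              # 200
```

    Even in this toy form, a change to any one input ripples into a different unit type, which is exactly why forecasting a year ahead is hard.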

    The premium support package is available for an additional charge.

    What other advice do I have?

    At this point, we have not yet integrated Dynatrace with our CI/CD tool, which is Azure DevOps. However, our plan for the future is to provide post-release measurements and automated rollbacks when necessary. Even further down the road, ServiceNow is on the roadmap, which we're currently bringing in from an Australian acquisition in order to try and promote the ITSM side of the business.

    Nothing specific has been implemented so far, although there have been general degrees of automation. When we get Agile, DevOps, and ServiceNow in place, the degree of automation will increase dramatically. For example, automated rollbacks in the case of deployment failure, and change management automation informed by the current state of the target system, are being included in the ServiceNow automation.

    The automation that has been done to alleviate the effort spent on manual tasks is still very light because I'm the only person doing the work. I generally don't have time to do the ancillary tasks at the moment, such as creating automations. It's mostly a case of deploying, instrumenting, observing, and moving on. When we come back to revisit it, then we'll look at the automations.

    My advice for anybody looking into implementing Dynatrace is to talk constantly with your Dynatrace representatives during the PoC or trial phase, because there is invariably far more that Dynatrace can do than you realize. We only know what we know. I'm not suggesting that you let Dynatrace drive, but rather that you let them constantly provide best practices. You will achieve faster returns afterward, whether in labor savings, recovery time, or costs from downtime. Basically, you want to make sure that you leverage the expertise of the company.

    In summary, this is a very good product but they need to sort out their user interface issues and provide a more logical experience.

    I would rate this solution a seven out of ten.

    Which deployment model are you using for this solution?

    Public Cloud
    Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
    Technical Consultant at a manufacturing company with 5,001-10,000 employees
    Real User
    The best full stack observability compared to any other tool
    Pros and Cons
    • "For full stack observability, Elastic is the best tool compared with any other tool."
    • "Elastic APM's visualization is not that great compared to other tools. Its number of metrics is very low."

    What is our primary use case?

    Elastic APM is a kind of log aggregation tool and we're using it for that purpose. 

    What is most valuable?

    Elastic APM is very new, so we haven't explored much of it, but it's quite interesting. It comes as a free offering included in the same license, so we are looking to explore more. It is still not as mature as other tools related to application performance monitoring, like Kibana, AppDynamics, or New Relic products. Elastic APM is still evolving, but it's quite interesting to be able to get all the similar options and features in Elastic APM.

    What needs improvement?

    In terms of what could be improved, Elastic APM's visualization is not that great compared to other tools. Its number of metrics is very low. Their JVM metrics are limited, mostly CPU and memory usage, with thread usage on top of that. They're not giving much in the way of application performance metrics. In that respect, they have to improve a little bit. Compared with other tools, such as New Relic, which also doesn't give many insights, it would be good to get internal calls or to see backend calls. We are not getting these kinds of metrics.

    On the other hand, if you go to the trace view, it gives you a good view of backend calls. That backend call view captures everything, and we need some kind of control over it, which it does not have. For example, if I don't want some of the statements selected, there should be controls for that. Moreover, you currently need to do all of this manually. Nowadays, imagine any product that opted to do configuration manually; that would be really disastrous. We don't want to do it manually. It needs to be done either by API or through some kind of automated procedure. If you want to tune the APM agent after installing it, because the process is manual, we would need APIs to be available on the APM side. That's one drawback.

    Additionally, the synthetic monitoring and real user monitoring services are not available here. Whereas in New Relic the user does get such services.

    The third drawback I see is on the access-control side. For now, only one role is defined for this APM. So if I want to restrict a user to a specific domain, for example, if your organization has domain A, domain B, and domain C, but you want to give access only to a specific domain or a specific application, I am not sure how to do that here.

    Both the synthetic and process monitoring should be improved. For JVM and Java process monitoring, and any process monitoring in general, they have to offer more metrics and a breakdown of TCP/IP; as it stands, the tool doesn't provide many metrics. You get everything, but you fail to visualize it. New Relic only focuses on transactions, and Elastic APM focuses on similar things, but I am still looking for other options like thread usage, backend calls, frontend calls, and counts of frontend and backend calls. This kind of metric is definitely required.

    We don't have much control. For example, some backend calls trigger thousands of prepared statements, update statements, or select statements, and we don't have any control over them. If I only want select statements, not update statements, that kind of control should be there and properly supported. The property file is very big and still manual, so if you want to control agent properties, you need UI or API control. Nowadays, the world is looking to the API side so people can develop more smartly. They are looking for these kinds of options to enrich their dashboard creation and management.

    For how long have I used the solution?

    I'm new to Elastic APM, but I do have very good APM knowledge, since I have been using APM tools for almost 10 years, and Elastic APM for just two years. I see that Elastic APM is still evolving.

    How are customer service and technical support?

    Elastic APM's technical support is pretty good and we have a platinum license for log aggregation. They respond very quickly and they follow a very good strategy. They have one dedicated resource especially for us. I'm not sure if that is common for other customers, but they assigned a very dedicated resource. So for any technical issue a dedicated resource will respond. Then, if that resource is busy or not available someone will attend that call or respond with support. In that way, Elastic support fully understands your environment.

    Otherwise, if you go with the global support model, they have to understand your environment first and keep asking the same questions again and again: how many clusters do you have, what nodes do you have, these kinds of questions. Then you need to supply the diagnostics. This is a challenge. With a dedicated support resource, they usually don't ask these questions because they already understand your environment very well from working with you on previous cases. In that sense, they provide very good support and answer questions immediately.

    They provide immediate support; usually they get back to you the same day or the next day. I think it's pretty good compared to any other support. It was even very good compared to New Relic.

    What other advice do I have?

    There are two advantages to Elastic APM. It is open source and if somebody wants to try it out in their administration it's free to use. Also, it has full stack observability. For full stack observability, Elastic is the best tool compared with any other tool like New Relic or AppDynamics or Dynatrace. I'm not sure about Dynatrace, since I never worked with it, but I have worked with AppDynamics and New Relic. However, with their log aggregation side, there is still a lot to get implemented here.

    I'd like greater flexibility. That means getting all the system logs, all the cloud logs, all kinds of logs aggregated in a single location. On top of that, if they could have better metrics for correlating that data, it would give a greater advantage for observability. The observability platform is pretty good because you already have log data and information like that. If you just add APM data and visualize it, you will get much-needed information. How are you going to visualize it, and how are you going to identify the issues?

    For this purpose, Elastic is best. If you are really looking for an observability platform, Elastic provides both of these options, APM plus log aggregation. But they still have to improve, or they have to provide APIs for synthetic monitoring, internet monitoring, et cetera. For synthetic monitoring specifically, you can't compare Elastic with New Relic today; New Relic is much better.

    These are the improvements they have to look at. They support only some of the functionality of synthetic monitoring, so it's not one hundred percent APM friendly, but if you look at their observability platform, the full stack observability together with log aggregation, Elastic APM is at a great advantage.

    On a scale of one to ten, I would rate Elastic APM an eight out of ten.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    M ANakib - PeerSpot reviewer
    Senior Cloud Consultant at CloudThat
    Real User
    Top 5
    Helps us test the performance and efficiency of cloud applications; good log analysis
    Pros and Cons
    • "For me, the best feature is the log analysis with Azure Monitor's Log Analytics. Without being able to analyze the logs of all the activities that affect the performance of a machine, your monitoring effectiveness will be severely limited."
    • "Although it's not always the case, the price can sometimes get expensive. This depends on a number of factors, such as how many services you are trying to integrate with Azure Monitor and how much storage they're consuming each month (for example, how large are the log files?)."

    What is our primary use case?

    As a Microsoft MVP and MCT, I help our company's clients use Azure Monitor in different ways depending on what types of activity they want to monitor. From a big picture perspective, this entails checking and testing both the performance and efficiency of cloud-based applications.

    Starting from the basics, such as when monitoring VMs, you are able to integrate Azure Monitor with the virtual machine so as to create alerts and notifications that are automatically triggered when, for example, the machine's performance rises above 60-70%.

    You can also use Azure Monitor to help you understand the total internal activity, including the CPU or RAM performance, of specific applications on the cloud. And beyond that, there are many other metrics detailing which users have logged in (or are concurrently logged in) to the application or VM, while showing the ratios of their activity in total.

    What is most valuable?

    For me, the best feature is the log analysis with Azure Monitor's Log Analytics. Without being able to analyze the logs of all the activities that affect the performance of a machine, your monitoring effectiveness will be severely limited.

    What needs improvement?

    Although it's not always the case, the price can sometimes get expensive. This depends on a number of factors, such as how many services you are trying to integrate with Azure Monitor and how much storage they're consuming each month (for example, how large are the log files?).

    Of course, this totally depends on the particular customer's environment. As the implementer, we can typically only advise on the technical outcomes for a certain usage scenario of Azure Monitor, and not necessarily the advantages or disadvantages of paying for Azure Monitor in their particular use case. For example, if they are paying $400 per month, the advantages for the customer might be that they reduce technical headaches in ensuring proper service performance without having to invest in a separate IT member to handle the monitoring. And for many customers, this makes good business sense, which is why when we propose the use of Azure Monitor in such a way and give them an example, they often take us up on the proposal, despite the costs involved.

    For how long have I used the solution?

    I have been using Azure Monitor for five years.

    What do I think about the stability of the solution?

    Azure Monitor is definitely a stable product once we have deployed it to our target. Azure Monitor, when deployed to a VM, will give you a stable interface for taking readings on various aspects of a machine's performance, such as what's going on with the CPU, memory, and data communications.

    When it comes to more advanced usage with deeper monitoring, Azure Monitor can integrate with AI to make the process of advising what to do about resolving certain problems much easier.

    What do I think about the scalability of the solution?

    Azure Monitor is a key part of service integration and it is easy to scale and extend as needed. You can also integrate other services, even third-party ones, with it.

    It provides options for scaling it to your own needs in various ways, such as by providing a choice of what types of logs will be generated and how many days you would like to keep the logs for. It also provides options for how many servers you would like to monitor, and whether you will be logging on an hourly, per-minute, or per-second basis.

    How are customer service and support?

    I am largely satisfied with how the technical support from Microsoft explains their solutions to the issues that we've raised, although most of the time they simply share the most current link from their blog or knowledge base that explains the specific steps that must be taken to resolve a problem. This is because Microsoft engineers don't actually do the technical work themselves, but instead act as advisors, where companies like mine are then the implementers of the suggested solutions.

    How was the initial setup?

    The setup for Azure Monitor itself is easy to understand and, for generating logs and such, it's a simple process of configuring how you would like to gather your logs and for how long you would like to keep those records, etc.

    It won't take anyone that much time to deploy their monitoring system with Azure Monitor. This is especially true if you understand your business needs well, because you can then quickly integrate Azure Monitor (and Log Analytics or Workbooks) inside your application as appropriate. As an estimate, it could take up to three or four hours at most.

    What about the implementation team?

    We implement Azure Monitor in-house. Our deployment plan is to first obtain the Azure identity and then make an effort to better understand the service that will be running on the Azure cloud. For example, is it IaaS or PaaS? If it is based on PaaS, then we'll go straight to the application inside the PaaS, and if it's on a VM, then we'll go to the VM (or container) instance and set up the deployment from there. Here, the deployment also depends on the specific resources or Azure subscriptions in use.

    What's my experience with pricing, setup cost, and licensing?

    Customers of Azure Monitor must pay an amount that depends largely on how many services they need to integrate and the storage space required in terms of logs, etc. If they only have a few small services to monitor, the price won't be too high, but on the opposite side of the spectrum, it can certainly get pricey.

    Fortunately, customers can use the calculator provided by Azure to easily estimate the total monthly costs based on the customer's requirements. 

    Which other solutions did I evaluate?

    Along with Azure, I also have experience with New Relic, for which I have handled both deployment and connectivity.

    What other advice do I have?

    I can definitely recommend Azure Monitor for most people, but before recommending it to all I would caution that whether you choose to use it depends largely on the type of service that you are running on the cloud.

    One must first be clear about the outcomes of the service that is to be monitored; if it is a critical service or application where you absolutely must monitor each data point or watch for incidents, then you will certainly want to use Azure Monitor.

    On the other hand, some companies are running services on the cloud that are not actually critical in the grand scheme of their operations, and to these companies I would not blindly encourage using Azure Monitor. Of course, if they are interested nevertheless, then I would say, sure, they can go forward with it as long as they understand that they will have to pay Microsoft accordingly in order to get something out of Azure Monitor. 

    I would rate Azure Monitor an eight out of ten.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Microsoft Azure
    Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
    Moses Arigbede
    Head of DevOps at Partsimony
    Real User
    Top 10
    Easy to set up with great custom dashboards but needs to improve non-.NET infrastructure
    Pros and Cons
    • "The solution is stable and reliable."
    • "I would like to be able to see metrics about individual running containers on the host machines."

    What is our primary use case?

    Although the decision to implement Stackify was not mine, when I took over the position, it was one of the tools already being used by the company. We were using it for APM, infrastructure monitoring, and log aggregation.

    What is most valuable?

    One of the features that we loved the most is that we've been able to create custom dashboards. The company I worked for is an umbrella company with smaller departments under it, each standing by itself and each with its own independent development team. At the center, we have three departments: security, infrastructure, and DevOps. Everybody keys into the offerings from those three.

    At a particular point, we needed to bridge the gap between the business people and the tech people, so we created dashboards based on metrics from Stackify for the individual businesses. The business people could see, in layman's terms, how well the infrastructure was doing, and it also became a driver for business and technology decisions. For example: How much money do we need to spend on which server, and for how long? Which of them do we need to upgrade, and why? It allowed us to tell that story easily, since we could securely share pre-configured dashboards, and the dashboards were responsive. The alerts were on point, and that allowed us to succeed in that venture.

    The solution is stable and reliable.

    It's scalable.

    The initial setup is easy.

    What needs improvement?

    They need to improve non-.NET infrastructure.

    We always had difficulty when it came to reporting or metrics from Linux operating systems and Docker containers. For anything that ran within a Unix environment, we always had problems, whereas if it was a .NET-based application, Stackify was 100%; it gave us everything. Now, the aggregation agent, the metric agent for Stackify for Linux, collects everything. When I say everything, I mean everything. It collects so much information that we started to term it useless data, as all of that ingestion will just come in, overwhelm your log retention limit for the month, and really spike up your cost at the end of the month. You'll need to do a lot to trim down the data coming in from all your Linux environments to get to what you really need, which takes some time as well.

    I would like to be able to see metrics about individual running containers on the host machines. Stackify has not really gotten that right, as far as I'm concerned. Netdata has done a better job and New Relic has also done a better job. They need to improve on that. We need to be able to see the individual resource usage of containers running within a particular host.

    For how long have I used the solution?

    I've used the solution for around four years. 

    What do I think about the stability of the solution?

    The solution is stable and reliable. There are no bugs or glitches. It doesn't crash or freeze. 

    What do I think about the scalability of the solution?

    The product is scalable as long as your payment plan allows you to do so. You can add as many users as possible.

    We have between 80 and 100 users on the solution. 

    How are customer service and support?

    We don't need the help of technical support. The documentation would suffice if you need help. Community engagement also does the trick.

    Which solution did I use previously and why did I switch?

    At the time Stackify was adopted, I know for a fact that New Relic was still working on their product. According to the reviews, Stackify actually came out on top for .NET applications, which was essentially everything the company did; all the products being developed were .NET-based. Stackify seemed like the logical choice to go with.

    How was the initial setup?

    The initial setup is straightforward, especially if you've been doing it for a long time. You just install the agent and let it run, and that's basically it. I also like the integration with the software development IDE. Stackify automatically starts to collect metrics about your application, and the client signature is also very, very clean.

    We did not deploy Stackify on-prem. This is on the cloud. All our agents were pushing data to the cloud. I'm not sure how long the deployment took.

    What's my experience with pricing, setup cost, and licensing?

    We had a monthly billing and license fee. Apart from that monthly billing, I'm not aware of the initial costs.

    The price is variable. It depends on how much data we have received in that particular month. Usually, it goes up to $2,000, or, at times, $3,000 USD per month.

    What other advice do I have?

    My company is a customer. 

    I'm not sure which version we were using. I'm no longer with the company where I used Stackify. While I was with them, I was the head of DevOps, and it was basically one of the tools I primarily administered.

    I have not had experience with Stackify as a free user. If you would want to use Stackify as a beginner, your needs would not be properly satisfied by Stackify. You would think Stackify is overkill for your new business. From my perspective, I would advise the person to probably go to Netdata, as it's open-source and it's free, or use Datadog for aggregation and the like. If this new user is an enterprise that can bear the cost and if the majority, if not all of the applications are written in .NET, or if they have a need to interface with their business leaders and share metrics about how the infrastructure is doing, then Stackify will be a very good choice. 

    I'd rate them six out of ten. I want them to improve on Linux systems and give users much more freedom to select the kind of metrics they want to pull in from the infrastructure.

    One thing that happens as a new user on Stackify is that when you install the agent, it pulls everything, and if you're not careful, your log allowance will just be exhausted because you are pulling in too much data. Stackify should make it flexible by letting you know which metrics will be pulled in. I felt as if they were being tricky: on every node, you see that every data point has been checked already, every single one. All of them accumulate against the log base, and before you know it, your allowance is done; it's finished. When that happens, everything freezes. You get no metrics, no analytics, nothing whatsoever, as you've exhausted your limit.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Principal Enterprise Systems Engineer at a healthcare company with 10,001+ employees
    Real User
    Top 10
    An out-of-the-box solution that allows you to quickly build dashboards
    Pros and Cons
    • "I like that you can build out a dashboard pretty quickly. There are some things that come out of the box that you don't really need to do, which is great because they're default settings."
    • "I think better access to their engineers when we have a problem could be better."

    What is our primary use case?

    We deploy agents on-premise to collect data on on-premise VM instances. We don't use Datadog in our cloud network, although we do have it on some cloud apps, and we also have it on containers. We have it at our headquarters, and the main Datadog software itself runs in their own cloud.

    We're building out the process now and learning to use it better. We plan to use Datadog for root cause analysis relating to any kinds of issues we have with software: applications going down, latency issues, connection issues, etc. Eventually, we're going to use Datadog for application performance monitoring and management, to be proactive around thresholds, alerts, bottlenecks, etc. 

    Our developers and QA teams use this solution. They use it to analyze network traffic, load, CPU usage, and then tracing, NPM, and API calls for their applications. There are roughly 100 users right now. Maybe there are 200 total, but on a given day, maybe 13 people use this solution.

    How has it helped my organization?

    It hasn't improved the way our organization functions yet, because there's a lot of red tape to cut through with cultural challenges and changes. I don't think it's changed the way we do things yet, but I think it will — absolutely it will. It's just going to take some time.

    What is most valuable?

    I like that you can build out a dashboard pretty quickly. There are some things that come out of the box that you don't really need to do, which is great because they're default settings. Once you install the agent on the machine, they pick up a lot of metrics for you that are going to be 70 or more percent of what you need. Out of the box, it's pretty good.

    For how long have I used the solution?

    I have been using Datadog every day since September 2020. I also used it at a previous company that I worked for.

    What do I think about the stability of the solution?

    Stability-wise, it's great.

    What do I think about the scalability of the solution?

    It seems like it'll scale well. We're automating it with Ansible scripts and ServiceNow so that when we build a new virtual machine, it will automatically install Datadog on that box.
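    For reference, Datadog publishes an official Ansible role (datadog.datadog) for exactly this kind of provisioning. Whatever the automation layer, it typically wraps a step equivalent to Datadog's documented one-line Agent install for Linux hosts; the API key below is a placeholder, and DD_SITE would be set to whichever Datadog region the account reports to.

```shell
# Placeholder API key; a sketch of the documented one-line install for
# Datadog Agent 7 on a Linux host, which automation like Ansible wraps.
DD_API_KEY=<your-api-key> DD_SITE="datadoghq.com" \
  bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"
```

    Baking this into the VM build pipeline is what makes the scaling automatic: every new box reports in without anyone touching it.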

    How are customer service and technical support?

    The tool itself is pretty good and the customer service is good, but they're a growing company, and better access to their engineers when we have a problem would help. For example, if I ask, "Hey, how do I install it on this type of component?", we'll try to get an engineer on the phone with us to step us through everything, but that's a challenge because they're so busy.

    Technically, everything's fine. We don't need any support; everything that I need to do, I can do right out of the box. But as far as access to their engineers' knowledge of how to configure it on the systems we have, that's maybe a six, because they're just not as available as I would have hoped.

    Which solution did I use previously and why did I switch?

    We were using AppDynamics. Technically, we still have it in-house because it's tightly wound into certain systems, but we'll probably pull that off slowly over time. The reason we added Datadog and eventually we'll fully switch over is due to cost. It's more cost-friendly to do it with Datadog.

    Which other solutions did I evaluate?

    Yes, we looked at Dynatrace, AppDynamics, and New Relic. Personally, I wouldn't have chosen Datadog for the POC if it were up to me. Datadog was a leader, but New Relic was looking really good. In the end, the people above me decided to go with Datadog — it's a big company, so they wanted to move fast, which makes sense.

    What other advice do I have?

    If you're interested in using Datadog, just do your homework, as we did. We're happy so far I think; time will tell as we are still rolling things out. It's a very good company. It's going to be a year before we really can tell anything. If you do your homework, you'll find that if you're really concerned with cost, it's good.

    There are some strengths that AppDynamics and Dynatrace have that I don't think Datadog will have down the road, but they're not things we necessarily need; they're outliers. It would be nice to have them, but we can manage without them.

    Know what you want. There is no need to pay for solutions like Dynatrace or AppDynamics that are more expensive or things that are just nice to have if you don't absolutely need to have them. That's something people need to understand. You just have to make sure you understand what it is that you need out of the tool — they are all a little different, those three. I would say to anybody that's going with Datadog: you just have to be patient at the beginning. It's a very busy company right now. They're very hot in the market.

    Overall, on a scale from one to ten, I would give Datadog a rating of eight. It does what we need it to do, and it seems to be pretty user-friendly in terms of setting things up.

    Features-wise, I'd give them a rating of ten out of ten. Better access to assistance from their engineers on how to configure dashboards and pull in the metrics that we need would bring the overall rating up a little bit; it would have to be perfect for a ten, but I would say maybe they could bring it to a nine.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.