What is our primary use case?
Our primary use case is the consolidation of observability platforms.
How has it helped my organization?
Looking at Dynatrace's automation and AI capabilities, automation is generally a great place to start. In products where there has been little or no observability, the automation can give a great deal of insight, telling people things that they didn't know they needed to know.
Davis will do its best to provide root cause analysis, but you, as a human, are still responsible for connecting as many of the dots as possible in order to build the biggest possible picture. As long as you accept that you still have to do some work, you'll get a good result.
I have not used Dynatrace for dynamic microservices within a Kubernetes environment in this company, but I have run an AWS microservice cluster in the past. Its ability to cope with ephemeral instances, as Kubernetes workloads usually are, was very good. The fact that we didn't have to manually scale out to match the autoscaling rules on the Kubernetes clusters was very useful. Its representation of them at the time wasn't the best; other products, Datadog for example, had a better representation in the actual portal of the SaaS platform. That was about three years ago, and Dynatrace has changed, but I haven't yet revisited the Kubernetes monitoring to see whether it has improved in that regard.
Given that Dynatrace is a single platform, as opposed to needing multiple tools, the ease of management is good: there is only one place to go, and you deal with all of the management there.
The unified platform has allowed our teams to collaborate better. In particular, because of the platform consolidation, using Dynatrace has made the way we work generally more efficient. We don't have to hop between seven different monitoring tools; there's just one place to go. It has increased the level of observability throughout the business, where development now looks at their own metrics through APM, rather than waiting for a bug report after a problem and then trying to recreate it.
It has also increased visibility for the executives and senior management, who now get to see dashboards about what's happening right now across the business or across their products, which didn't exist before. Then there's the rate at which we can bring new infrastructure, applications, or custom devices under monitoring. We had a rollout this week, which started two days ago, and by yesterday afternoon I was able to provide dashboards giving feedback on the very infrastructure and applications that had been instrumented only the day before.
As we've only been using Dynatrace in production for the past month at this company, we can't yet measure the impact. We need more time, more data, and more real use cases as opposed to the synthetic outages we've been creating. In my experience, Dynatrace is generally quite accurate in assessing the level of severity. Even in scenarios where you simply rely on the automation, without any custom thresholds or the like, it does a good job of providing business awareness of what is happening in your product.
Dynatrace has a single agent that we need to install for automated deployment and discovery. It uses up to four processes, and we found it especially useful in dealing with things like old Linux distros. For example, a Gentoo Linux server couldn't handle TLS 1.2 for transport and thus could not download the agent directly. We only had to move the one agent over SSH to the Gentoo server and install it, which was much easier than if we'd had to repeat that two or three times.
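As an illustration of that workflow, here is a minimal sketch of pushing a locally downloaded OneAgent installer to such a host over SSH and running it there. The installer filename and hostname are hypothetical, not taken from our environment.

```python
#!/usr/bin/env python3
"""Hypothetical helper: copy a locally downloaded OneAgent installer to a
host that cannot reach the SaaS endpoint itself (e.g. a TLS-1.2-incapable
Gentoo box) and run it there. Installer filename and hostname are
illustrative placeholders."""
import subprocess

INSTALLER = "Dynatrace-OneAgent-Linux.sh"   # downloaded on a modern machine
TARGET = "gentoo-legacy.example.internal"   # hypothetical hostname

def push_and_install(installer: str, host: str) -> None:
    # Copy the installer over SSH, then execute it with root privileges.
    subprocess.run(["scp", installer, f"{host}:/tmp/{installer}"], check=True)
    subprocess.run(["ssh", host, f"sudo sh /tmp/{installer}"], check=True)

if __name__ == "__main__":
    push_and_install(INSTALLER, TARGET)
```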
The automated discovery and analysis features have helped us to proactively troubleshoot products and pinpoint the underlying root cause. There was one particular product that benefited during the proof of concept period, where a product owner convened a war room and it took about nine hours of group time to try and reason out what might be the problem by looking at the codebase and other components. Then, when we did the same exercise for a different issue but with Dynatrace and the war room was convened, we had a likely root cause to work from in about 30 minutes.
In previous companies, where the deployment was more mature, Dynatrace was definitely allowing DevOps to concentrate on shipping quality, whereas where I am now, we are still deploying Dynatrace. The biggest change in that organization was the use of APM and the insights it gave developers.
Concurrent with the deployment of Dynatrace, we adopted a different methodology, using Scrum and Agile for development. By following the Scrum pattern of meetings, we were able to observe the time estimates given for various tasks in the planning sessions. They started to come down once the output of the APM had been considered. Ultimately, Dynatrace APM provided the insight that allowed the developers to complete tasks faster.
What is most valuable?
The most valuable features for us right now are the auto-instrumentation, the automatic threshold creation, and the Davis AI-based root cause analysis, along with the dashboarding for executives and product owners.
These features are important because of the improved time to deployment. We have a relatively small team deploying to a relatively large number of products, and therefore infrastructure types and technology stacks. If I had to instrument all of this manually, the way it is done with Nagios or Zabbix, for example, it would take an extremely long time, perhaps years, to complete on my own. With Dynatrace, I can install the agent and, as long as there is a properly formed connection between the agent and the SaaS platform, I know that there is something to begin working with immediately. I can move on to the next product and review later, so the time to deployment is much shorter. It can be completed in months or less.
We employ real user monitoring, session replay, and synthetic monitoring functionalities. We have quite a few web applications and they generally have little to no observability beyond the infrastructure on which the applications run. The real user monitoring has been quite valuable in demonstrating to product owners and managers how the round-trips, or the key user actions, or expensive queries, for example, have been impacting the user experience.
By combining that with session replay and actually watching through a problematic session for a user, they get to experience the context as well as the raw data. For a developer, for example, it's helpful that you can tell them that a particular action is slow, or it has a low Apdex score, for example, but if you can show them what the customer is experiencing and they can see state changes in the page coupled with the slowness, then that gives a much richer diagnostic experience.
We use the synthetics in conjunction either with the real user monitoring or as standalone events for sites that either aren't public-facing, such as internal administration sites, or for APIs where we want to measure things in a timely manner. Rather than waiting for seasonal activity from a user as they go to work, go home, et cetera, we want it at a constant rate. Synthetics are very useful for that.
The benefit of Dynatrace's visualization capabilities has been more apparent for those that haven't used Dynatrace before or not for very long. When I show a product owner a dashboard highlighting the infrastructure health and any problems, or the general state of the infrastructure with Data Explorer graphs on it, that's normally a very exciting moment for them because they're getting to see things that they could only imagine before.
In terms of triaging, it has been useful for the sysadmins and the platform engineering team, who until now normally had to rely on multiple tools. We have had a consolidation of observability tools, starting originally with seven different monitoring platforms. Watching over a data center running VMware with that many tools was very difficult for our sysadmins. Consolidating that into Dynatrace has been the biggest help, especially with Davis backing you up with RCAs.
The Smartscape topology has also been useful, although it is more for systems administrators than for product owners. Sysadmins have reveled in being able to see the interconnectedness of various infrastructures, including the way that Dynatrace can discover things on which it isn't directly installed. When you have an agent on a server surrounded by other servers that do not have an agent installed, it will still allow a degree of discovery, which can be represented in the Smartscape topology and help you plan where you need to move next, or simply highlight things that you hadn't even realized were connected.
What needs improvement?
The user interface for the management functions is not particularly intuitive, even for the most common features. For example, you can't share dashboards en masse. You have to open each dashboard, go into settings, change the sharing options, go back to dashboards, et cetera. It's quite laborious. Datadog, also a single platform, does a better job of making these options accessible in the same scenario.
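Until the UI improves, this kind of bulk change can be scripted. Below is a hedged sketch against the Configuration API: the dashboards list endpoint is documented, but the shareSettings endpoint and payload shape are from memory, so verify both against the API docs for your environment version before relying on anything like this.

```python
"""Hedged sketch: bulk-enable sharing on every dashboard through the
Configuration API rather than clicking through each one. The
/dashboards/{id}/shareSettings endpoint and payload are my recollection
of Config API v1 -- check your environment's API docs first."""
import requests

ENV = "https://abc12345.live.dynatrace.com"   # hypothetical environment URL
TOKEN = "dt0c01.XXXX"                         # placeholder API token
HEADERS = {"Authorization": f"Api-Token {TOKEN}"}

def share_all_dashboards() -> None:
    resp = requests.get(f"{ENV}/api/config/v1/dashboards", headers=HEADERS)
    resp.raise_for_status()
    for stub in resp.json()["dashboards"]:
        dash_id = stub["id"]
        # Read the current share settings, flip the enabled flag, write back.
        settings = requests.get(
            f"{ENV}/api/config/v1/dashboards/{dash_id}/shareSettings",
            headers=HEADERS,
        ).json()
        settings["enabled"] = True
        requests.put(
            f"{ENV}/api/config/v1/dashboards/{dash_id}/shareSettings",
            headers=HEADERS, json=settings,
        ).raise_for_status()

if __name__ == "__main__":
    share_all_dashboards()
```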
User and group management in the account settings for user permissions could be improved.
The way that Dynatrace deals with time zones across multiple geographies is quite a bone of contention, because Dynatrace only displays the browser's local time. This is a problem when I'm talking with people in Canada, which I do every day. They either have to recalculate times on the fly in their heads to work out what a timestamp means in their own zone, or I have to spin up a VM and open a browser with its time zone set to theirs, so the times are obvious to them without any mental arithmetic.
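For what it's worth, a few lines of scripting cover the workaround without a VM. This sketch simply re-renders a timestamp in a colleague's zone; the zone names and times are illustrative.

```python
"""Small workaround sketch for the time-zone complaint: convert a problem
timestamp from my local zone to a colleague's. Values are illustrative."""
from datetime import datetime
from zoneinfo import ZoneInfo

def localize(ts: datetime, zone: str) -> str:
    # Re-render an aware timestamp in the target zone.
    return ts.astimezone(ZoneInfo(zone)).strftime("%Y-%m-%d %H:%M %Z")

# A problem Dynatrace shows me at 14:05 UK time...
problem_start = datetime(2022, 3, 1, 14, 5, tzinfo=ZoneInfo("Europe/London"))
# ...rendered for a colleague in Toronto:
print(localize(problem_start, "America/Toronto"))  # 2022-03-01 09:05 EST
```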
For how long have I used the solution?
Personally, I have been using Dynatrace since November of 2018. At the company I am at, we have been using it for approximately four months. It was used as a PoC for the first three months, and it has been in production for the past month.
What do I think about the stability of the solution?
The SaaS product hasn't had any downtime while I've been at my current company. I've experienced downtime in the past, but it's minimal.
What do I think about the scalability of the solution?
To this point, I've not had any problems with the scalability, aside from ensuring that you have provisioned enough units. However, that is another point that is related to pricing.
Essentially, its ability to scale and continue to work is fine. On the other hand, predicting the scale you will require, in order to purchase the correct number of the various units, is much harder.
How are customer service and support?
Talking about Dynatrace as a company, the people I've spoken to have always been responsive. The support is always available, partly because of our support package. As a whole, Dynatrace has always been a very responsive entity, whether I've been dealing with them in North America or in the UK.
Which solution did I use previously and why did I switch?
We have used several other solutions including Grafana, Prometheus, Nagios, Zabbix, New Relic, AWS CloudWatch, Azure App Insights, and AppDynamics. We switched to Dynatrace in order to consolidate all of our observability platforms.
Aside from differences that I discuss in response to other questions, other differences would come from the product support rather than the product itself. Examples of this are Dynatrace University, the DT One support team, the post-sales goal-setting sessions, and training.
We're yet to have our main body of training, but we're currently scheduled to train on about 35 modules. Last year, when I rolled out Datadog, the training wasn't handled in the same way; it was far more on-request, for specific features. This, by contrast, is an actual curriculum designed to familiarize end users with the product.
How was the initial setup?
In my experience, the initial setup has been straightforward, but I've done it a few times. When I compare it to tools like Nagios, Zabbix, Grafana, and Prometheus, it is very straightforward. This is largely for two reasons.
First, they're not SaaS applications, whereas Dynatrace is, and second, the amount of backend configuration you have to do in preparation for those tools is much higher. That said, if we were to switch to Dynatrace Managed rather than Dynatrace SaaS, I imagine that the level of complexity for Dynatrace would rise significantly. As such, my answer is biased towards Dynatrace SaaS.
What was our ROI?
In my previous company, it allowed a very small team to manage what was a very fast-moving tech stack. In my current company, it is still very early.
The consolidation of tools due to implementing Dynatrace has saved us money, although the impact is tricky to measure. The list price of Dynatrace was more than the previous list-price spend on monitoring tools, because the various platforms had been open-source tools, had been provided through hosting companies, or had arrived as part of acquisitions of other companies.
The open-source applications we used included Grafana, Prometheus, Nagios, and Zabbix. New Relic, as an example, was provided through a hosting company, Carbon60 in Canada. AppDynamics came to us through the acquisition of a Canadian company, so it sat in that company's budget rather than our own.
The hope was that Dynatrace, through consolidation, would release the material cost of the administrative overhead of tools like Prometheus and Grafana, and the cost of hosting infrastructure for solutions like Nagios and Zabbix. In other words, it is more of an upstream cost saving, where we save human effort and hosting costs by consolidating into a SaaS platform that is pretty much all-in-one.
What's my experience with pricing, setup cost, and licensing?
Dynatrace's pricing for their consumption units is rather arcane compared to some of the other tools, thus making forward-looking calculations based on capacity planning quite hard. This is because you have to do your capacity planning, work out what that would mean in real terms, then translate that into Dynatrace terms and try to ensure you have enough Davis units, synthetics units, DEM units, and host units.
Getting those all right for anything up to a year in advance is quite hard. This means that its ability to scale and continue to work is fine, but predicting the correct number of the various units to purchase is much harder.
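To make the difficulty concrete, here is a worked example of the host-unit arithmetic alone, under my understanding of the classic licensing model: roughly one host unit per 16 GB of RAM, with smaller fractional tiers for low-memory hosts that I've omitted. Confirm the exact tiers with your account team; this is a hedged approximation, and Davis, synthetics, and DEM units each need their own version of this exercise.

```python
"""Worked example of the unit arithmetic that capacity planning forces on
you. Approximation only: one host unit per started 16 GB block of RAM;
low-memory fractional tiers are deliberately ignored here."""
import math

def host_units(ram_gb: float) -> int:
    # One host unit per started 16 GB block, minimum of one.
    return max(math.ceil(ram_gb / 16), 1)

# Hypothetical fleet: name -> (host count, GB RAM per host)
fleet = {"web": (12, 8), "db": (4, 64), "batch": (6, 32)}
total = sum(count * host_units(ram) for count, ram in fleet.values())
print(f"Estimated host units to license: {total}")
# web: 12*1 + db: 4*4 + batch: 6*2 = 40 host units
```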
The premium support package is available for an additional charge.
What other advice do I have?
At this point, we have not yet integrated Dynatrace with our CI/CD tool, which is Azure DevOps. However, our plan is to provide post-release measurements and automated rollbacks when necessary. Even further down the road, ServiceNow is on the roadmap; we're currently bringing it in from an Australian acquisition in order to promote the ITSM side of the business.
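As a sketch of what that post-release measurement could look like, the snippet below pushes a deployment event from a pipeline using the classic Events API v1, which supports the CUSTOM_DEPLOYMENT event type. The environment URL, token, tag, and service names are placeholders, and the actual Azure DevOps wiring is omitted.

```python
"""Sketch of a planned integration: push a deployment event from CI/CD so
Dynatrace can correlate releases with any degradation that follows. Uses
the classic Events API v1 (POST /api/v1/events); all values below are
placeholders."""
import requests

ENV = "https://abc12345.live.dynatrace.com"   # hypothetical environment URL
TOKEN = "dt0c01.XXXX"                          # placeholder API token

def announce_deployment(version: str) -> None:
    event = {
        "eventType": "CUSTOM_DEPLOYMENT",
        "deploymentName": "orders-service release",   # hypothetical service
        "deploymentVersion": version,
        "source": "Azure DevOps",
        # Attach the event to services carrying this (hypothetical) tag.
        "attachRules": {"tagRule": [{
            "meTypes": ["SERVICE"],
            "tags": [{"context": "CONTEXTLESS", "key": "orders-service"}],
        }]},
    }
    resp = requests.post(f"{ENV}/api/v1/events",
                         headers={"Authorization": f"Api-Token {TOKEN}"},
                         json=event)
    resp.raise_for_status()

if __name__ == "__main__":
    announce_deployment("1.4.2")
```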
There is nothing specific that has been implemented so far, although there have been general degrees of automation. When we get Agile, DevOps, and ServiceNow in place, the degree of automation will increase dramatically. For example, automated rollbacks in the case of deployment failure, and change management driven by the current state of the target system, are being included in the ServiceNow automation.
The automation that has been done to alleviate the effort spent on manual tasks is still very light, because I'm the only person doing the work. I generally don't have time for the ancillary tasks at the moment, such as creating automations. It's mostly a case of deploying, instrumenting, observing, and moving on. When we come back to revisit it, then we'll look at the automations.
My advice for anybody who is looking into implementing Dynatrace is to talk constantly with your Dynatrace representatives during the PoC or trial phase, because there is invariably far more that Dynatrace can do than you realize; we only know what we know. I'm not suggesting that you let Dynatrace drive, but rather that you let them constantly provide best practices. You will achieve faster returns afterward, whether that's labor savings, recovery time, or costs avoided from downtime. Basically, you want to make sure that you leverage the expertise of the company.
In summary, this is a very good product but they need to sort out their user interface issues and provide a more logical experience.
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.