What is our primary use case?
Our primary use case is exposing application performance, along with incidents and errors within the application. It has performed exceptionally well.
How has it helped my organization?
Understanding and visibility, plus the ability to give the same answers to the different archetypes (support personnel, maintenance personnel, business personnel, executives, and middle managers like myself), so that we're all telling the same story and working off that same story to understand what's going on.
The main benefit has been time, definitely. Over my career I've spent hundreds, if not thousands, of hours in war rooms picking problems apart, over-analyzing issues, and chasing red herrings. This type of solution set (not just AppMon but Dynatrace, and even the Synthetic portion) really helps us narrow down what we're looking for.
What is most valuable?
Capture of 100% of the traffic. Exposure of downstream services, which might not necessarily be new, to everybody who's using the applications. It triggers and captures them, and it gives visibility into pieces that might be forgotten or even obscured.
What needs improvement?
If it is AppMon, I would really like easier integration with Logstash. The business transaction data doesn't have a natural feed through the GUI or through the configuration. We have to do a little jiggering in between to get it to feed, so I'd like to have that out of the box. That'd be great. We now have out-of-the-box UEM integration; I'd like to have the rest out of the box as well.
And if it's Dynatrace we're talking about, I really think they're on the right track as it is, because of all the AI and all the session replay and all these fantastic things we've been shown.
And if it's Dynatrace Synthetic, which we also use, I would love to have higher-level analytics across the tests. Today errors are generated and reported per test, but we have clusters of tests that cover the same application. I'd love to see a little more analysis done across a series of tests, so that we can have higher-level roll-ups of actionable information.
What do I think about the stability of the solution?
This is an interesting question. We've had our challenges in the past because our primary tool over the past five years has been AppMon, and AppMon has gone through a series of evolutions. We started with version 4.2 and we've come all the way to version 7 at this point. It was never intended to be a high-availability or clustered solution, and some of those improvements have been made more recently. But historically, it was fragile.
Like I said, I have a very large implementation. Over six thousand agents with AppMon. Some of our servers are very highly loaded, over a thousand agents, and when we talk about our online banking, mobile banking platforms, we drive significant load and it can really impact the viability of the servers.
To be fair, we were pushing the product to its limits, and it even prompted some of the architectural changes within Dynatrace itself, and within the AppMon tool, to allow for larger footprints. But generally, and lately, it's been extremely stable.
What do I think about the scalability of the solution?
Historically, the AppMon product hasn't been as scalable. That is one of the reasons we're really excited about the Dynatrace product, because it was redesigned with scalability in mind.
How are customer service and technical support?
The technical support has been fantastic, even getting right up to third-level support and getting changes overnight.
A small anecdote: We needed some changes to the UE mobile agent and we needed them in a hurry. And support turned that ask around in two days, which was phenomenal.
And then, I started talking to some of the guys in Boston and Detroit about some of the exciting changes they're making to their support model, where they can have off-site guardians. I actually employ two guardians at a time myself. I have them on a one-year contract. Putting them in-house has been invaluable.
The idea of other organizations being able to use Dynatrace guardian hours, and doing it piecemeal as they need it, is great because not everybody needs as much hand-holding, but everybody needs a little help sometimes. The response time and the knowledge have been tremendous.
Which solution did I use previously and why did I switch?
We've used a number of tools. We've used SCOM, Wily Introscope, and Groundworks. We've used Nagios and Zabbix. We've used HPE RUM, which was terrible. It cost a lot of FT overhead. There have been a few others; I just can't remember them offhand.
A lot of them were very siloed approaches to monitoring. Some of them have similar approaches (DC RUM is the same as HPE RUM), but the manpower overhead is significant. The challenge there is they just don't talk to each other. And they're not providing the same information to the same people, because people craft the output to what they want, and they're not trying to tell the same story. Dynatrace just attempts to tell the truth.
To be honest, I wasn't part of the board of smarty-pants that brought the solution in, but I can imagine the criteria they looked at included breadth of coverage of technologies, the cost, and ease of use. Either way, I thank that team because it changed our lives.
What other advice do I have?
Given the nature of digital complexity, the role of AI in IT's ability to scale in the cloud and manage performance problems is absolutely crucial. Last year I spent a large portion of my time investigating AI capabilities for IT operations, and I evaluated several products in the market space. I found they're all very, very immature, but AI is an absolute necessity for us going forward.
We're a very large bank and we have hundreds of thousands of users, thousands and thousands of applications. When you start scaling up to the cloud with microservices, the sheer volume of data is so massive that human beings can't evaluate it anymore. It's not possible. AI is the only way that we're going to be able to move forward into the future with these types of architectures, and still get the value out of the data that we're recording.
I've definitely used many siloed monitoring tools in the past. The challenge comes with clustering and high availability, the type of solutioning where we look at strict node-based siloing and then application-based siloing. Even then you're limiting yourself to the purview of what's in that container or what's in that application, and if you're not looking outside of yourself then you're really just looking for a culture of "not me," instead of fostering a culture of "this is what it is, let's work together."
If we had just one solution that could provide real access and not just top line data, I think it would probably free us up in terms of manpower and work hours, to allow us to do more value-add things. If all we're doing is working with top level data, then you have to spend a lot more time digging deeper to find your cause or to find actionable insights into the applications, and that chews up manpower. In this day and age, IT overhead really has become "Let's look at the employee first and cut that first." So, if we need to move in that direction, having something that provides real answers helps us to make that adjustment.
I rate Dynatrace an eight out of 10. I never want to give a perfect score because there's always room for improvement. But it's been a great journey for me and I look forward to many more years with it.
I'd recommend you look at Dynatrace. It's really the only one worth looking at.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Thanks for your frankness! It encourages me to recommend DynaTrace to Ops teams.
Greetings, Erik