What is most valuable?
Monitoring: Time to Threshold (TTT) and Time over Threshold (TOT)
I work with enterprise-size IT environments with 10,000+ servers. These features help reduce event noise to a level that operations teams can manage. Rather than sending alarms directly from the server agents, TTT and TOT apply predictive analytics to the metric data, which enables greater flexibility for event thresholding.
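To illustrate the idea behind Time over Threshold (a minimal conceptual sketch only, not CA UIM's actual implementation), an alarm is raised only when a metric stays beyond its threshold for a sustained window, which suppresses transient spikes:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class TimeOverThreshold:
    """Minimal sketch of a Time over Threshold (TOT) rule.

    Raises an alarm only if the metric has breached the threshold for the
    whole evaluation window, rather than alerting on every single breach.
    Illustrative only; not CA UIM's actual logic.
    """
    threshold: float
    window: timedelta = timedelta(minutes=5)
    samples: List[Tuple[datetime, float]] = field(default_factory=list)

    def add_sample(self, ts: datetime, value: float) -> bool:
        self.samples.append((ts, value))
        # Keep only samples inside the evaluation window.
        cutoff = ts - self.window
        self.samples = [(t, v) for t, v in self.samples if t >= cutoff]
        if not self.samples:
            return False
        # Alarm only when the window is fully covered and every retained
        # sample breaches the threshold.
        window_covered = self.samples[0][0] <= cutoff + timedelta(seconds=1)
        return window_covered and all(v > self.threshold for _, v in self.samples)


# Example: a one-minute CPU spike is ignored; a sustained breach alarms.
rule = TimeOverThreshold(threshold=90.0)
now = datetime.now()
breached = False
for minute in range(6):
    breached = rule.add_sample(now + timedelta(minutes=minute), 95.0)
print("alarm raised:", breached)
```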
Visualisation: Unified Service Manager (USM)
USM is the core web portlet within the Unified Management Portal (UMP). From here, it is possible to dynamically group infrastructure components together, which is very useful for multiple reasons:
- Dynamic groups propagate all events from the infrastructure within. This allows for service-orientated, technology-based and business views, which greatly increases visibility of the entire IT infrastructure in a single pane-of-glass approach.
- Dynamic groups allow for sets of infrastructure to have monitoring applied automatically in a type of ‘policy-based monitoring’.
- Dynamic groups allow for sets of infrastructure to be placed into maintenance mode, either on an ad-hoc basis or for a scheduled period.
USM allows the operator to drill down from the dynamic groups into device views, where event and metric data are combined to clearly visualise the current operating status of the infrastructure.
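As a simplified sketch of the dynamic-group concept (not USM's actual mechanism), membership is rule-based: devices matching attribute filters join the group automatically, and whatever monitoring policy or maintenance window is attached to the group then applies to the current membership:

```python
from typing import Callable, Dict, List

Device = Dict[str, str]  # e.g. {"name": "srv01", "os": "linux", "role": "db"}


def dynamic_group(devices: List[Device], rule: Callable[[Device], bool]) -> List[Device]:
    """Return the devices that currently match the group's rule."""
    return [d for d in devices if rule(d)]


inventory = [
    {"name": "srv01", "os": "linux", "role": "db", "env": "prod"},
    {"name": "srv02", "os": "windows", "role": "app", "env": "prod"},
    {"name": "srv03", "os": "linux", "role": "db", "env": "test"},
]

# A "production Linux databases" group: newly discovered servers matching the
# rule join automatically and would inherit the group's monitoring policy.
prod_linux_db = dynamic_group(
    inventory,
    lambda d: d["os"] == "linux" and d["role"] == "db" and d["env"] == "prod",
)
print([d["name"] for d in prod_linux_db])  # ['srv01']
```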
How has it helped my organization?
We are currently migrating from an IBM Tivoli solution. CA UIM will improve the effectiveness of monitoring, increase visibility of the IT infrastructure, reduce mean time to repair (MTTR) and lower solution maintenance. In a large organization, CA UIM has the capability to reduce overall FTE requirements substantially.
What needs improvement?
Parts of the Unified Management Portal are not written in HTML5. I would like all components (Portlets) to be HTML5. This would increase the speed and responsiveness of the site, and possibly improve the appearance.
Improved network monitoring and topology mapping: Although this functionality does exist, it requires enhancement. CA UIM is very accomplished at monitoring the majority of IT infrastructure and is capable of collecting and alerting on the vast majority of metrics across network device vendors. However, the configuration of network device monitoring could be improved. The latest SNMP_Collector and ICMP (ping) probes only allow for monitoring of discovered devices and are configurable only via the web-based Admin Console. The previous equivalent probes were less dynamic but more flexible, being configurable via both the Admin Console and the client-based Infrastructure Manager. A combined approach of dynamic and manual network device monitoring would clearly be more beneficial.
As for network topology mapping, this is achieved via disparate pairs of discovery_agent and topology_agent probes located in each network segment, gathering device information via ICMP, SNMP, Telnet and SSH. This mechanism actually works really well; it is the way the data is collated, interpreted and represented in the topology views that requires attention. Predefined views depicting the different network layers, and the ability to show traffic routing or bandwidth utilisation, would be great additions. It would also be nice if more detailed device information were available via a mouse-over.
Note: I have not used the topology mapping in UIM 8.4, but I’m not aware of any significant improvements.
For how long have I used the solution?
I have used it for eight years.
What was my experience with deployment of the solution?
One thing to note with this product is that in my experience, when configured and spec’d out correctly, CA UIM is very stable and fully scalable.
Scalability and security come from a hub-based architecture. Hubs can be scaled horizontally and vertically, and can connect across DMZs or similar secure zones via UIM application-layer SSL tunnels.
The internal UIM agent deployment mechanisms aren't necessarily suitable for enterprise customers. With the use of BladeLogic or similar software deployment tools, it becomes a very easy and uneventful process. One thing to bear in mind is that the Unix agents need to be installed as the root user, or a root-equivalent, to avoid potential issues.
How are customer service and support?
The product support in recent times has improved significantly. I have had issues resolved competently and within a satisfactory timeframe.
The only bugbear is that, on occasion, when reporting a product defect, CA responds that it is working as designed and asks the customer to add an 'idea' on the forum to be voted on by the users.
For example, I noticed that when using the process monitoring probe, the Windows memory usage metric was collecting how much memory the process was using, as prescribed, whereas the Linux memory usage metric was collecting the 'virtual' counter rather than the correct 'resident' counter. I was initially told that this was by design, but after some discussion CA admitted it was a defect and swiftly added the correct counter to the probe.
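The distinction matters because a Linux process's virtual size (VmSize) includes mapped but largely unused address space, while resident memory (VmRSS) reflects what is actually held in RAM. A quick way to compare the two counters yourself, using the standard Linux /proc interface (independent of UIM):

```python
def memory_counters(pid: str = "self") -> dict:
    """Read virtual (VmSize) and resident (VmRSS) memory for a process
    from /proc/<pid>/status. The kernel reports both values in kB."""
    counters = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value, _unit = line.split()
                counters[key.rstrip(":")] = int(value)  # kB
    return counters


print(memory_counters())  # e.g. {'VmSize': 231424, 'VmRSS': 12876}
```

Reporting VmSize as "memory usage" makes an idle process look far larger than it really is, which is why the resident counter is the meaningful one for capacity alarms.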
Which solution did I use previously and why did I switch?
I have evaluated and used many monitoring tools, from open-source to enterprise-class solutions and everything in between. They all have good and bad points, but scalability and flexibility seem to be most discussed, followed by stability and security.
CA UIM very often comes out on top, as it excels in all four of the above criteria, and is also easy to deploy and comparatively simple to operate.
- Scalability: Addressed above, under deployment.
- Flexibility: Achieved via UIM's REST API, which is available for custom integrations, and the ability to build custom monitoring probes using the supplied SDKs (an illustrative REST sketch follows below).
- Stability: Difficult to prove in a POC; however, I can testify that when implemented correctly with appropriate self-monitoring, the tool does not tend to fail without outside influence.
- Security: The solution infrastructure can be connected securely and effectively hardened. The solution is fully multitenant compliant, which means inventory, metrics and events can be isolated between groups of operators. This is particularly useful for MSPs who allow customers to log on and view infrastructure status or service levels.
Several products I have evaluated claim to be multitenant compliant, but are in fact only able to monitor multiple 'customers' without segregating the event and metric data.
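On the flexibility point, custom integrations are typically built against the UIM REST web services with ordinary HTTP tooling. The sketch below uses Python's requests library; the hostname, credentials and the /alarms endpoint path are placeholders for illustration only, so consult the UIM REST API documentation for the actual routes and payloads:

```python
import requests

# Hypothetical UIM REST call: base URL, credentials and endpoint path are
# placeholders for illustration; the real routes are in the product docs.
UIM_BASE = "https://ump.example.com/rest"   # assumed UMP host
AUTH = ("api_user", "api_password")         # assumed credentials


def fetch_alarms():
    """Fetch open alarms from a hypothetical /alarms endpoint."""
    response = requests.get(f"{UIM_BASE}/alarms", auth=AUTH, timeout=30)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    for alarm in fetch_alarms():
        print(alarm.get("severity"), alarm.get("message"))
```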
CA UIM can be used as a standalone monitoring solution in many small- and medium-size organisations. It tends to be integrated with other CA products in large and enterprise-size organisations, where more granular application transaction monitoring is required, more in-depth network monitoring is necessary and full service views are essential.
How was the initial setup?
For an enterprise-class infrastructure monitoring tool, I would suggest that it is very straightforward to implement after some basic training.
To paraphrase an unnamed CA UIM Sales guy, ‘When discussing CA UIM implementation times, we tend to talk in weeks and not months’. This was in response to a potential mid-size customer asking how quickly they could get up and running.
For a moderately large and complex server and application monitoring solution, I would suggest that CA UIM would take at least 25% less time to implement over an equivalent IBM Tivoli solution.
What about the implementation team?
I have been involved in both in-house and vendor team implementation scenarios over the years. On this occasion, I'm virtually the sole resource responsible for implementing a 10,000-server solution.
It is likely, almost imperative, that someone new to CA UIM should seek some professional assistance during the design phase, either from the vendor or an independent consultant. Failing this, a CA UIM training course is advisable.
As with all monitoring solutions, prior to implementation, make sure to perform a requirement-gathering exercise, encompassing topics such as 'Infrastructure Functionality', 'Security, Encryption & Resilience', 'Presentation & Reporting', 'Event Handling', 'Integration', as well as all the various types of monitoring such as ‘OS level’, ‘Application’, ‘Database’, ‘Storage & Virtualisation’, etc.
The requirements under each of these headings should be associated with one or more 'use cases', in order to validate functionality or compliance.
Without the above, it is difficult to know whether you have successfully implemented the solution, and which areas are lacking or need improvement.
What was our ROI?
The pricing model is modular and based upon level and type of monitoring by quantity.
What other advice do I have?
Create an initial design document to help plan your implementation and identify potential issues beforehand. This document will inevitably evolve throughout the implementation and will provide a reference and a guide.
*Disclosure: I am a real user, and this review is based on my own experience and opinions.