Buyer's Guide
Network Monitoring Software
May 2023
Get our free report covering Zabbix, Paessler AG, LogicMonitor, and other competitors of SolarWinds NPM. Updated: May 2023.
709,643 professionals have used our research since 2012.

Read reviews of SolarWinds NPM alternatives and competitors

Jonathon Marshall - PeerSpot reviewer
Principal Engineer at Computex Technology Solutions
MSP
Great support experience, absolutely worth the money, and very helpful for quick discovery and troubleshooting
Pros and Cons
  • "They allow for integrations into their platform via API with PSA tools like ConnectWise Manage and ConnectWise Automate. They have a lot of add-on integration and plug-ins for a lot of the big names and IT RMM stacks commonly used in my industry space. These integrations are absolutely valuable. With the integrations into ConnectWise, we are able to automatically create and close tickets across systems."
  • "When you need to tailor an onboarding for a customer who wants different triggers and conditions for alerts that don't come out of the box in their default alert set for certain device types, you can make it happen and create those, but doing so isn't that easy."

What is our primary use case?

It has a lot of use cases, but in my opinion, it's probably the best full-stack network monitoring, management, and alerting platform out there for routers, switches, firewalls, and non-server infrastructure.

How has it helped my organization?

It makes it a lot easier for our IT teams to have visibility into remote and distributed networks. Once your IT team members get used to it, when there's an issue, for example, while trying to SSH to something, they will go to Auvik first just because they have the geographic map and these little dummy-proof exclamation marks indicating there might be an issue here. The way Auvik portrays the network from the outside looking in is like being Zeus on a little cloud. We can see what's going on with all our devices, which we couldn't see before without logging into each device individually or referring to a diagram made when they were set up. Now, we have a live, reactive, changing diagram that lets our network guys go straight to the actual device causing the network issue somewhere in a region and start troubleshooting right away, versus blindly troubleshooting three, four, or five devices in that general area before eventually getting to the device they need to work on.

It has saved an immeasurable number of hours of network outages and down networks. It has also reduced our response times. We are able to get that information really quickly, and we don't have to go back and forth. What used to be a four-hour fix is now done in 30 minutes.

It has been great for allowing our teams to focus on high-value tasks and delegate low-level tasks to junior staff, just because of the integration with our PSA ticketing system and the way we can set triggers, priorities, and levels of urgency with notes and all the other cool features they have there. It allows us to route tickets appropriately, and they already have little checklists that pop up for common alerts that say, "If it's this and this, try this. If not, escalate to senior staff." That has sped things up quite a bit. Often, there's a lot of noise, and until you get the alerting tuned down to actionable incidents, it can add a little extra time for the tier-one guys because there are just too many alerts. You can have one device that brings down a whole network, but you get alerts on every single device inside that network, whereas you only need to know about the one. Sometimes, it's not easy at face value to know which specific device it is until you get used to the tool and the customer.

Auvik keeping our device inventories up-to-date has helped save us time and money. We don't miss a lot of the warranty and version roll-ups, or some of our commitments where we have to do quarterly upgrades of the router, switch, and firewall environment. Those are the kind of upgrades that aren't done automatically for anyone because you can't do them in the middle of the day. So, our ability to track assets, models, versions, and even warranty expiration dates, which they pull from public databases automatically for you, is invaluable.

What is most valuable?

They allow for integrations into their platform via API with PSA tools like ConnectWise Manage and ConnectWise Automate. They have a lot of add-on integrations and plug-ins for a lot of the big names and IT RMM stacks commonly used in my industry space. These integrations are absolutely valuable. With the integrations into ConnectWise, we are able to automatically create and close tickets across systems. As alerts and new information come into Auvik, when an issue or a trigger that was alarmed has been resolved and it detects that it has gone away, based on our threshold, it can talk back to our ticketing system, auto-close the ticket, and send a notification. It's phenomenal. You don't have to wait on an email to go to another email, and then that email creates a ticket. It's very useful.
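The alert-to-ticket round trip described above can be sketched roughly as follows. This is a minimal illustration, not the real Auvik or ConnectWise Manage APIs: the alert payload shape and the PSA client are hypothetical stand-ins.

```python
# Sketch of alert-driven ticket automation: open a ticket when an alert
# fires, auto-close it when the monitoring tool reports the condition
# cleared. All field names here are illustrative assumptions.

def route_alert(alert, psa):
    """Create or close a PSA ticket based on the alert's status."""
    if alert["status"] == "triggered":
        ticket_id = psa.create_ticket(
            summary=f"{alert['device']}: {alert['condition']}",
            priority=alert.get("severity", "medium"),
        )
        return ("created", ticket_id)
    if alert["status"] == "resolved":
        psa.close_ticket(alert["ticket_id"], note="Auto-closed: condition cleared")
        return ("closed", alert["ticket_id"])
    return ("ignored", None)


class FakePSA:
    """Stand-in for a ConnectWise-style ticketing client (hypothetical)."""
    def __init__(self):
        self.tickets = {}
        self._next = 1

    def create_ticket(self, summary, priority):
        tid = self._next
        self._next += 1
        self.tickets[tid] = {"summary": summary, "priority": priority, "open": True}
        return tid

    def close_ticket(self, tid, note):
        self.tickets[tid]["open"] = False
        self.tickets[tid]["note"] = note
```

In a real integration, `FakePSA` would be replaced by authenticated calls to the PSA's REST API, with the monitoring platform invoking the handler via webhook.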

The network visualization is great in terms of overall intuitiveness. They couldn't have made it any simpler short of a coloring book with pop-up pictures. They made it easy to know where to look; they guide you to the right place. I always use the term Windows 85 just because they tried to simplify it so much and make it so easy that it became difficult for people who are used to doing more steps. They're like, "Wait, that can't be right. That's all I had to do? There have to be more steps." Some of the things are hidden in plain sight, but once you find them, you're good. The diagrams and the groupings of the sections are laid out cleanly. Like a Merkle tree, they are easy to navigate, and they have a lot of cross-referencing hooks inside those sections of the UI that lead you to the next expected place you'd want to go after making a change in that section. It's nice.

What needs improvement?

The monitoring and management functions, or the out-of-the-box functions, are fairly easy to use. When you need to tailor an onboarding for a customer who wants different triggers and conditions for alerts that don't come out of the box in the default alert set for certain device types, you can make it happen and create those, but doing so isn't that easy. Luckily, Auvik support is usually the best. They respond very quickly. You can message them right on a chat. You always get someone who knows what they're talking about, and they point you in the right direction. From a user perspective, customizing it isn't intuitive, but it can be done with their help.

Its asset inventory is amazing. The only thing they're still lacking is the ability to make it easier to import assets into their system when onboarding. Other than that, exporting and pulling data that is set up in Auvik is very easy, and it has made quarterly business reviews (QBRs) with customers a lot of fun.

So, there should be more custom reporting options when importing or exporting. It should have better data ingestion capabilities, and we should be able to import more than just a CSV. They should also improve customization for customer tenants, reporting, and onboarding options for migrating from non-Auvik systems, or from no network monitoring system at all, into Auvik. It's still a very manual process even with the discovery. Onboarding is probably the longest part.
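The CSV-only ingestion mentioned above usually means pre-validating exports by hand before import. A small sketch of that validation step, with an illustrative column set rather than Auvik's actual schema:

```python
# Validate a CSV asset export before bulk import: keep rows with all
# required fields, report the rest with their line numbers.
# The required column names here are assumptions, not a real schema.
import csv
import io

REQUIRED = ("hostname", "ip_address", "device_type")

def parse_asset_csv(text):
    """Return (valid_rows, errors); errors pair a line number with
    the list of fields missing on that line."""
    rows, errors = [], []
    # Header is line 1, so the first data row is line 2.
    for lineno, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        missing = [f for f in REQUIRED if not (row.get(f) or "").strip()]
        if missing:
            errors.append((lineno, missing))
        else:
            rows.append({f: row[f].strip() for f in REQUIRED})
    return rows, errors
```

Running this over an export before import surfaces incomplete rows early instead of discovering them mid-onboarding.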

There is a hidden or unspoken bottleneck that I would like to see improved. When there are 800 to 1,000 devices in one subtenant, there is a huge performance degradation. Generally, you're not going to have a lot of customers with that many devices, and the workaround is to create different subtenants, but it's more of a hassle than it's worth. In the future, I would like to see them find a way to break through that bottleneck for the namespace or customer tenants so that I could have all of a customer's network devices in one tenant. They could even be sectionalized inside the tenant, or there could be a way to mask the US1, US2, Customer-1, Customer-2, or whatever namespaces so that they all show up in the same portal tenant customer organization and all tie into our PSA tools with the same API integration. I would like to see that happen. That's been the biggest hurdle for our enterprise customers and deployments because when you first do discovery and start scanning, it pulls in everything, like printers, computers, phones, and all the stuff you don't need. It adds up to 1,000 really quickly, and then the UI or refresh rate of the tool cripples drastically. That's the biggest thing, but it's not something that can't be overcome with the options and suggestions they provide as of today. In those kinds of situations, it just requires a little extra work to set up the additional tenants and get everything integrated.

For how long have I used the solution?

I've been using it for about five years.

What do I think about the stability of the solution?

Its stability is great. 

What do I think about the scalability of the solution?

I couldn't speak to their actual infrastructure because it's a hosted solution. So far, I've seen massive, fast scaling from their infrastructure side just based on namespaces alone. I haven't personally seen any limitations other than the bottleneck I mentioned, but that's not a limitation when there's a workaround to create satellite tenants that talk to each other for the same customer. I haven't seen anything that would stop me from creating unlimited 1,000-device namespaces per customer, all tied into the same functions of their stack.

How are customer service and support?

I would rate their support a 10 out of 10. It's like they look out for me when I message support. For the last five years, every time I messaged them, they sent me the best guy they had, or that's the experience I've had. I have had nothing but a great experience with their support. I never had to get them on the phone either. It has always been through the chat, which is amazing.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We only use Auvik for routers, switches, and firewalls or just the network. We don't use it for any servers. We use a combined stack for that piece. Before we moved to Auvik, we used to use two extra tools, and then Auvik replaced those two. Now, we're just down to using two main tools to manage the entire customer infrastructure. We got Auvik and ConnectWise Automate.

In the past 13 years, I've used SolarWinds, NetNat, and Kaseya Traverse. We have used a good handful of managed service provider-focused tools. I used LabTech's very limited network monitoring and management tool before they got bought and the name was changed to ConnectWise and ConnectWise Automate, but essentially, LabTech was the same tool as Automate. Anyone in the MSP business over in Houston was either using Kaseya's RMM tool or LabTech's RMM tool to manage their customers. Those mainly excelled at workstation and server management, but they had some limited network monitoring and management functionality. Outside of that, this is the first one that does it all. Usually, you had to get vendor-specific tools: you'd have a Cisco tool, a Fortinet tool, or a SonicWall tool, and each one monitored and managed just that class of product. It's nice to have one that does it all.

In terms of comparing Auvik's cloud-based solution with on-prem network monitoring solutions, the only on-premises piece they have is the collectors. All the collectors do is relay information via HTTPS back to AWS, which does all the magic with the databases.

How was the initial setup?

Deployments are extremely straightforward. My response may be biased because I have been using it for a while, but I don't see anything that someone who doesn't use it regularly would hit as a problem or hurdle. I've worked with support and used the tool so often that I know the little caveats; if something is wrong with the way it's talking to a device, I can wait 30 seconds, set the device to unmanaged and then back to managed, and it resets and reconnects the service. So, it's super easy. Their support is quick and very knowledgeable because they don't work with any non-technical people; all of their customers are IT teams. You could probably log into a tenant with no idea what you're doing, pop into the message chat, and have them walk you through it fast enough to get you up and running and managing the day-to-day tasks for the customer in that tenant portal in a week or less, depending on the size of the network. It could be a matter of a couple of hours.

We have our own streamlined onboarding process. We took the bits and pieces out of the Auvik documentation that we found most relevant and valuable during initial customer discussions. When you're dealing with a lot of customers who also have internal IT departments, you have to lay out a lot of different concerns, questions, and things that revolve around their specific operations that you just can't predict from the get-go. So, we have our own process that picks out the protocols that are relevant, the level of permissions we need, the service accounts we need, and so on. We set those requirements and expectations in our scope with the customer, and they sign off on getting us this information within a certain timeframe. That helps speed up the process out of the box. Assuming everything is perfect and we have all of the access and all the keys to the kingdom for someone we're trying to deploy, we should have no problem deploying very quickly. That's because all the credentials we need to manage those devices, for logins, remote sessions, and SNMP, are handled by an Auvik service account. If all of those are plugged in before we deploy the collector, then as we deploy the collector, it does all that magic for us: connecting, discovering, pulling information, and wrapping everything together.
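The gating step described above, confirming every credential is on file before the collector goes out, can be expressed as a simple pre-flight check. The field names below are illustrative assumptions, not Auvik's configuration schema:

```python
# Pre-deployment checklist sketch: verify every credential and protocol
# the collector needs is present before kicking off discovery.
# Keys and descriptions here are hypothetical examples.

PREREQUISITES = {
    "snmp_community": "SNMP read credential for network devices",
    "login_account": "Service account for device logins",
    "remote_access": "Credential for remote sessions",
}

def onboarding_ready(provided):
    """Return (ready, missing) for a customer's submitted credential set."""
    missing = [key for key in PREREQUISITES if not provided.get(key)]
    return (not missing, missing)
```

Scoping documents can then list exactly which items block deployment, which mirrors the sign-off process the review describes.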

What about the implementation team?

I got a new guy who works with me now, but for the last three years, it has been solely me deploying Auvik for every customer and internally for our operations as well. I deploy it, configure it, and then I hand it off to Ops to maintain it, and they handle it from there. 

In terms of maintenance, it doesn't require much. In the past few years, there were some instances where collectors couldn't be automatically updated from certain versions past a certain point. So, you just have to go in and update or redeploy collectors for customers, which is due to the nature of how they are set up. You could have one that just breaks, spin up a brand new one in less than 30 minutes, and be back to where you were before.

What was our ROI?

Every time we onboard a new customer to provide our IT services, there's a kickoff call that just says, "Hey, we're doing this." Auvik provides us the ability to perform discovery as soon as we have keys to their infrastructure.

There has been a reduction in our mean time to resolution (MTTR). From incident to resolution, it has probably cut that time down in half for the operations side.

What's my experience with pricing, setup cost, and licensing?

The prices change based on your partnership with them and based on the bulk amount that you buy and the account rep you're talking to. It depends on negotiations and the number of customers you have. 

It's absolutely worth the money. I would probably charge more if I were them. They don't charge you for anything that's not a router, switch, firewall, or controller, in other words, a network device. So, you can throw in anything like servers and ESX hosts. You can throw network storage and all that stuff in there, and they have functionality for you to build out, monitor, and manage those as well, which you don't get charged for. You only get charged per device for a switch, router, or firewall, which is nice. You can have a collector for a customer for just a minimal fee for the tenant. It's pretty neat. You can deploy as many collectors as you want to talk to that tenant for the customer on the fly and do discoveries. We also handle some emergency requests such as, "We need to figure out what we have on our network because we got ransomware, and we need to make sure all of our devices and all of our assets have the new antivirus. We're supposed to have 6,000 devices, but we're only showing this many." There have been times when we've literally just used the tool for discovery on a customer to collect a full report of assets and then used that to fix a whole different type of issue and provide solutions that brought more revenue through additional projects for that engagement. We use it ad hoc. We use it for month-to-month management of infrastructures. Now, we use it for discoveries and emergency projects where we need to collect a lot of information very quickly when there's no other IT at the other end to provide information.

Which other solutions did I evaluate?

When I got hired by Computex, now Calian, they hired me because they didn't know what to do with Traverse. I met with the engineering team and made the decision; I was certainly 90% of the reason they moved away from Kaseya's Traverse tool to Auvik's tool. I made that call when I came on because I had a lot of background in it. They had been through an acquisition where they had that tool for half of the businesses they were providing IT for, and they had Traverse for the rest. I convinced them to get away from Traverse because it wasn't a good tool, and we moved over to a tool that did what we needed.

I had to do a lot of training. I had to host a lot of training, calls, and some webinars for our NOC team, but once we got the hang of it, we were able to display it while the customer was at our NOC. We could display the active live network monitoring diagrams on our dashboards with all our other systems. It gives everyone a warm feeling when they can look over and see what's going on.

What other advice do I have?

I would advise first figuring out what you're trying to accomplish. If you are trying to ad hoc or duct-tape other tools together, rethink. Auvik performs and shows the most value when it becomes your sole tool for all of your network monitoring, management, and alerting. If you're trying to ad hoc it, duct-tape it, or throw it in just as a feature or a filler for another product, you're just going to run into more headaches. You only need Auvik to manage all of those things. If you're looking to Auvik for server management or workstation management, it's possible, but it's not built for that. So, make sure it's for network devices only. It's not really designed to manage storage, hypervisors, and remote access, and it's not a day-to-day help desk support tool for you to hop onto user workstations and troubleshoot from that standpoint.

If you want just another monitoring solution, Auvik can do it, but Auvik's magic is the fact that it's a full stack. It's not just monitoring. It's full network management, remote access, and preventative maintenance. It's a full RMM tool. So, if you're looking for strictly an alerting tool for your network, you'd be wasting some very well-engineered features on the product by going with Auvik just for that. 

Its ease of use isn't too important for us, but it depends on the kind of use because we have layered access and levels of skill sets that are allowed to do certain things in it. From a broader perspective, 90% of the engineers who work for a managed services provider and 90% of the guys on our support desk aren't going to be in there changing anything. It's just going to be the project team that sets it up, onboards it, and configures it. Once that process is standardized, there are only minor tweaks, based on customer type, when we set up new clients. It becomes pretty streamlined. The only time ease of use helps is in the beginning when you first start using the tool. Once you've been a partner with Auvik, onboarded a few customers, dug your way in, out, and around it, and done a couple hundred after that, it's not as relevant.

It hasn't helped reduce repetitive, low-priority tasks through automation. They don't have much automation in the platform itself. The only automated things they do are monitoring conditions and routing alerts: who they go to and how they are handled. In terms of automating maintenance on the network, there isn't any function like that in Auvik that I'm aware of. It's mostly an analytics, monitoring, and remote access tool.

I would rate it a nine out of ten.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Sanket - PeerSpot reviewer
Chief Manager at a marketing services firm with 501-1,000 employees
Real User
Top 20
With the help of the solution, we can predict and prevent failures
Pros and Cons
  • "One of the most valuable features of ITRS Geneos is the active time feature that helps with the trading applications that I support."
  • "I would like ITRS Geneos to develop an app, where instead of going to specific login terminals or logging into laptops or desktops to check alerts, we can have visibility in the app itself."

What is our primary use case?

We are using ITRS Geneos for availability and performance monitoring. With respect to availability, we monitor to confirm that no business functionalities are impacted and that all capacity parameters are okay. We are also using the solution for the performance part, monitoring latency, the number of messages, the rates, and the number of external client logins. Lastly, we use the solution to monitor all the hardware capacity parameters, such as CPU, disk, and memory usage, for all applications.

We use the application logs as well as OEM logs. We monitor the processing rates, the number of messages processed, and the number of external client logins. We also monitor whether any exceptions have occurred in the code. Because our application does not involve a database, we are not monitoring database activity. We monitor inside the applications to see whether the server is in primary mode or has gone to secondary mode and whether a failover has happened, as well as the configuration part: that all the application configs are intact and there are no changes in them. We also make sure the connectivity is there across the various modules and servers, which are interconnected either through PCP or our own protocol. We are also monitoring the downstream and upstream systems.

How has it helped my organization?

Our previous solution alerted us through SMS or email. The alert would go to one person whose job was to monitor, and if they were busy with another activity, we would be delayed in responding to the situation. With ITRS Geneos, we can see everything on the dashboard. We can see the relationship between two alerts that would previously have arrived as separate emails, and we can see the whole picture of where the issue is, allowing us to take action quickly. We can detect the actual connection between two alerts. Suppose one alert comes from a server and another comes from a client; we know there is a potential issue with the connectivity between that server and client. An email or SMS alert would never give us the full picture, so at that time, I had to go into the system and work out what the issue actually was, which was quite difficult. Now, with the new system, it's more proactive. We have started to monitor the system more closely because many of the alerts that we didn't have before are now enabled. For example, we have thresholds that warn us when we need to free up space.
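The correlation the reviewer describes, two alerts on the same path pointing at the connection between them rather than at two independent faults, can be sketched like this. The alert shape is an illustrative assumption, not the Geneos data model:

```python
# Group alerts by the (endpoint, endpoint) link they reference; a link
# with alerts raised from both of its ends suggests the connection
# itself is the problem. Field names are hypothetical.

def correlate(alerts):
    """Return the links that have alerts originating from both ends."""
    by_link = {}
    for alert in alerts:
        # Normalize so (a, b) and (b, a) map to the same link.
        link = tuple(sorted((alert["source"], alert["target"])))
        by_link.setdefault(link, []).append(alert)
    return [link for link, group in by_link.items()
            if {a["source"] for a in group} == set(link)]
```

A dashboard built on this idea surfaces one suspected connectivity issue instead of a pair of seemingly unrelated device alerts.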

We first started to use ITRS Geneos in our department in 2014. After that, the solution was implemented in other departments, and the downstream systems inherited the monitoring. Trading is just one part of the organization where monitoring now takes place; other systems, such as risk surveillance and collateral management, have also been upgraded. As an organization, we need a holistic view. This means that if I am an HOD, I need a holistic view of not only trading but also the downstream systems. Previously, senior managers did not have any view of what the issue was, where it was, and what the impact was, because there are many downstream systems that can be affected. Now, with the holistic view in place, a senior manager is able to see where the issue is and where the impact is.

The solution provides lightweight data collection. We have recently added a business dashboard as well. Along with order messages and client logins, we are also monitoring capacity reports, view logs, and business reports. We capture all of this information and put it in our database, so we can see a historical view of user logins and latency and whether or not they have improved.

With the help of the solution, we can predict and prevent failures. Many times, one incident can lead to another. For example, if we have three modules and there is a buildup in module C, ITRS Geneos can help monitor that buildup. While one of our people is trying to resolve the issues in module C, the monitoring person can also watch the back pressure going to module A, and I will start checking the impact on modules A and C. Since we have the complete end-to-end connectivity view in ITRS Geneos, I can make sure that an issue on one server or one module is prevented from spreading to the other modules. We have prevented multiple P1 and P2 issues using ITRS Geneos.

We track the number of incidents caught proactively every month, based on the incidents that come from ITRS Geneos. Roughly 20 to 30 percent of all incidents are proactively detected and avoided using the solution. Whenever any incident comes up, the first question we ask is why it wasn't caught in the monitoring. In the last eight years, we have progressed so much with the solution; we have configured multiple rules and multiple samplers in ITRS Geneos because we learn from each and every incident.

What is most valuable?

One of the most valuable features of ITRS Geneos is the active time feature, which helps with the trading applications that I support. The HP OpenView solution we previously used worked on a 24-hour, seven-day-a-week cycle, but the market hours are 9:00 AM to 3:30 PM.
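The "active time" idea above, treating alerts as actionable only during market hours instead of around the clock, reduces to a simple window check. The 9:00 to 15:30 window comes from the review; the helper itself is an illustrative sketch, not the Geneos active-time implementation:

```python
# Suppress out-of-hours noise: an alert matters only if it fires
# during the trading window on a weekday.
from datetime import time

MARKET_OPEN = time(9, 0)
MARKET_CLOSE = time(15, 30)

def in_active_window(t, weekday):
    """True if t (a datetime.time) falls within market hours on a
    weekday, where Monday is 0 and Sunday is 6."""
    return weekday < 5 and MARKET_OPEN <= t <= MARKET_CLOSE
```

In Geneos itself this windowing is configured declaratively rather than coded, but the effect is the same: a disk alert at 2:00 AM on a Sunday never pages anyone.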

ITRS Geneos has good GUIs that provide us with a single view of all the various applications. We have around 500 servers, and monitoring 500 servers on individual devices is a bit challenging. Putting the various applications in a single GUI is very difficult, but in ITRS Geneos, all the GUIs and user-friendly dashboards are there. All the information is available at a glance. We have five segments, and if a problem occurs in any one segment, we can highlight it and follow the steps. We then access the active console, which gives a step-by-step detailed view, to resolve the issue.

What needs improvement?

Currently, the most valuable thing for an individual is a mobile device, since that is where people track everything, and we have multiple apps for various products. I would like ITRS Geneos to develop an app so that, instead of going to specific login terminals or logging into laptops or desktops to check alerts, we can have visibility in the app itself. Using an ITRS Geneos app, we could see an error message during our travels or wherever we are.

I would like to see the capacity of messages for forecasting increased. Since the NSE has been the number one derivatives stock exchange in the world for three consecutive years, the number of messages is important. We use the capacity planner in ITRS to forecast our data needs for the next two months. The planner is important because the volume of data we produce is becoming more and more volatile compared to when we first started using ITRS Geneos in 2014.
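The kind of two-month-ahead projection described above can be illustrated with a simple least-squares line over monthly message volumes. Real capacity planners use richer models than this; the sketch only shows the core extrapolation idea:

```python
# Fit a straight line to historical monthly volumes and extrapolate
# a given number of months ahead. Illustrative only: seasonality and
# volatility, which the review highlights, are not modeled here.

def linear_forecast(volumes, months_ahead):
    """Least-squares line through (0..n-1, volumes), evaluated at
    (n - 1 + months_ahead)."""
    n = len(volumes)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(volumes) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, volumes)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + months_ahead)
```

With volumes of 100, 110, 120, and 130 million messages over four months, the line projects 150 million two months out; volatile traffic is exactly where such a naive fit breaks down, which motivates the reviewer's ask for better forecasting.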

For how long have I used the solution?

I have been using the solution for eight years.

What do I think about the stability of the solution?

ITRS Geneos is a stable product, and we have not had any downtime in the past eight years. The stability is good, but we do need to configure new applications correctly for the solution to stay stable.

What do I think about the scalability of the solution?

The solution is scalable. We are the number one derivatives stock exchange, and our number of servers has increased significantly in recent years. In 2014, we had 50 servers; now we have 500, ten times as many. Our gateway servers were also limited to around five, but now there are 15. This increase in scale has led our maintenance team to deploy more gateway servers and improve our monitoring gateways. We are also looking into redundancy and unused data to further improve performance.

We currently have 120 people using the solution in our department and around 400 from all of the departments, including vendors.

How are customer service and support?

The technical support for ITRS is currently very good. The person who developed and implemented the ITRS dashboard previously provides technical support for us, so he has a good understanding of the system. Additionally, many of the deadlines for ITRS are set by top management. ITRS is seen as a high priority, so there is a good response time from the ITRS managed service currently.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I work in a trading production support role. Eight years ago, we made the switch from using HP OpenView for monitoring purposes to using ITRS Geneos. This was because there was no GUI available with HP OpenView.

We have never considered any lower-cost or open-source alternatives because the team is comfortable with the current system. The L1 monitoring team is the team that checks the dashboards and provides alerts. They are so familiar with the system that we have never thought of getting away from it.

How was the initial setup?

A few team members were involved with the deployment. I was there to provide details to ITRS Geneos, handle the sign-up portion of the solution, and test ITRS, such as whether alerts were showing properly or not. I was involved in all these areas of deployment because we have various segments in our module, and we started with only a specific segment. Initially, we focused on the GUI design: how it should look, what rules should be configured, and where the data should come from. All those nitty-gritty details are required for the ITRS dashboard. I was not responsible for the creation of the samplers, the installation of Netprobes, or the entering of rules. A separate team was responsible for those tasks.

Developers faced a challenge when there was no database, but they soon found ways to overcome this obstacle. They had to monitor the application, logs, and other aspects of the system. After four to six months, they started to figure out how to monitor specific aspects of the system when there were multiple items that needed to be monitored. We needed to consider three things: first, whether ITRS would have a file or tooling mechanism; second, whether there would be any impact on production; and third, whether ITRS would have a built-in script. We considered these three aspects of ITRS in order to go deeper into the application, so that Netprobes could act as the agents on the production machine and eliminate any impact.

After all the items were gathered, the most important step was putting everything into one Active Console. This is where I accessed the files and did all of the reading. In the fifth month, we had all the GUIs created, and in the sixth month, we started testing, making sure all the alerts were configured and displaying correctly. We did all the testing at the end of the day because it was in the production environment.

There was some hard work put in, and the initial two months were difficult because we didn't know how to do it or what needed to be planned. We completed deployment for all five segments we support within one year: in the first six months, we deployed only one segment; over the next four months, we deployed another two; and in the last two months, the remaining segments were completed.

What about the implementation team?

The implementation was completed in-house.

What other advice do I have?

I give the solution a nine out of ten.

We had only two people doing the setup, the manager and one other person. I was mostly a tester, testing the applications and making sure everything was in order for the solution. There were four of us in total during the deployment process. There are currently five to six people on a separate tooling team that handles ITRS Geneos. I am no longer part of the ITRS team.

Previously, ITRS Geneos was only implemented in trading operations, from 2014 to 2015. In 2016, we started to use it in other departments as well, and by 2017, all the departments were using ITRS Geneos. There are five to six departments where the solution is used. In fact, we have created an exchange-wide single view, a dashboard where we can look at six to seven departments at once. If any specific department has an issue, the warning or critical alert will appear in that view.

The solution requires maintenance. There is a separate tooling team that takes care of ITRS Geneos. There are five people who are looking into all the maintenance, patching, and all areas of the gateway servers for the solution. Additionally, there are two more activities. The first activity is with respect to the uptime availability part related to gateway servers. The second activity is with respect to releases. If there are any application changes, whether the change has been done in the solution or not, they will also be taken care of. We are also evaluating new dashboards and features, such as our use of ICT and forecasting. There is a separate team who looks into all of this maintenance and development. Plus there is an onsite team in Manila.

For any mission-critical projects, I recommend ITRS Geneos because time is crucial. Everything needs to be resolved within five minutes, and the SLA is strict. To resolve incidents within a five-minute window, we need to monitor and escalate within 30 seconds. The team should focus on monitoring and recovery within the first 30 seconds.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
VENKATESHREDDY - PeerSpot reviewer
Associate IT Director at a tech services company with 201-500 employees
Real User
Top 5
Leaderboard
Simple and easy to use, and has good probe server and reporting features
Pros and Cons
  • "The features we found most valuable in ManageEngine OpManager are the probe server and reporting because they're pretty good features."
  • "What I'd like ManageEngine OpManager to improve on is artificial intelligence. In particular, the machine learning feature should be integrated with the sensor flow. Doing this will give leverage, especially when you look at other products such as the Cisco DNA Center. When a switch goes down, I should be able to build on the correlation of other physical devices it's connected to so that I can integrate that with my CA CMDB. The ManageEngine OpManager team needs to draw a long-term roadmap where that feature becomes an integral part of the solution because right now, machine learning in ManageEngine OpManager is a long process. The solution doesn't have MLS search and I want to see ML being developed and applied for CA CMDB to greatly reduce the burden of tying everything. For example, if I have a data center switch that goes down now, I should know what server it's connected to, and when that switch goes down at twenty-four ports, I would get twenty-four alerts for different devices plugged in. I should be able to make a correlation that the major problem lies in the switch and not with the twenty-four elements connected to that switch. That is where machine learning should come into play and the ManageEngine OpManager AI should indicate "This is where the root of your problem is." It could be difficult, but this is a feature that should be improved or added to the solution."

What is most valuable?

The features we found most valuable in ManageEngine OpManager are the probe server and reporting because they're pretty good features.

What needs improvement?

What I'd like ManageEngine OpManager to improve on is artificial intelligence. In particular, the machine learning feature should be integrated with the sensor flow. Doing this will give leverage, especially when you look at other products such as the Cisco DNA Center. When a switch goes down, I should be able to build on the correlation of other physical devices it's connected to so that I can integrate that with my CA CMDB. The ManageEngine OpManager team needs to draw a long-term roadmap where that feature becomes an integral part of the solution because right now, machine learning in ManageEngine OpManager is a long process. The solution doesn't have MLS search, and I want to see ML being developed and applied for CA CMDB to greatly reduce the burden of tying everything together.

For example, if I have a data center switch that goes down now, I should know what server it's connected to, and when that switch goes down at twenty-four ports, I would get twenty-four alerts for the different devices plugged in. I should be able to make the correlation that the major problem lies in the switch and not with the twenty-four elements connected to it. That is where machine learning should come into play, and the ManageEngine OpManager AI should indicate, "This is where the root of your problem is." It could be difficult, but this is a feature that should be improved or added to the solution.

What I'd like to see in the next version of ManageEngine OpManager is for the machine learning and AI to be tied with the correlation engine because, at the moment, the existing correlation is lacking in terms of contextual awareness. Contextual awareness has to be more customized so that the operator or the person building or customizing the tool should have the flexibility to tie it with the CA CMDB application and the underlying inventory. This way, ManageEngine OpManager becomes more of a straightforward, next-generation solution concerning monitoring. For example, another player like New Relic has an application management module that's far ahead, but it's still lagging and would still need to catch up in terms of managing needs.

ManageEngine OpManager has integrations through APIs, but still needs further enhancement because there are still certain applications that would take hits, so the level of contextual awareness needs to be built on more, and that takes time.

For how long have I used the solution?

I've been using ManageEngine OpManager for about one year now.

How are customer service and support?

I found the support for ManageEngine OpManager fantastic. I've called the support team several times, and the team always came back to me immediately. There wasn't any delay with the response, and whenever there was a feature that my company felt was missing in ManageEngine OpManager, the product engineering team would include that in the roadmap. The team also indicated other features still being worked on and what has been worked on. My interaction with the ManageEngine OpManager support team has been good so far. There's no doubt about it.

Presale, after-sales, and whenever there's an issue, I would rate the support provided by the product engineers of ManageEngine OpManager as five out of five. I would recommend the solution in terms of the support provided because, in comparison with other products, the support team listens to the customers. It's been a nice experience for me, so far.

Which solution did I use previously and why did I switch?

My company has been comparing products and found out that it's better to stick with the existing product, ManageEngine OpManager, instead of going with another because now the team knows about the issues and downsides and how to address those.

How was the initial setup?

The initial setup for ManageEngine OpManager was very simple. It was very, very easy. On a scale of one to five, with five being very easy and one being very difficult, my rating for the setup is four out of five.

What was our ROI?

Concerning the ROI from ManageEngine OpManager, we're very happy and satisfied because we didn't have to pay for additional overheads, particularly because we now don't need that many admins, so we were able to save. We're able to run the show with just three operators.

On a scale of one to five, with one being the worst and five being the best, I would rate the ROI we get from ManageEngine OpManager as four out of five.

What's my experience with pricing, setup cost, and licensing?

Pricing for ManageEngine OpManager depends on the number of nodes you onboard and whatever pricing is reflected on the ManageEngine portal which offers discounts from ten percent to fifteen percent. It purely depends on the sales volume and the negotiation.

In my organization, there are about 1,200 nodes, which cost around $29,000 per year. There's no additional support or maintenance fee from ManageEngine OpManager.

Pricing for the solution is highly competitive and I would rate it five out of five. ManageEngine OpManager is one of the most competitive options out there.

Which other solutions did I evaluate?

I did a small POC on SolarWinds, but unfortunately, its supply chain was compromised in an attack that impacted other SolarWinds products a lot. SolarWinds took a beating, and if that hack hadn't happened and there hadn't been as much impact on the other product lines, SolarWinds would have been the clear winner over ManageEngine OpManager. Price-wise and support-wise, both solutions are good, but because of the hack, there is a security concern with SolarWinds, and this was a major reason to push back on it; otherwise, my company would have gone with that solution.

I also compared ScienceLogic with ManageEngine OpManager, and though ScienceLogic has its engine, the engine fails whenever I try to build custom reports that my customers need. In ManageEngine OpManager, on the other hand, the process is simple. You just export it to a CSV format and you can do whatever you want with your other tools, so it's very easy, but in ScienceLogic, doing it would take a lot of time because of the connectors.
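To illustrate the export workflow described above, here is a minimal sketch of post-processing an exported availability report with nothing but the Python standard library. The column names and sample rows are hypothetical; a real OpManager CSV export will vary by report type.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of a CSV export; real exports will have different
# column names depending on the report chosen.
sample = """Device,Date,Availability(%)
core-switch-01,2023-05-01,99.98
core-switch-01,2023-05-02,97.50
edge-router-01,2023-05-01,100.00
"""

def average_availability(csv_text):
    """Average the Availability(%) column per device."""
    totals = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Device"]].append(float(row["Availability(%)"]))
    return {dev: sum(vals) / len(vals) for dev, vals in totals.items()}

print(average_availability(sample))
```

From here the aggregated figures can be fed into whatever reporting tool the customer prefers, which is the simplicity the review is pointing at.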

ManageEngine OpManager works better than other solutions because you don't need to be a guru to do simple tasks on it.

What other advice do I have?

I'm currently using ManageEngine OpManager.

Within my organization, thirty people use ManageEngine OpManager daily. The solution is being supported 24 x 7.

In terms of maintenance, a stable code is released every six months, so my team has to go back and plan it accordingly, which means keeping all the servers redundant. During the design process itself, the ManageEngine OpManager team did indicate that every quarter, certain packages would be released because the solution is dependent on Java and other frameworks, so it needs to be patched accordingly. Whenever a new patch is released, or there's a major code release at the design level, you need to keep servers more redundant, so you won't run into any issues, and this has helped my company reduce downtime. The monitoring has been always available and ManageEngine OpManager has never gone down. Only one staff is required for its maintenance.

My advice to others looking into implementing the solution is that if you're a legacy customer and you have budget constraints, ManageEngine OpManager is the way to go because it's a very simple solution for deployment, monitoring, and device availability. Though other competitors have advanced features, if I'm not using those features, then it doesn't matter. I'll be better off with ManageEngine OpManager because it's simple to use.

My rating for ManageEngine OpManager is eight out of ten.

My company is a customer of ManageEngine OpManager, not a partner.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Head of Data Architect at LendingTree
Real User
Top 20
Instantaneous response when monitoring logs and KPIs
Pros and Cons
  • "CloudWatch immediately hooks up and connects to the KPIs and all the metrics."
  • "It would be beneficial for CloudWatch to provide an API interface and some kind of custom configuration."

What is our primary use case?

We use the solution to monitor our AWS resources. We used Azure extensively but a couple of years back we moved to use both Azure and AWS. Currently, we have three main use cases. 

Our predominant use case is monitoring our S3 which includes terabytes of data. We monitor all the buckets and containers plus who has access to them, the thresholds, and user data. We constantly watch all the KPIs and CloudWatch metrics. 

Our second use case is watching logs and processes for other products such as AWS tools, AWS Glue, and Redshift which includes a few terabytes of data.

Our third use case is a minor one involving Athena.

Our fourth use case is new. We just started using SageMaker for a small POC and want to complete all of our data modeling and logs.

In the future, we will be using the solution with Airflow, which will become one of our biggest use cases. 

CloudWatch works very well with any of the AWS resources so we always monitor through it.

How has it helped my organization?

Our business flow has improved because we set thresholds and immediately get an email alert from CloudWatch if usage goes beyond them. Without these alerts, we would have to use external monitoring.
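The kind of threshold alert described above maps to a CloudWatch alarm. Below is a minimal sketch of the parameters the PutMetricAlarm API expects for an S3 storage threshold; the bucket name, SNS topic ARN, and threshold value are hypothetical. In practice, the dict would be passed to `boto3.client("cloudwatch").put_metric_alarm(**alarm)`, with the SNS topic delivering the email.

```python
# Hypothetical S3 storage alarm; names and ARN are placeholders.
alarm = {
    "AlarmName": "s3-bucket-size-threshold",
    "Namespace": "AWS/S3",
    "MetricName": "BucketSizeBytes",
    "Dimensions": [
        {"Name": "BucketName", "Value": "example-data-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    "Statistic": "Average",
    "Period": 86400,              # S3 storage metrics are reported daily
    "EvaluationPeriods": 1,
    "Threshold": 5 * 1024 ** 4,   # alert beyond 5 TiB
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:storage-alerts"],
}
print(alarm["AlarmName"])
```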

What is most valuable?

It is valuable that CloudWatch collects all the metrics. I primarily like the RUM. There is an instantaneous response when monitoring logs and KPIs. CloudWatch immediately hooks up and connects to the KPIs and all the metrics. 

What needs improvement?

Even though the product works well with most AWS, it is a nightmare to use with Snowflake. Snowflake is a SaaS product hosted on AWS, but using it with CloudWatch still doesn't give us the support we need so we rely on separate monitoring. 

We have many databases such as MongoDB and SQL Server, RDS, and PostgreSQL. For these, CloudWatch is good but a little basic and additional monitoring tools are required. It's challenging to use one monitoring tool for S3 and another monitoring tool for Snowflake. 

It would be beneficial for CloudWatch to provide an API interface and some kind of custom configuration because everybody uses APIs now. Suppose Snowflake says we'd get all the same things with MongoDB such as APIs, hookups, or even monitoring. That would allow us to build our own custom solution because that is the biggest limitation of CloudWatch. If you go a bit beyond AWS products even if they're hosted on AWS, CloudWatch doesn't work very well. 
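One partial workaround for the gap described above is to push an external product's own numbers into a custom CloudWatch namespace via the PutMetricData API. The sketch below builds the payload shape that API expects; the Snowflake metric and warehouse name are hypothetical, and the payload would be sent with `boto3.client("cloudwatch").put_metric_data(**payload)` after querying the product's own API.

```python
import datetime

# Hypothetical bridge: publish a non-AWS product's metric (e.g. Snowflake
# warehouse credits) into a custom CloudWatch namespace.
def custom_metric(namespace, name, value, unit, dimensions):
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": name,
            "Value": value,
            "Unit": unit,
            "Timestamp": datetime.datetime.now(datetime.timezone.utc),
            "Dimensions": [{"Name": k, "Value": v}
                           for k, v in dimensions.items()],
        }],
    }

payload = custom_metric(
    "Custom/Snowflake", "WarehouseCreditsUsed", 12.5, "Count",
    {"Warehouse": "ANALYTICS_WH"},  # hypothetical warehouse name
)
print(payload["MetricData"][0]["MetricName"])
```

This doesn't replace native support, since you still have to poll the external product yourself, but it does let alarms and dashboards live in one place.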

I'd also like an improved UI because it hasn't significantly improved in a few years, and we want to see things at a more granular level. I get my KPIs on bucket usage for yesterday, but I'd like to see them for a particular date or week. We have three buckets used by hundreds of people, and I want to see use cases for an individual to determine where I need to customize and provide more room. I want aggregation across multiple parameters, not just one.

For how long have I used the solution?

I have been using the solution for two years. 

What do I think about the stability of the solution?

The solution is very stable with absolutely no issues. We used to see a delay when we were setting up three buckets but now we receive instantaneous notifications. 

What do I think about the scalability of the solution?

The solution is definitely scalable. Most of our development environment uses it and we are running three teams of 150-200 people. Usage levels are different between developers and the support team so the total users at one time is 100-150. 

The solution is managed by our internal AWS maintenance team. Seven people manage our cloud environment and seven manage our platform side for not just CloudWatch, but everything on AWS.

We still need to find a solution for Snowflake and Tableau environments unless CloudWatch provides better support in the future. 

How are customer service and support?

The support staff are seasoned professionals and are good. Amazon provides the benchmark for support and nothing else compares.

Which solution did I use previously and why did I switch?

On-premises, we have used other solutions like Sumo Logic, Azure Logic Apps and others. Not everyone uses AWS so we have a lot of tools we use.

Previously, we used some of those external logic tools, but they didn't work well with AWS tools. I would have to figure out how to configure them for Aurora or find a way to monitor S3 buckets. Those solutions worked well on-premises, but not with AWS and the cloud.

How was the initial setup?

The setup for this solution is pretty simple and anyone can do it if they are on AWS. Setting up all our VPC and private links connecting to our gateways took some time, but CloudWatch setup was a no-brainer and took a couple of days. 

What about the implementation team?

Our implementation was done in conjunction with a third party. We like to bring in a few engineers to work with our engineers and then we partner with a third party like Slalom to help with integration. Our process is a mix of all three with AWS staff helping for a couple of weeks and Slalom for a couple of months. Our team slowly takes over management. 

What was our ROI?

We plan to increase our usage because we don't have another monitoring tool right now. With the Airflow orchestration, our CloudWatch use will significantly increase as we monitor all of our RUM, notifications, jobs, and runs. Our runs and billings will increase 20-30% once we start using Airflow. 

Because CloudWatch doesn't support all externally hosted products, I rate it a nine out of ten for ROI. 

What's my experience with pricing, setup cost, and licensing?

I don't know specifics about pricing because we pay for all our AWS services in a monthly bundle and that includes CloudWatch, Redshift, VPCs, EC2s, S3s, A39s, and others. We spend about $5 million per year on AWS with CloudWatch being about 5% of that cost. 

Which other solutions did I evaluate?

I did not evaluate other solutions. Once we moved to AWS, we looked for a tool that was native to that cloud. That is the process we are currently undertaking for Snowflake and Tableau because CloudWatch doesn't support them well. We do try to use CloudWatch as much as possible. 

What other advice do I have?

The solution is pretty good because it automatically comes and works well with AWS. Before you use any product from AWS, think about whether it is supported or how it will interface. I suggest using the solution with one product at a time and then transitioning to important interfaces. 

If you find you can't configure the solution with Redshift, for example, and are struggling with your S3 buckets even though both use S3, then you may have to find another monitoring solution. It makes sense to follow Amazon's best practices. They advise not to use certain monitoring components alone but to use them as an integral part of your system. Monitor your ecosystem and think of a high-level picture of it rather than just determining that CloudWatch must be a part of Redshift. This solution is just one part of an entire system.

I would rate the solution a nine out of ten. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Monitoring Manager at a transportation company with 10,001+ employees
Real User
Top 20
The monitoring is easy to set up, and the reports are filled with helpful information
Pros and Cons
  • "Another feature we use is Business Activity, which provides us with an end-user perspective when a service is down or isn't working correctly. This is helpful when monitoring the KPIs. When we see a device or server that isn't working, we find the root cause."
  • "Centreon introduced network discovery in the most recent update. However, it doesn't work well. Our previous monitoring tool could discover networking equipment on the network and identify the relationships between the devices."

What is our primary use case?

We use Centreon to monitor the server and network devices. It also provides reporting that informs our capacity planning with regard to storage on the server disk. We use it on-premises. We have a central server and around 14 monitoring pollers on the remote side that report to the main server.

I'm the monitoring administrator, so I set up some dashboards for my colleagues and users in the IT department that are customized to provide the information they want to see. They want to look at network devices and create a map of the equipment that allows them to see usage and if something has broken down. 

How has it helped my organization?

Centreon's ability to model IT service maps has been crucial for us. It enables us to get a quick look at our business needs. Centreon can measure the service performance to show if everything is working well. 

What is most valuable?

The reporting and monitoring features are the most valuable. The monitoring is easy to set up, and the reports are filled with helpful information. You can quickly find and fix the problem when there is an incident. 

We use out-of-the-box reports instead of customizing them. When we deploy something, it comes with some reporting templates, so we just use those. We use the existing reporting template to get information about the monitoring device. It's easy to apply the template to our reports. We can do it in two to five minutes.  

Another feature we use is Business Activity, which provides us with an end-user perspective when a service is down or isn't working correctly. This is helpful when monitoring the KPIs. When we see a device or server that isn't working, we find the root cause. 

We also use Plugin Packs to monitor most of our equipment because it's simple to deploy a monitoring template. If I want to monitor a device, I install the Plugin Pack and the required package. If the Plugin Pack doesn't give me the information I want, I can write a plugin to monitor the device. We do this for a few instruments.
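Custom checks like the one mentioned above generally follow the Nagios plugin conventions that Centreon understands: an exit code of 0, 1, or 2 for OK, WARNING, or CRITICAL, and a status line with optional performance data after a pipe. The sketch below is a minimal illustration; the metric name and thresholds are hypothetical, and a real plugin would end with `sys.exit(code)` so the poller can read the state.

```python
# Minimal sketch of a Nagios-convention check; metric and thresholds
# are hypothetical.
def check(value, warn, crit, label="queue_depth"):
    perfdata = f"{label}={value};{warn};{crit}"
    if value >= crit:
        return 2, f"CRITICAL - {label} is {value} | {perfdata}"
    if value >= warn:
        return 1, f"WARNING - {label} is {value} | {perfdata}"
    return 0, f"OK - {label} is {value} | {perfdata}"

code, message = check(value=42, warn=50, crit=100)
print(code, message)  # 0 OK - queue_depth is 42 | queue_depth=42;50;100
```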

Anomaly Detection is another handy feature that we use to discover some issues. I estimate that using Anomaly Detection has cut our resolution time in half. 

What needs improvement?

Centreon introduced network discovery in the most recent update. However, it doesn't work well. Our previous monitoring tool could discover networking equipment on the network and identify the relationships between the devices. 

For how long have I used the solution?

I have been using Centreon for about three years now.

What do I think about the scalability of the solution?

Centreon is scalable. We started with a central server and four remote pollers and continued adding some pollers. We can add or remove pollers without problems. Currently, we have around 14 pollers, and they all work well. When I want to add or remove a poller, I don't need to change the configuration on the central server or resize the server. We have around 200 users, and we are monitoring around 5,000 devices.

How are customer service and support?

I rate Centreon's support nine out of 10. They respond quickly when we have issues. Someone is always available. If I open a ticket, I usually get a response in 10 or 15 minutes. 

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously used SolarWinds Network Performance Monitor. When we compared the two solutions, we found that Centreon had stronger reporting features and a customizable dashboard. Centreon was also more affordable. SolarWinds' network discovery feature is better, but I think Centreon can improve this aspect. 

How was the initial setup?

Setting up Centreon isn't complex. All the documentation is clear and easy to follow. The total deployment time depends on the size of your deployment. It might take only two or three hours if it's just one monitoring server. You only need to download the application and install it on your network. If you want to start from scratch, setting up the environment might take four hours.

It can take around five to 10 days for a complex deployment. It might take that long if you need to set up monitoring on the remote side and deploy a reporting server or other components.

What about the implementation team?

A five-person in-house team deployed the solution with help from Centreon. Their team helped us achieve our goals.

What's my experience with pricing, setup cost, and licensing?

I don't have a lot of information about the price. A different team handles procurement. However, I know the price is based on the number of devices monitored, and you get a discount for a larger number. 

What other advice do I have?

I rate Centreon nine out of 10. This product provides comprehensive reporting for our top management. When my manager asks for a report, it's easy for me to generate one based on the reporting template. We can also set up some level agreements on some services, and from the reports, we can check to see if maintaining this level of equipment will work. 

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.