What is our primary use case?
Our primary use for the solution was to monitor the network and servers.
How has it helped my organization?
We used this solution to monitor our applications, including our internal web applications, which we couldn't do with Icinga (the software we used previously). Icinga was a very good product, but its application monitoring wasn't good at all. I now hear that application monitoring is very good in Icinga EN, so we might try it again.
What is most valuable?
The most valuable thing about the Zabbix product is that it was easy to install and manage.
What needs improvement?
There are a lot of things that could use improvement, which is why we are seeking a new solution. Network monitoring is a problem: it gives too many false positives. For example, it notifies us that a server is down while I'm using that very server to do a search. A moment after the search completes, everything is OK again. This is called flapping. Zabbix has some flapping control, but it's not as good as in other products. I used to use Icinga, and it is a better product in that respect.
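To make the flapping problem concrete, here is a minimal Python sketch of one common way to damp it: report a host as down only after several failed checks in a row. This is not how Zabbix or Icinga implement flap control. The class name, the function name, and the threshold of 3 are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureFilter:
    """Report a host as down only after `threshold` failed checks in a row."""

    threshold: int = 3      # hypothetical value; tune it to the check interval
    failures: int = 0       # current run of consecutive failed checks
    alerting: bool = False  # True while the host counts as down

    def observe(self, check_ok: bool) -> bool:
        """Feed one check result; return True while the host counts as down."""
        if check_ok:
            # A single successful check resets the run and clears the alert.
            self.failures = 0
            self.alerting = False
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.alerting = True
        return self.alerting


def count_alerts(results, threshold=3):
    """Count how many times the filter switches from OK to down."""
    flt = ConsecutiveFailureFilter(threshold=threshold)
    alerts = 0
    was_alerting = False
    for ok in results:
        now_alerting = flt.observe(ok)
        if now_alerting and not was_alerting:
            alerts += 1
        was_alerting = now_alerting
    return alerts


# A flapping host: every other check fails, but it is never really down.
flapping = [True, False, True, False, True, False, True]
# A real outage: four failed checks in a row.
outage = [True, False, False, False, False, True]
```

With a threshold of 1, every failed check on the flapping host would raise its own alert. With a threshold of 3, the flapping host stays quiet while the real outage is still reported once. The trade-off is that a real outage is reported a couple of check intervals later.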
For how long have I used the solution?
We have been using this product for four years or more.
What do I think about the stability of the solution?
I know it doesn't just go dead on me, so the stability is okay I guess.
What do I think about the scalability of the solution?
I don't really know how the product scales because I haven't tried to scale it up and haven't had the need. Considering that we will be moving away from the product, I don't need to bother with that right now.
How are customer service and technical support?
Their online help is okay, but not as amazingly good as Icinga's, for example. Icinga has the largest user base as far as I can tell. I use IRC for help a lot, and there are a lot of people in it. Few people are helping with Zabbix. The forums are pretty up-to-date.
As for tech support itself, I have emailed them about the flapping issue. The main things that bother me are the flapping and the infrastructure monitoring, which is not very good. The support team suggested a few tricks that would help, but not as much as I want. The support team is responsive, but nothing was resolved.
Which solution did I use previously and why did I switch?
We are actually looking for another solution that meets our needs better. That is not because Zabbix is necessarily a bad solution. Our needs changed and there are better solutions available.
I previously used ManageEngine OpManager in another organization, so I guess the reason I switched was that I changed jobs. That product was excellent, but it is out of reach of my current budget.
How was the initial setup?
Installation was not hard; it was very straightforward. The initial deployment, without any complex setup, took a few hours. The rest of the setup took an additional three days to a week. In total, it took about a week to be completely deployed, with all the servers monitored and everything working.
I did the deployment alone. I set everything up and left it to other people to monitor according to their responsibilities. I have 10 admins monitoring it. I stopped monitoring it myself in the past half year or so, as I am looking for a better solution.
The admins use it daily. Even now, I'm getting all the emails, and I'm a little bit bothered by the continued flapping. Every admin is responsible for a different aspect of the monitoring, so they get their own dedicated reports. For example, one admin is responsible for the ELP servers, so he gets only the emails for the ELP-related stuff.
What about the implementation team?
We did not implement through a vendor. I did the entire implementation myself. It's very easy; you don't need anyone just to install it. For more complicated stuff that I was not sure of, I asked around on forums.
What was our ROI?
Any product that saves you from a network meltdown is really worth more than you pay for it. This product did its job, and some people in the organization are still using it while we search for a better solution.
What other advice do I have?
I would rate this product a 7 out of 10. This is because it needs better flapping control and better infrastructure monitoring. Other products provide this already. I don't think I have much to say that isn't answered elsewhere. The product's benefit is that it is easy to get up and running.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
I'm surprised that you believe that scalability and performance are an issue. Having said that, much depends on the configuration that is implemented. I'm not suggesting that this is the case with your implementation, but we have often seen customers simply running the Zabbix "appliance" (a single VM) as a production instance. A full Zabbix implementation is highly scalable and performant, with many hundreds of thousands of "values per second" possible. Of course, as with any monitoring software, it does require a degree of administration. For example, if you are using only a single server for Zabbix, you may wish to consider splitting the web server, Zabbix server and database across multiple VMs. We also implement multiple load-balanced web server front ends.
It's also possible to tune the number of pollers that are running to further improve performance.
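As a rough illustration of what "values per second" measures, here is a back-of-the-envelope estimate sketched in Python. Each monitored item yields one new value per update interval, so the load is the sum of 1/interval over all items. The function name and the item counts and intervals are made-up assumptions, not figures from any real installation.

```python
def new_values_per_second(items):
    """Estimate load as the sum of count / interval over all item groups.

    `items` is an iterable of (number_of_items, update_interval_seconds).
    """
    return sum(count / interval_s for count, interval_s in items)


# Hypothetical inventory: (number of items, update interval in seconds).
inventory = [
    (20_000, 60),   # metrics collected once a minute
    (5_000, 30),    # metrics collected every 30 seconds
    (1_000, 300),   # slow-changing metrics collected every 5 minutes
]

nvps = new_values_per_second(inventory)
print(f"Estimated new values per second: {nvps:.1f}")
```

Comparing an estimate like this with what a single VM handles comfortably is one way to decide whether to split the components or raise the number of pollers.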
I hope that gives you some ideas.