Read reviews of Datadog alternatives and competitors
Product Director at an insurance company with 10,001+ employees
Gives us a single, integrated tool to simplify support and reduce downtime
Pros and Cons
- "Those 400 days of hot data mean that people can look for trends and at what happened in the past. And they can not only do so from a security point of view, but even for operational use cases. In the past, our operational norm was to keep live data for only 30 days. Our users were constantly asking us for at least 90 days, and we really couldn't even do that. That's one reason that having 400 days of live data is pretty huge. As our users start to use it and adopt this system, we expect people to be able to do those long-term analytics."
- "One major area for improvement for Devo... is to provide more capabilities around pre-built monitoring. They're working on integrations with different types of systems, but that integration needs to go beyond just onboarding to the platform. It needs to include applications, out-of-the-box, that immediately help people to start monitoring their systems. Such applications would include dashboards and alerts, and then people could customize them for their own needs so that they aren't starting from a blank slate."
What is our primary use case?
We look at this solution for both security monitoring and operational monitoring use cases. It helps us to understand any kind of security incident, typical SIEM use cases, and IT operations, including DevOps and DevSecOps use cases.
How has it helped my organization?
We had multiple teams that were managing multiple products. We had a team that was managing ELK and another team that was managing ArcSight. My team was the "data bus" that was aggregating the onboarding of people, and then sending logs through different channels. We had another team that managed the Kafka part of things. There was a little bit of a loss of ownership because there were so many different teams and players. When an issue happened, we had to figure out where the issue was happening. Was it in ELK? Was it in ArcSight? Was it in Kafka? Was it in syslog? Was it on the source? As a company, we have between 25,000 and 40,000 sources, depending on how you count them, and troubleshooting was a pretty difficult exercise. Having one integrated tool helped us by removing the multiple teams, multiple pieces of equipment, and multiple software solutions from the equation. Devo has helped a lot in simplifying the support model for our users and the sources that are onboarding.
We have certainly had fewer incidents, fewer complaints from our users, and less downtime.
Devo has definitely also saved us time. We have reduced the number of teams involved. Even though we were using open-source and vendor products, the number of teams that are involved in building and maintaining the product has been reduced, and that has saved us time for sure. Leveraging Devo's features is much better than building everything.
What is most valuable?
It provides multi-tenant, cloud-native architecture. Both of those were important aspects for us. A cloud-native solution was not something that was negotiable. We wanted a cloud-native solution. The multi-tenant aspect was not a requirement for us, as long as it allowed us to do things the way we want to do them. We are a global company though, and we need to be able to segregate data by segments, by use cases, and by geographical areas, for data residency and the like.
Usability-wise, Devo is much better than what we had before and is well-positioned compared to the other tools that we looked at. Obviously, it's a new UI for our group and there are some things that, upon implementing it, we found were a little bit less usable than we had thought, but they are working to improve on those things with us.
As for the 400 days of hot data, we have not yet had the system for long enough to take advantage of that. We've only had it in production for a few months. But it's certainly a useful feature to have and we plan to use machine learning, long-term trends, and analytics; all the good features that add to the SIEM functionality. If it weren't for the 400 days of data, we would have had to store that data, and in some cases for even longer than 400 days. As a financial institution, we are usually bound by regulatory requirements. Sometimes it's a year's worth of data. Sometimes it's three years or seven years, depending on the kind of data. So having 400 days of retention of data, out-of-the-box, is huge because there is a cost to retention.
Those 400 days of hot data mean that people can look for trends and at what happened in the past. And they can not only do so from a security point of view, but even for operational use cases. In the past, our operational norm was to keep live data for only 30 days. Our users were constantly asking us for at least 90 days, and we really couldn't even do that. That's one reason that having 400 days of live data is pretty huge. As our users start to use it and adopt this system, we expect people to be able to do those long-term analytics.
What needs improvement?
One major area for improvement for Devo, and people know about it, is to provide more capabilities around pre-built monitoring. They're working on integrations with different types of systems, but that integration needs to go beyond just onboarding to the platform. It needs to include applications, out-of-the-box, that immediately help people to start monitoring their systems. Such applications would include dashboards and alerts, and then people could customize them for their own needs so that they aren't starting from a blank slate. That is definitely on their roadmap. They are working with us, for example, on NetFlow logs and NSG logs, and AKF monitoring.
Those kinds of things are where the meat is because we're not just using this product for regulatory requirements. We really want to use it for operational monitoring. In comparison to some of the competitors, that is an area where Devo is a little bit weak.
For how long have I used the solution?
We chose Devo at the end of 2020 and we finished the implementation in June of this year. Technically, we were using it during the implementation, so it has been about a year.
I don't work with the tool on a daily basis. I'm from the product management and strategy side. I led the selection of the product and I was also the product manager for the previous product that we had.
What do I think about the stability of the solution?
Devo has been fairly stable. We have not had any major issues. There has been some down time or slowness, but nothing that has persisted or caused any incidents. One place that we have a little bit of work to do is in measuring how much data is being sent into the product. There are competing dashboards that keep track of just how much data is being ingested and we need to resolve which we are going to use.
What do I think about the scalability of the solution?
We don't see any issues with scalability. It scales by itself. That is one of the reasons we also wanted to move to another product. We needed scalability and something that was auto-scalable.
How are customer service and support?
Their tech support has been excellent. They've worked with us on most of the issues in a timely fashion and they've been great partners for us. We are one of their biggest customers and they are trying really hard to meet our needs, to work with us, and to help us be successful for our segments and users.
They exceeded our expectations by being extremely hands-on during the implementation. They came in with an "all hands on deck" kind of approach. They worked through pretty much every problem we had and, going forward, we expect similar service from them.
Which solution did I use previously and why did I switch?
We were looking to replace our previous solution. We were using ArcSight as our SIEM and ELK for our operational monitoring. We needed something more modern and that could fulfill the roadmap we have. We were also very interested in all the machine learning and AI-type use cases, as forward-facing capabilities to implement. In our assessment of possible products, we were impressed by the features of AI/ML and because the data is available for almost a year. With Devo, we integrated both operational and SIEM functions into one tool.
It took us a long time to build and deploy some of the features we needed in the previous framework that we had. Also, having different tools was leading to data duplication in two different platforms, because sometimes the security data is operational data and vice versa. The new features that we needed were not available in the SIEM and they didn't have a proper plan to get us there. The roadmap that ArcSight had was not consistent with where we wanted to go.
How was the initial setup?
It was a complex setup, not because the system itself is complex but because we already had a system in place. We had already onboarded between 15,000 and 20,000 servers, systems, and applications. Our requirement was to not touch any of our onboarding. Our syslog was the way that they were going to ingest and that made it a little bit easier. And that was also one of our requirements because we always want to stay vendor-agnostic. That way, if we ever need to change to another system, we're not going to have to touch every server and change agents. "No vendor tie-in" is an architectural principle that we work with.
We were able to move everything within six months, which is absolutely amazing. That might be a record. Not only was Devo impressed at how efficiently we did it, but so were people in our company.
We had a very strong team on our end doing this. We went about it very clinically, determining what would be in scope and what would not be in scope for the first implementation. After that, we would continue to tie up any loose ends. We were able to meet all of our deadlines and pivot into Devo. At this point, Devo is the only tool we're using.
We have a syslog team that is the log aggregator and an onboarding team that was involved in onboarding the solution. The syslog team does things like the opening of ports and metrics of things like uptime. We also have four engineers on the security side who are helping to unleash use cases and monitor security. There's also a whole SOC team that does incident management and finding of breaches. And we have three people who are responsible for the operational reliability of Devo. Because it's a SaaS product, we're not the ones running the system. We're just making sure that, if something goes wrong, we have people who are trained and people who can troubleshoot.
We had an implementation project manager who helped track all of the implementation milestones. Our strategy was to set out an architecture to keep all the upstream components intact, with some very minor disruptions. We knew, with respect to some sources, that legacy had been onboarded in certain ways that were not efficient or useful. We put some of those pieces into the scope during the implementation so that we would segregate sources in ways that would allow better monitoring and better assessment, rather than mixing up sources. But our overall vision for the implementation was to keep all of that upstream architecture in place, and to have the least amount of disruption and need for touching agents on existing systems that had already been onboarded. Whatever was onboarded was just pointed at Devo from syslog. We did not use their relays. Instead, we used our syslog as the relays.
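To make that vendor-agnostic syslog layer concrete, here is a minimal sketch, not Devo's own relay configuration: an application host logs to the in-house syslog relay, so if the SIEM behind the relay ever changes again, only the relay's forwarding target moves, not the thousands of sources. The relay hostname and log content below are hypothetical.

```python
import logging
import logging.handlers

# Hypothetical in-house relay; in this architecture only the relay's
# forwarding target (previously ArcSight/ELK, now Devo) ever changes --
# the sources keep pointing at the internal syslog layer.
SYSLOG_RELAY = ("syslog-relay.internal.example.com", 514)

logger = logging.getLogger("app.audit")
logger.setLevel(logging.INFO)

handler = logging.handlers.SysLogHandler(
    address=SYSLOG_RELAY,
    facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
)
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(handler)

# Each source emits its events to the relay; the relay, not the source,
# knows where the SIEM lives.
logger.info("user login succeeded user=jdoe src=10.20.30.40")
```

Swapping SIEM vendors then means re-pointing the relay, which is exactly the "no vendor tie-in" principle described above.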
What's my experience with pricing, setup cost, and licensing?
Devo was very cost-competitive. We understood that the price did not include monitoring content right out-of-the-box, but we knew they were pointed in that direction.
Devo's pricing model, only charging for ingestion, is how most products are licensed. That wasn't different from other products that we were looking at. But Devo did come with that 400 days of hot data, and that was not the case with other products. While that aspect was not a requirement for us, it was a nice-to-have.
Which other solutions did I evaluate?
We started off with about 10 possibilities and brought it down to three. Devo was one of the three, of course, but I prefer not to mention the names of the others.
But among those we started off with were Elastic, ArcSight, Datadog, Sumo, Splunk, Microsoft systems and solutions, and even some of the Google products. One of our requirements was to have an integrated SIEM and operational monitoring system.
We assessed the solutions at many different levels. We looked at adherence to our upstream architecture for minimal disruption during the onboarding of our existing logs. We wanted minimal changes in our agents. We also assessed various use cases for security monitoring and operational monitoring. During the PoC we assessed their customer support teams. We also looked at things like long-term storage and machine learning. In some of these areas other products were a little bit better, but overall, we felt that in most of these areas Devo was very good. Their customer interface was very nice and our experience with them at the proof-of-value [PoV] level was very strong.
We also felt that the price point was good. Given that Devo was a newer product in the market, we felt that they would work with us on implementing it and helping us meet our roadmap. All three solutions that we took to the PoV stage were good products. This space is fairly mature. They weren't different in major ways, but price was definitely one of the things that we looked at.
In terms of the threat-hunting and incident response, Devo was definitely on par. I am not a security analyst and I relied on our SIEM engineers to analyze that aspect.
What other advice do I have?
Get your requirements squared and know what you're really looking for and what your mandatory requirements are versus what is optional. Do a proof of value. That was very important for us. Also, don't only look at what your needs are today. Long-term analytics, for example, was not necessarily something we were doing, but we knew that we would want to do that in the coming years. Keep all of those forward-looking use cases in mind as well when you select your product.
Devo provides high-speed search capabilities and real-time analytics, although those are areas where a little performance improvement is needed. For the most part it does well, and they're still optimizing it. In addition, we've just implemented our systems, so there could be some optimizations that need to be done on our end, in the way our data is flowing and in the way we are onboarding sources. I don't think we know where the choke points are, but it could be a little bit faster than we're seeing right now.
In terms of network visibility, we are still onboarding network logs and building network monitoring content. We do hope that, with Devo, we will be able to retire some of our network monitoring tools and consolidate them. The jury is still out on whether that has really happened or not. But we are working actively towards that goal.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Chief Manager at a marketing services firm with 501-1,000 employees
With the help of the solution, we can predict and prevent failures
Pros and Cons
- "One of the most valuable features of ITRS Geneos is the active time feature that helps with the trading applications that I support."
- "I would like ITRS Geneos to develop an app, where instead of going to specific login terminals or logging into laptops or desktops to check alerts, we can have visibility in the app itself."
What is our primary use case?
We are using ITRS Geneos for availability and performance monitoring. With respect to availability, there have been no observations of any business functionalities being impacted. All capacity parameters are okay. We are also using the solution for the performance part, where all the latency, the number of messages, the rates, and the number of external client logins are monitored. Lastly, we use the solution to monitor all the hardware capacity parameters such as CPU, disk, and memory usage for all applications.
We use the application logs as well as OEM logs. We monitor the processing rates, the number of messages processed, and the number of external client logins, as well as whether any exception handling has been triggered in the code. Because our application does not involve a database, we are not monitoring database activity. Inside the applications, we check whether a server is in primary mode or has gone to secondary mode, and whether a failure has happened or not. On the configuration side, we monitor that all the application configs are intact and that there are no changes to them. We also make sure that connectivity is in place across the various modules and servers, which are interconnected either through PCP or our own protocol, and we monitor the downstream and upstream systems.
How has it helped my organization?
Our previous solution alerted us through an SMS or email. The alert would go to one person whose job was to monitor, and if they were busy with another activity we would be delayed in responding to the situation. With ITRS Geneos, we can see everything on the dashboard. We can see the relationship between two alerts and the whole picture of where the issue is, allowing us to take action quickly. Suppose one alert comes from a server and another comes from the client; we know there is a potential connectivity issue between that server and the client. An email or SMS alert will never give us the full picture, so in the past I had to go into the system and work out what the issue actually was, which was quite difficult. Now, with the new system, it's more proactive. We have started to monitor the system more closely because many alerts that we didn't have before are now enabled. For example, we have thresholds that warn us when we need to free up space.
We first started to use ITRS Geneos in our department in 2014. After that, the solution was implemented in other departments, and the downstream systems inherited the monitoring. Trading is just one part of the organization where monitoring is now taking place; other systems, such as risk surveillance and collateral management, have also been upgraded. As an organization, we need to have a holistic view. This means that if I am a HOD, I need to have a holistic view of not only trading but also downstream systems. Previously, the senior manager did not have any view of what the issue was, where it was, and what the impact was. This is because there are many downstream systems that can be impacted. Now, with the holistic view in place, the senior manager is able to see where the issue is and where the impact is.
The solution provides lightweight data collection. We have recently had a business dashboard as well. Along with order messages and client logins, we are also monitoring capacity reports, view logs, and business reports. We have captured all of this information and put it in our database. We can see a historical view of user logins, latency, and whether or not it has improved.
With the help of the solution, we can predict and prevent failures. Many times, one incident can lead to another event. For example, if we have three modules and there is a buildup in module C, ITRS Geneos can help monitor that buildup. While one of our people is trying to resolve the issues in module C, the monitoring person can also watch the back pressure going toward module A, and I will start checking the impact on modules A and C. Since we have the complete end-to-end connectivity view in ITRS Geneos, I can make sure that an issue on one server or one module is prevented from spreading to the other modules as well. We have prevented multiple P1 and P2 issues using ITRS Geneos.
We track the number of incidents caught proactively every month, based on the alerts we receive from ITRS Geneos. Roughly 20 to 30 percent of all incidents are proactively detected and avoided using the solution. Whenever any incident comes up, the first question we ask is why it wasn't caught in the monitoring. In the last eight years, we have progressed a great deal with the solution, configuring multiple rules and multiple samplers in ITRS Geneos, because we learn from each and every incident.
What is most valuable?
One of the most valuable features of ITRS Geneos is the active time feature, which helps with the trading applications that I support. The HP OpenView solution we previously used worked on a 24/7 cycle, but market hours are 9:00 AM to 3:30 PM.
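Geneos active times are defined in the gateway setup rather than in code, but the underlying idea can be sketched in a few lines, purely as an illustration with an assumed 50 ms latency threshold: evaluate the rule continuously, yet only raise alerts inside the trading window.

```python
from datetime import datetime, time

MARKET_OPEN = time(9, 0)     # 9:00 AM
MARKET_CLOSE = time(15, 30)  # 3:30 PM

def in_active_time(now: datetime) -> bool:
    # Weekdays only, within market hours -- the window an "active time" covers.
    return now.weekday() < 5 and MARKET_OPEN <= now.time() <= MARKET_CLOSE

def check_latency(sample_ms: float, threshold_ms: float = 50.0) -> None:
    # Outside the active time the breach is ignored instead of paging anyone.
    if sample_ms > threshold_ms and in_active_time(datetime.now()):
        print(f"ALERT: order latency {sample_ms:.1f} ms exceeds {threshold_ms} ms")

check_latency(72.3)
```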
ITRS Geneos has good GUIs that provide us with a single view of all the various applications. We have around 500 servers, and monitoring 500 servers on individual devices is a bit challenging; putting the various applications into a single GUI is very difficult. In ITRS Geneos, the GUIs and user-friendly dashboards are all there and the information is available at a glance. We have five segments, and if a problem occurs in any one segment we can highlight it and follow the steps, then go into the Active Console, which gives a detailed, step-by-step view to resolve the issue.
What needs improvement?
Currently, the most valuable thing for an individual is a mobile device. Since that is where people are currently tracking everything, we have multiple applications or apps that are for various products. I would like ITRS Geneos to develop an app, where instead of going to specific login terminals or logging into laptops or desktops to check alerts, we can have visibility in the app itself. Using the ITRS Geneos app, we could see the error message during our travels or wherever we are.
I would like to see the capacity of messages for forecasting increased. Since the NSE has been the number one derivatives stock exchange in the world for three consecutive years, the number of messages is important. We use the capacity planner in ITRS to forecast our data needs for the next two months. The planner is important because the volume of data we produce is becoming more and more volatile compared to when we first started using ITRS Geneos in 2014.
For how long have I used the solution?
I have been using the solution for eight years.
What do I think about the stability of the solution?
ITRS Geneos is a stable product and we have not had any downtime in the past eight years. The stability portion is good, but we do need to configure new applications correctly for the solution to stay stable.
What do I think about the scalability of the solution?
The solution is scalable. We are the number one derivative stock exchange, and our number of servers has increased significantly in recent years. In 2014, the number of servers was 50, but now it is 500, ten times as many. Our gateway servers were also limited to around five, but now there are 15. This increase in scalability has allowed our maintenance team to deploy more gateway servers and improve our monitoring gateways. We are also looking into redundancy and unused data to further improve performance.
We currently have 120 people using the solution in our department and around 400 from all of the departments, including vendors.
How are customer service and support?
The technical support for ITRS is currently very good. The person who developed or implemented the ITRS dashboard previously provided technical support for us, so he has a good understanding of the system. Additionally, many of the deadlines for ITRS are set by top management. ITRS is seen as a high priority, so there is good response time from the ITRS-managed service currently.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
I work in a trading production support role. Eight years ago, we made the switch from using HP OpenView for monitoring purposes to using ITRS Geneos. This was because there was no GUI available with HP OpenView.
We have never considered any lower-cost or open-source alternatives because the team is comfortable with the current system. The L1 monitoring team is the team that checks the dashboards and provides alerts. They are so familiar with the system that we have never thought of getting away from it.
How was the initial setup?
There were a few team members involved with the deployment. My part was providing details to ITRS Geneos, handling the sign-up portion of the solution, and testing ITRS, such as checking whether alerts were showing properly or not. At that time, I was involved in all these areas of deployment because we have various segments in our module, and we started with only a specific segment. Initially, we focused on the GUI design: how it should look, what rules should be configured, and where the data should come from. All of those nitty-gritty details are required for the ITRS dashboard. I was not responsible for the creation of the samplers, the installation of Netprobes, or the entering of rules; there was a separate team responsible for those tasks.
Developers faced a challenge because there was no database, but they soon found ways to overcome this obstacle. They had to monitor the application, the logs, and other aspects of the system. After four to six months, they started to figure out how to monitor specific aspects of the system when there were multiple items that needed to be monitored. We needed to consider three things: first, whether ITRS would have a file or tooling mechanism; second, whether there would be any impact on production; and third, whether ITRS would have a built-in script. We considered these three aspects of ITRS in order to go deeper into the application, so that Netprobes could act as the agents on the production machines and eliminate any impact. After all the items were gathered, the most important step was putting everything into one Active Console; this is where I accessed the files and did all of the reading. In the fifth month we had all the GUIs created, and in the sixth month we started the testing, making sure all the alerts were configured and displaying correctly. We did all the testing at the end of the day because it was in the production environment. A lot of hard work was put in, and the initial two months were difficult because we didn't know how to do it or what needed to be planned. We completed deployment for all five segments we support within one year: in the first six months we deployed only one segment, over the next four months we deployed another two segments, and in the last two months the deployment was completed for the remaining segments.
What about the implementation team?
The implementation was completed in-house.
What other advice do I have?
I give the solution a nine out of ten.
We had only two people doing the setup, the manager and one other person. I was mostly a tester, testing the applications and making sure everything was in order for the solution. There were four of us in total during the deployment process. There are currently five to six people on a separate tooling team that handles ITRS Geneos. I am no longer part of the ITRS team.
Previously, ITRS Geneos was only implemented in trading operations from 2014 to 2015. In 2016, we started to use it in other departments as well. By 2017, all the departments were using the ITRS Geneos. There are five to six departments where the solution is used. In fact, we have created an exchange one view as well as a dashboard where we can look at six to seven departments as a single view. In case any specific department has any issues, the warning critical alert will come from that view.
The solution requires maintenance. There is a separate tooling team that takes care of ITRS Geneos. There are five people who are looking into all the maintenance, patching, and all areas of the gateway servers for the solution. Additionally, there are two more activities. The first activity is with respect to the uptime availability part related to gateway servers. The second activity is with respect to releases. If there are any application changes, whether the change has been done in the solution or not, they will also be taken care of. We are also evaluating new dashboards and features, such as our use of ICT and forecasting. There is a separate team who looks into all of this maintenance and development. Plus there is an onsite team in Manila.
For any mission-critical projects, I recommend ITRS Geneos because time is crucial. Everything needs to be resolved within five minutes, and the SLA is strict. To resolve incidents within a five-minute window, we need to monitor and escalate within 30 seconds. The team should focus on monitoring and recovery within the first 30 seconds.
Which deployment model are you using for this solution?
On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Last updated: Jan 11, 2023
Network Engineer at a mining and metals company with 501-1,000 employees
Sped up my resolution time because we can drill down and look at the net flow information faster
Pros and Cons
- "It's all intuitive and straightforward. The out-of-the-box alerts provided everything I needed, but I've made a couple of additional alerts. You can schedule maintenance windows in Auvik, and the solution won't send any alerts during that time."
- "The mapping automatically finds all the interfaces but tags some of them incorrectly. For instance, if it can't find how a CPU interface is connected, it will use the MAC address last seen on the router and sometimes attribute cloud-connected devices to the route, but it's not actually there. That's not a true connection."
What is our primary use case?
We use Auvik for monitoring networks across all of our sites for alerts, reporting, configuration backups, and troubleshooting. Auvik does a little bit of everything when it comes to networking.
I'm not the only person that utilizes Auvik, but I'm the only network engineer. The infrastructure team uses it for server monitoring. Security guys can also access it, but I'm the primary caretaker.
I monitor 34 sites with 200 managed devices, and about another hundred are unmanaged. Altogether, I have over 2,600 devices; if you subtract the network devices from that, it's about 2,300 devices that aren't network devices, including printers, servers, and computers. Auvik crawls and finds those kinds of things on the network. That's what I mean by total picture.
How has it helped my organization?
I previously used SolarWinds, which I call a Swiss Army knife of network monitoring systems. SolarWinds is great. It does many things, but it's gotten too bloated and slow. It's not as intuitive as Auvik. SolarWinds didn't do mapping on its own, and the mapping provided was kind of clunky to get running because you have to manage the licensing and everything. Even after tweaking SolarWinds, I couldn't get the mapping capabilities Auvik gives me.
Also, SolarWinds wasn't a one-stop shop. Auvik is the closest I've gotten to a single pane of glass. It's hard to judge whether Auvik has saved time over SolarWinds after two months because I'm still doing some slight tweaks. It took me months to get SolarWinds the way we need it here. Auvik is still a pretty new product for us. Though it's meeting our basic needs, I'm the kind of guy who likes to squeeze every bit of juice out of my fruit.
The out-of-the-box alerts were pretty on-point, so I've only had to create two alerts on my own. The reporting is easy to access, so pulling reports is more straightforward. That saves time.
Also, I don't need to add devices to Auvik. It automatically crawls, finds them, and puts them in the inventory. I don't have to go back and draw maps. Auvik does that. Mapping in SolarWinds requires their map tool, a separate product you must install on the server itself. Drawing maps on that was painful. Discovery isn't something I need to do anymore. When I added five new devices to a site, it found them all and brought them into inventory. I didn't have to do that.
Auvik automatically keeps the device inventories updated. I'm shutting down SolarWinds this week. On Friday, I did my final inventory comparing SolarWinds and Auvik. I have not been updating SolarWinds, and Auvik has about 20 more devices on the network side alone because I don't have to go back through and update the inventory. It'll pull it in itself. When something is added, I get an alert saying the new device has been added to the network.
Auvik has sped up my resolution time because you can drill down in Auvik and look at the net flow information faster. The alerts also help, but if this is a data-driven event, I need to look at the net flow, which is much quicker.
What is most valuable?
The monitoring and alerts are easy to use and set up. Discovery is the first step in monitoring, and that's a piece of cake with Auvik. It'll scan your networks once you get the credentials set up and automatically find newly added equipment as long as the same credentials are already on that gear. Auvik makes my job a lot easier. I don't have to keep going back to a monitoring system to add devices each time we bring something new. That part alone saves me time.
It's all intuitive and straightforward. The out-of-the-box alerts provided everything I needed, but I've made a couple of additional alerts. You can schedule maintenance windows in Auvik, and the solution won't send any alerts during that time. With other products, you have to turn off the alerts on each device if you don't set it up correctly. Ease of use is crucial because I'm the only network engineer at a company of 900, so I have many things to do.
I have a single pane of glass. It's easier to go into one system where everything is easy to find. It's a one-stop-shop with everything you need instead of going into multiple products to get it done. I don't consider Auvik entirely cloud-based because you have collectors onsite. The portal for viewing your infrastructure is cloud-based. You don't need to get into a VPN or anything like that to get to it. It's two-factor authentication, so it's a little harder for bad actors to get to your data.
The ability to log in and run commands from the cloud is helpful. You can access a full command line on the device, so I don't need to VPN into the infrastructure, which helps when troubleshooting. It's also beneficial that it's not on-prem. If my leading site, where the on-prem solution is located, goes down, no place is being monitored. As long as the internet connection is up and the collector is running, all my sites are being monitored.
What needs improvement?
The mapping automatically finds all the interfaces but tags some of them incorrectly. For instance, if it can't find how a CPU interface is connected, it will use the MAC address last seen on the router and sometimes attribute cloud-connected devices to the route, but it's not actually there. That's not a true connection.
It isn't going to the cloud. It's going directly back to the router. I've talked to Auvik support about that already. They're looking into it. Overall, mapping could be a little better. Though they do a great job, there's still room for improvement. It's 100% accurate for some sites but only 90% for others. It gives you a complete view of how things are connected for the most part. Auvik still struggles with wireless bridges and things of that nature. However, Auvik isn't the only product missing that, and there is a simple way to make those connections myself.
For how long have I used the solution?
I did a couple of trials with Auvik, but we've officially been using the solution for about three months now.
What do I think about the stability of the solution?
I'm through my testing phase, and now that I'm in my third month using Auvik, I can say it's pretty stable. I had one issue with Syslog, but they fixed it. They made a change that caused an unforeseen issue in Syslog. They resolved the problem in the next release.
What do I think about the scalability of the solution?
Auvik's scalability is pretty good. I'm monitoring 30-plus sites. I was running 30 of them off one collector, so the scalability is pretty good.
How are customer service and support?
I rate Auvik support nine out of 10.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
SolarWinds does many of the same things, but Auvik has a different approach. If we have some networking event, we can find the problem machine in Auvik and see what it has been talking to a lot faster.
Auvik is a little more agile. We can find things a little faster with Auvik than in SolarWinds. We don't need to dig as much. The graphical nature of the product makes it easier to navigate.
How was the initial setup?
Deploying Auvik is very straightforward. I implemented it pretty much out of the box. When I had my customer success meetings with Auvik support, I had already done everything they told me to do. I'm experienced in setting up things, so I had it up and running by the time we met to review our technical onboarding.
I can onboard a small site in 10 minutes. Once you input credentials at the top level, it's only a matter of putting in subnets that you want scanned, waiting for them to be scanned, and verifying everything is there. It's about 10 minutes per site once your credentials are squared away.
Once that is ready, it takes Auvik an hour or so per site to stitch everything together. Much of it is on the backend because it makes all those maps and everything like that, which takes time. It has to pull in the data from SNMP and CDP. It looks at all the interfaces and stitches together maps, so it depends on how many collectors you have. It takes longer if you're running a couple of collectors for an entire enterprise because a few collectors are doing a lot of work.
It's much faster if you have a collector at every site. It's probably 15 to 20 minutes per site. I only used one collector when I started because I wanted to see how hard I could push it. It took much less time to set up than SolarWinds. The discovery is pretty simple for what you have to do from my end. As long as you have your credentials at the top level, you add a new site, throw in your subnets, and it finds them for you.
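Auvik itself runs discovery once the credentials and subnets are entered in the portal, so nothing like the following is required; it is only a hypothetical pre-check, assuming the classic pysnmp 4.x hlapi API, to confirm that an SNMPv2c community string answers on a device before handing its subnet to a new site. The host and community below are placeholders.

```python
from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, getCmd,
)

def snmp_reachable(host: str, community: str) -> bool:
    # Ask for sysDescr.0; if the device answers, discovery should see it too.
    error_indication, error_status, _, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData(community, mpModel=1),              # SNMPv2c
            UdpTransportTarget((host, 161), timeout=2, retries=1),
            ContextData(),
            ObjectType(ObjectIdentity("1.3.6.1.2.1.1.1.0")),  # sysDescr.0
        )
    )
    return not (error_indication or error_status)

print(snmp_reachable("10.10.20.1", "public"))  # placeholder device and community
```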
Auvik doesn't require any maintenance after deployment. I wanted to stress test the collector to see what might break it. I had 30 sites on one collector at one time, but I decided to go back to the suggested implementation.
With a single collector on 30-plus sites, the daily tasks were completed, and we weren't close to using up the CPU or memory on this device—this wasn't a beefy server. It was built to their specs but not overly powerful. Once your collector runs, you don't need to do much with this product because the brains are in the cloud. If your collector goes down, bringing up a new one is a piece of cake.
What was our ROI?
It's apparent off the bat how much time I'm saving by doing tasks because of the ease of use. Once I got everything discovered, it was evident that I would save time by automatically drawing maps and keeping them updated. I immediately noticed that I would save time, and time is money. I always have several projects and no longer worry about my inventory because Auvik does this for me.
Once the devices are configured, and the collectors are installed, I don't need to add anything to the monitoring system or make sure the backups are there. Auvik grabs it for me.
What's my experience with pricing, setup cost, and licensing?
It's worth the price, depending on how you use the product. Price is a significant component of any purchase; for me, it all goes back to visibility. I have more visibility into everything now than I had before. SolarWinds was on every node, and every interface had to be licensed. With Auvik, the cost could be the same or more depending on the level of visibility you want. The price and value vary according to your network infrastructure and the information you want.
If you want a complete picture of your entire network, then Auvik is a better choice. SolarWinds is a better option if you're only looking at network devices. I think Auvik's price per node is a tad high; that's probably my only knock against Auvik. Your network nodes are billable, while servers, printers, and other devices are not, yet you still have visibility into those things. In other products, each one of those devices is a billable node, so Auvik gives us a little bit more visibility than we had before because now we have more devices in the system.
Which other solutions did I evaluate?
We looked at Entuity and Datadog.
What other advice do I have?
I rate Auvik nine out of 10. I deduct a point for the mapping and reporting. I like everything else that Auvik does. The only aspect I don't like 100% is the mapping. Also, they have canned reports instead of a built-in report builder. You have to extract the data in Power BI or some other way. They have great pieces, but I can't customize them and create my own within their system.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Last updated: Dec 29, 2022
Head of Data Architect at LendingTree
Instantaneous response when monitoring logs and KPIs
Pros and Cons
- "CloudWatch immediately hooks up and connects to the KPIs and all the metrics."
- "It would be beneficial for CloudWatch to provide an API interface and some kind of custom configuration."
What is our primary use case?
We use the solution to monitor our AWS resources. We used Azure extensively, but a couple of years back we moved to using both Azure and AWS. Currently, we have four use cases.
Our predominant use case is monitoring our S3 which includes terabytes of data. We monitor all the buckets and containers plus who has access to them, the thresholds, and user data. We constantly watch all the KPIs and CloudWatch metrics.
Our second use case is watching logs and processes for other products such as AWS tools, AWS Glue, and Redshift which includes a few terabytes of data.
Our third use case is minor and with Athena.
Our fourth use case is new. We just started using SageMaker for a small POC and want to complete all of our data modeling and logs.
In the future, we will be using the solution with Airflow, which will become one of our biggest use cases.
CloudWatch works very well with any of the AWS resources so we always monitor through it.
How has it helped my organization?
Our business flow has improved because we monitor email thresholds and immediately get an alert from CloudWatch if use goes beyond thresholds. Without this alert, we would have to use external monitoring.
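As a rough illustration of that kind of threshold alert, and not the reviewer's actual configuration, the following boto3 sketch creates a CloudWatch alarm on the daily S3 BucketSizeBytes metric and notifies an SNS topic when a bucket grows past a limit; the bucket name, topic ARN, and threshold are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="s3-data-lake-size-threshold",
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",          # S3 publishes this once per day
    Dimensions=[
        {"Name": "BucketName", "Value": "example-data-lake"},   # placeholder
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    Statistic="Average",
    Period=86400,                          # daily storage metric
    EvaluationPeriods=1,
    Threshold=5 * 1024 ** 4,               # 5 TiB, illustrative only
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:storage-alerts"],  # placeholder
)
```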
What is most valuable?
It is valuable that CloudWatch collects all the metrics. I primarily like the RUM. There is an instantaneous response when monitoring logs and KPIs. CloudWatch immediately hooks up and connects to the KPIs and all the metrics.
What needs improvement?
Even though the product works well with most AWS, it is a nightmare to use with Snowflake. Snowflake is a SaaS product hosted on AWS, but using it with CloudWatch still doesn't give us the support we need so we rely on separate monitoring.
We have many databases such as MongoDB and SQL Server, RDS, and PostgreSQL. For these, CloudWatch is good but a little basic and additional monitoring tools are required. It's challenging to use one monitoring tool for S3 and another monitoring tool for Snowflake.
It would be beneficial for CloudWatch to provide an API interface and some kind of custom configuration, because everybody uses APIs now. Suppose Snowflake gave us all the same things we get with MongoDB, such as APIs, hookups, or even monitoring; that would allow us to build our own custom solution. That is the biggest limitation of CloudWatch: if you go even a bit beyond AWS products, even ones hosted on AWS, CloudWatch doesn't work very well.
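One partial workaround, not a substitute for the native support the reviewer is asking for, is to push numbers collected from an externally hosted product into CloudWatch as custom metrics so they can share the same dashboards and alarms. The namespace, metric name, and value below are hypothetical; fetching the value from Snowflake is assumed to happen elsewhere.

```python
from datetime import datetime, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_data(
    Namespace="Custom/Snowflake",                     # hypothetical namespace
    MetricData=[{
        "MetricName": "WarehouseCreditsUsed",
        "Dimensions": [{"Name": "Warehouse", "Value": "ANALYTICS_WH"}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": 12.5,                                # value queried from Snowflake elsewhere
        "Unit": "Count",
    }],
)
```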
I'd also like an improved UI, because it hasn't significantly improved in a few years, and we want to see things at a more granular level. I get my KPIs as bucket usage for yesterday, but I'd like to see them for a particular date or week. We have three buckets used by hundreds of people, and I want to see usage for an individual to determine where I need to customize and provide more room. I want aggregation across multiple parameters, not just one parameter.
For how long have I used the solution?
I have been using the solution for two years.
What do I think about the stability of the solution?
The solution is very stable with absolutely no issues. We used to see a delay when we were setting up three buckets but now we receive instantaneous notifications.
What do I think about the scalability of the solution?
The solution is definitely scalable. Most of our development environment uses it and we are running three teams of 150-200 people. Usage levels are different between developers and the support team so the total users at one time is 100-150.
The solution is managed by our internal AWS maintenance team. Seven people manage our cloud environment and seven manage our platform side for not just CloudWatch, but everything on AWS.
We still need to find a solution for Snowflake and Tableau environments unless CloudWatch provides better support in the future.
How are customer service and support?
The support staff are seasoned professionals and are good. Amazon provides the benchmark for support and nothing else compares.
Which solution did I use previously and why did I switch?
On-premises, we have used other solutions like Sumo Logic, Azure Logic Apps and others. Not everyone uses AWS so we have a lot of tools we use.
Previously we used some external application-monitoring logic, but it didn't work well with AWS tools. I would have to figure out how to configure it for Aurora or find a way to handle S3 buckets. Those solutions worked well on-premises, but not with AWS and the cloud.
How was the initial setup?
The setup for this solution is pretty simple and anyone can do it if they are on AWS. Setting up all our VPC and private links connecting to our gateways took some time, but CloudWatch setup was a no-brainer and took a couple of days.
What about the implementation team?
Our implementation was done in conjunction with a third party. We like to bring in a few engineers to work with our engineers and then we partner with a third party like Slalom to help with integration. Our process is a mix of all three with AWS staff helping for a couple of weeks and Slalom for a couple of months. Our team slowly takes over management.
What was our ROI?
We plan to increase our usage because we don't have another monitoring tool right now. With the Airflow orchestration, our CloudWatch use will significantly increase as we monitor all of our RUM, notifications, jobs, and runs. Our runs and billings will increase 20-30% once we start using Airflow.
Because CloudWatch doesn't support all externally hosted products, I rate it a nine out of ten for ROI.
What's my experience with pricing, setup cost, and licensing?
I don't know specifics about pricing because we pay for all our AWS services in a monthly bundle and that includes CloudWatch, Redshift, VPCs, EC2s, S3s, A39s, and others. We spend about $5 million per year on AWS with CloudWatch being about 5% of that cost.
Which other solutions did I evaluate?
I did not evaluate other solutions. Once we moved to AWS, we looked for a tool that was native to that cloud. That is the process we are currently undertaking for Snowflake and Tableau because CloudWatch doesn't support them well. We do try to use CloudWatch as much as possible.
What other advice do I have?
The solution is pretty good because it automatically comes and works well with AWS. Before you use any product from AWS, think about whether it is supported or how it will interface. I suggest using the solution with one product at a time and then transitioning to important interfaces.
If you find you can't configure the solution with Redshift, for example, and are struggling with your S3 monitoring even though both rely on S3, then you may have to find another monitoring solution. It makes sense to follow Amazon's best practices. They advise not to use certain monitoring components alone but as an integral part of your system. Monitor your ecosystem and think of a high-level picture of it, rather than just deciding that CloudWatch must be part of Redshift. This solution is just one part of an entire system.
I would rate the solution a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Aug 18, 2022
Associate Principal - Cloud Solutions at Apexon
Provides a one-stop place to look at what's happening across all the resources, however visualization tools are lacking
Pros and Cons
- "Recently, they have improved their integration with other resources, so we get even more robust data."
- "The length of latency is terrible and needs to be improved."
What is our primary use case?
We use Azure Monitor, which is something similar to AWS CloudWatch, to collect all the activity logs and the resource logs. It is enabled for most of the services, as it goes hand in hand with Application Insights.
Alerts are set up using robust metrics that we are able to retrieve from Azure Monitor, allowing us to automate and look at different rules and action groups.
Our component configuration keeps changing. Because of this, we need to put alerts on the components to figure out who changed them and what they changed.
We have been using Azure Monitor quite regularly, both for internal usage and for our customers. Customers will have to use Log Analytics in combination with Azure Monitor.
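As a minimal sketch of the Azure Monitor plus Log Analytics combination described above, assuming the azure-identity and azure-monitor-query packages and an existing workspace (the workspace ID is a placeholder), the following queries the activity log for recent write operations to see who changed which resource.

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Kusto query against the AzureActivity table: recent write operations,
# i.e. the "who changed what" question behind the component alerts.
query = """
AzureActivity
| where OperationNameValue endswith "write"
| project TimeGenerated, Caller, ResourceGroup, OperationNameValue
| order by TimeGenerated desc
"""

response = client.query_workspace(
    workspace_id="00000000-0000-0000-0000-000000000000",  # placeholder
    query=query,
    timespan=timedelta(hours=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```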
What is most valuable?
Azure Monitor is a one-stop place to look at what's happening across all the resources. It provides a bird's eye view with histograms and gauges that we can build within IT.
The alerting feature is also very valuable.
Azure introduces new services almost every year. Recently, they have improved their integration with other resources, so we get even more robust data.
What needs improvement?
Unfortunately, Azure Monitor stalls quite a bit; it can take up to 60 seconds to bring up metrics data. That length of latency is terrible and needs to be improved. The ripple effect of one wrong configuration affects multiple resources within milliseconds, yet Azure Monitor only reports after more than a minute that something went wrong. To improve this, Azure should create a visual representation of what the resource configuration was and compare it to what changed.
Alerts are queries used to figure out what has happened. If a reliable infrastructure diagram were available, it could tell me where the configuration changed. Azure gives you so many logs that, to understand where the change happened, you have to review thousands of rows of logs.
In the cloud, there are too many resources, so you end up trying to find the needle in the haystack to determine what is actually happening.
In future releases, I would like to see Azure Monitor improve its diagram capabilities. In the last few years, Azure has started to provide some basic diagramming, where you can visualize, from an Azure point of view, what is happening in a Kubernetes cluster and how the various resources are related to each other, but we still need to use a lot of third-party tools.
Imagine if an Excel sheet was thrown to you with a few thousand rows, and you were asked to determine what happened, within a minute or two, before a disaster strikes. A visualization tool is required to know what the previous configuration was as compared to the current configuration.
The solution is also reactionary and not proactive or intuitive. Azure Monitor should be able to alert you that certain changes will cause certain outcomes before making the change using futuristic infrastructure diagrams.
Lastly, I would like Azure Monitor to provide a separate portal for large operations teams, as there currently is no solution for them.
For how long have I used the solution?
I started using Azure Monitor regularly in 2015.
What do I think about the stability of the solution?
The stability of Azure Monitor is good. It has not been a problem. The solution does not require maintenance. When we adopt new services, we need to configure things as part of a checklist of items. This is a minor step.
I would rate the stability a five out of five.
What do I think about the scalability of the solution?
Being a SaaS service, Azure Monitor is scalable.
How are customer service and support?
Customer service and support for Azure Monitor is good. I rate it a five out of five for technical support.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup of Azure Monitor is easy. The deployment took a day or two because it is available by default. It is really an out-of-the-box solution.
What about the implementation team?
Our DevOps side handled the implementation of Azure Monitor themselves.
The implementation strategy was and continues to be that whatever resources we want to monitor through Azure Monitor, we enable them.
What was our ROI?
The ROI for Azure Monitor is poor. I would rate it a two out of five.
What's my experience with pricing, setup cost, and licensing?
Because we have to use Log Analytics in combination with Azure Monitor, it is expensive. It is expensive because logs are generated for storage requests across all the Azure resources, and all of that needs to be stored, both in hot and cold storage. Data can go to cold storage after 30 days, but hot storage is required for the NOC and the SOC teams, that is, the network operations and security operations teams.
Typically, we do try to encourage our customers to keep at least 30 days within Azure Monitor.
I would rate Azure Monitor a two out of five for affordability.
Which other solutions did I evaluate?
We evaluate options through our customers' requests. We have found that Azure Monitor actually monitors every resource better than New Relic, Datadog, or Splunk.
Splunk is very good for on-premise servers. However, internally, we do not hold logs for more than 30 days, so Azure Monitor works for us.
Azure Monitor has a lengthy latency period for dashboard alerts. Sometimes we get data in New Relic and Datadog faster than with Azure Monitor.
What other advice do I have?
Anyone considering implementing Azure Monitor into their organization should consider the length of retention time required for their logs and applications. If it is beyond 30 days, Azure Monitor becomes expensive.
Overall, I would rate Azure Monitor a seven out of ten. The features included in the solution are good, however, they lack development. They are allowing their partners to come up with good offerings, but not developing the core products themselves.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Gold Partners
Last updated: Dec 12, 2022