Buyer's Guide
IT Alerting and Incident Management
September 2022
Get our free report covering PagerDuty, VictorOps, Everbridge, and other competitors of Opsgenie. Updated: September 2022.
632,539 professionals have used our research since 2012.

Read reviews of Opsgenie alternatives and competitors

Darrin Khan - PeerSpot reviewer
Compliance, Security & Testing Manager at a financial services firm with 11-50 employees
Real User
Top 5 Leaderboard
Reduces white noise, which has reduced engineer fatigue
Pros and Cons
  • "It reduces the amount of white noise. If something comes through, then it will alert somebody. However, if it's a bit of white noise that comes through at night, then it gets dealt with the next day. Everything is visible to everybody. It's not just a single person getting an SMS, then going, "Oh, I'm not going to worry about that." The visibility to everybody on the team is one of the great things about it because it reduces the white noise."
  • "Because of the way you have to structure the rosters, if an engineer has to go on leave (or something), you can't just go in and reassign/take this person out of all of the different rosters that they're in. You have to go into each of the rosters and take them out. There might be a roster for business hours, after hours rotation, and monitoring deployments. Each time we need to take an engineer out of the pool, e.g., if they're sick or on leave, then we have to go and touch all of those rosters, updating and replacing them. Whereas, if we could just take the person out and have it automatically fill in the rostering, then that would make life a lot easier for managing it."

What is our primary use case?

We are a 24-hour online business. We use it for scheduling our on-call engineers and making sure that there is follow-the-sun or round-the-clock coverage for alerting and network operations.

It ingests all our alert paths, i.e., anything that generates an alert of any description, such as Splunk, AWS, and internal applications. We feed all our events into it, then it generates alerts which need a response from an engineer with a description. Another thing is that its built-in scheduling is pretty much hands-off for our on-call engineers unless somebody goes on holidays. That is the only time that we have to jump in there and make any changes.

How has it helped my organization?

One of the things with our incident flow is that it generates Jira tickets for us. So, its Jira integration is a critical thing because we need to have that logged for compliance in a separate ticketing system. Having it go into Jira is great, where we can generate hard copies of the alerts and all the events around them. Also, it gives us the ability to update in one location, so you can update Jira and that information goes across to PagerDuty, or you can update PagerDuty and it goes back to Jira. The integration that they have now is great. For example, if you are in the middle of a major event, where you have multiple incidents coming at you, the way it correlates events into a single incident is great.

It reduces the amount of white noise. If something comes through, then it will alert somebody. However, if it's a bit of white noise that comes through at night, then it gets dealt with the next day. Everything is visible to everybody. It's not just a single person getting an SMS, then going, "Oh, I'm not going to worry about that." The visibility to everybody on the team is one of the great things about it because it reduces the white noise.

What is most valuable?

The scheduling feature is the main valuable one for us, because it was previously costing us time. For example, when I was doing the scheduling for the rosters, I would be spending maybe a day out of a month getting the rosters all sorted out. It was rather intense, and a fair chunk of time each month was dedicated to the schedules.

The flexibility in what we can send to it is also valuable: emails, custom webhooks, and things like that.
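
As an illustration of the custom webhook path, here is a minimal sketch of what an event body could look like. The review does not say how the integration was actually built, so this assumes PagerDuty's Events API v2. The integration key, summary text, and source name are placeholders, and the code only builds the JSON body without sending it.

```python
import json

# Public PagerDuty Events API v2 endpoint (assumed; the review does not name it).
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

# Severity values accepted by Events API v2.
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="critical"):
    """Build the JSON-serializable body for a 'trigger' event."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(ALLOWED_SEVERITIES)}")
    return {
        "routing_key": routing_key,   # placeholder integration key
        "event_action": "trigger",    # opens (or adds to) an alert
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }


# Hypothetical values for illustration only.
event = build_trigger_event(
    "YOUR_INTEGRATION_KEY",
    "Payment API latency above threshold",
    "splunk-prod",
)
body = json.dumps(event)
print(body)
```

An HTTP client would POST `body` to `EVENTS_URL` with a JSON content type. That step is left out so the sketch runs without network access.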

We have our production and development environments. If an alert goes offline in the development environment, it's generally treated as a low priority. However, if anything goes down or alerts us from the production environment as a critical or high priority, then an engineer has to stop and fix it straightaway. 
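
The rule described above can be sketched as a small routing function. The environment names, severity labels, and return values are hypothetical labels chosen for this example. They are not PagerDuty settings.

```python
def alert_urgency(environment, severity):
    """Decide how urgently an alert should be handled.

    Mirrors the rule in the text: critical or high alerts from production
    interrupt an engineer immediately; development alerts are low priority.
    """
    if environment == "production" and severity in {"critical", "high"}:
        return "page_now"
    if environment == "development":
        return "low"
    # Assumption: anything else from production can wait for the next working day.
    return "next_business_day"


print(alert_urgency("production", "critical"))
print(alert_urgency("development", "critical"))
print(alert_urgency("production", "low"))
```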

What needs improvement?

Because of the way you have to structure the rosters, if an engineer has to go on leave (or something), you can't just go in and reassign/take this person out of all of the different rosters that they're in. You have to go into each of the rosters and take them out. There might be a roster for business hours, after hours rotation, and monitoring deployments. Each time we need to take an engineer out of the pool, e.g., if they're sick or on leave, then we have to go and touch all of those rosters, updating and replacing them. Whereas, if we could just take the person out and have it automatically fill in the rostering, then that would make life a lot easier for managing it.
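
The behavior being asked for can be sketched as a single function that removes one engineer from every roster at once. This is an in-memory illustration with made-up roster and engineer names. It does not call any PagerDuty API.

```python
def remove_from_all_rosters(rosters, engineer, replacement=None):
    """Return new rosters with `engineer` removed everywhere.

    If `replacement` is given, they take the engineer's slot in each roster,
    unless they are already a member of that roster.
    """
    updated = {}
    for name, members in rosters.items():
        new_members = []
        for member in members:
            if member != engineer:
                new_members.append(member)
            elif replacement is not None and replacement not in members:
                new_members.append(replacement)
        updated[name] = new_members
    return updated


# Hypothetical rosters matching the three kinds mentioned in the text.
rosters = {
    "business_hours": ["ana", "ben", "cy"],
    "after_hours": ["ben", "cy"],
    "deployments": ["ana", "ben"],
}
on_leave = remove_from_all_rosters(rosters, "ben", replacement="dee")
print(on_leave)
```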

We have an on-call phone number. However, at the moment, it is routed to a static voicemail. We would actually like to be able to have that phone follow whoever is on-call.

For how long have I used the solution?

I have used it for five or six years, possibly longer.

What do I think about the stability of the solution?

The stability is pretty good. There was one incident where push notifications stopped, but it failed over to SMS and phone calls, so it really didn't make much of a difference. Even then, because we didn't get that many alerts through at the time that they were having push notification issues, it didn't bother us. It was resolved very quickly (in about an hour). The only reason we noticed it was because they told us about it, not because we found it.

We haven't had any issues where PagerDuty caused an impact to us from their maintenance. Using their product, we have been able to set our alerts into maintenance, which is good. There has been no downtime from them being offline, or anything like that.

Before, we would have needed to have done a lot of alert path manual management, then going through afterward, enabling and disabling them. Whereas, within PagerDuty, it's so much easier. You just go in there and click a service on the maintenance, then it automatically does it all for you in the background. We don't have to sit there and think about it. So, it's quick and simple. This is saving us a good hour a month, because that would be one engineer sitting there going through, updating alert paths, etc.

Because we are a payment processor, we can't go offline. We need to be very on the ball and on point with any issues that come up. Having PagerDuty there means we're able to do that.

What do I think about the scalability of the solution?

The scalability is pretty good. I haven't seen anything that would restrict it. Because it's a SaaS platform, you can pretty much plug anything you want into it. I haven't had any restrictions on what I can feed into it from an alert perspective, and we can just keep adding more users as we see fit.

It is an integral part of our operations environment, so we wouldn't want to change or reduce it in any way. If our production environment increased and we had to add more services to it, then it's easy enough to do. It's not as though that is a major problem.

We have eight users in the organization. We also have a couple of stakeholder licenses where we notify stakeholders of major events. These are not actually interactive. They don't get alerts, but they'll get notifications if we allow them, such as by adding them to an incident. They will then get notifications from it, not necessarily alerts. They are internal, as we don't have external clients in that loop because the information management is something that we keep a tight handle on and that is very manual.

There are two other DevOps engineers who maintain it. There is redundancy if I'm sick. However, I still take the lead on a lot of the stuff.

How are customer service and technical support?

We worked with technical support at one stage when we were trying to get a mail filter. We wanted to set up a complex mail filter with some rules around it. That is when we contacted them, though this is not an ongoing requirement. They were pretty good and very informative. They were to the point, without being blunt.

Which solution did I use previously and why did I switch?

At the time of implementation, the solution was to replace our SMS-based solution, taking the rostering and management of the SMS rotation and making it easier. This was a bunch of homegrown shell scripts that had a little modem card, which would send SMSs to us.

We switched to PagerDuty mainly because of the maintenance and inflexibility of our original solution. We had to maintain it ourselves, paying for the upkeep of the modem, SMS account, etc., then making sure that we could send the information to various phones on different carriers. By going to PagerDuty, we were able to come up with multiple paths to be able to get those alerts, not just by our SMS.

Previously, we were manually copying and pasting the information. Per incident, it was taking us maybe half an hour, because someone would have to sit there and copy things backwards and forwards, making sure it was all in sync at the end of the incident.

When we first started looking around for a product to replace the existing alerting process, we found that this product made alerts more visible. After a while, that visibility naturally reduced the quantity of alerts. It also made it easier to deal with issues because we were able to see the alerts, and everybody saw them, not just one person.

How was the initial setup?

The initial setup was really easy. We just went in there and clicked a couple buttons, then away we went. 

Anytime you need to set something up, the initial setup is great, quick, and easy. It's when you get into some of the nuances, like rostering, where you have to take a person out of a roster, then put them back in. That sometimes adds a bit of complexity. However, the initial setup was one of the things that sold us on it since it was so quick and easy. That is because it is a SaaS-based solution.

When we initially started it, it was like me fiddling around on one weekend. I said to the guys, "Look, I've got this going," then it pretty much went from there. So, it might have been an hour at the most. It did not take long at all.

What about the implementation team?

I was the only person who deployed it.

What was our ROI?

The main flexibility and return on investment we get is that we don't have to do the maintenance on the products that we previously had. It's just seamless. It's like, "Oh yeah, it's reliable. We don't have to do anything else." Whereas, previously it was, "Ah, is the pager actually working?" This reduces worry and everybody's comfortable with the fact that it's going to work. So, the return on investment is more a comfort factor, knowing that we're able to rely on it and not worry that, "Oh, hang on, the alerting's not working," then go and chase up what's wrong with the alerting as well as chase any other problems which come up.

The best thing that we've had is that we get alerted before things happen rather than after the customer's having a problem or notices the problem.

As a result of the reduced white noise, we have reduced engineer fatigue. This means that because the engineers are not tired, their work throughput increases. It is definitely noticeable. If an engineer is working and gets called after hours every night, then when they come in to do their shifts, they're tired because they've had interrupted sleep. Whereas, if we make sure we don't have the white noise and everything else coming through, they're still able to get through their normal workload as well.

What's my experience with pricing, setup cost, and licensing?

If you add more people, then you have to pay more, which is always a thing with the SaaS solutions.

PagerDuty's pricing seems competitive. At one point, we were looking at OpsGenie because part of their current pricing includes the call routing that we wanted to include. It was actually cheaper to get that plus the call routing than it is on PagerDuty at the moment. However, we would have to go and buy an extra module to go with it. What we have at the moment is solid, and it would be a hard sell to say, "We'll go to something else that we're not familiar with."

If we wanted phone calls or additional SMSs, we would have to pitch up for those. They give us so many per month per user, then we have to pay extra if it goes over that.

Which other solutions did I evaluate?

Over the years, we have looked at other solutions: OpsGenie and VictorOps. There was another one, but they faded away. We were also using Pingdom at one point. Some of them are still a little bit green in this space. They're definitely coming up to speed.

So far, we're settled on PagerDuty because they were the leader and only one around at the time we were evaluating solutions. Since then, we've started looking at other products just to make sure that they're still on point with what we need.

The alerting functionality is not too bad. I have evaluated other competitive products for the way you can set different types of alerts, e.g., for non-critical or critical. PagerDuty will alert you differently based on those settings, which is an advantage that we like. It will also try multiple paths, so you can set it up to email you the alert, send you an SMS, phone you, or just send a push notification to your phone. With those four mechanisms, the engineer will get notified one way or another. If that doesn't work, it automatically escalates to the next person in the alerting path.
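
The multi-path notification and escalation behavior described above can be sketched roughly as follows. This is a simplified model and not PagerDuty's implementation. The `try_channel` callback, the engineer names, and the channel order are assumptions made for the example.

```python
def notify_with_escalation(escalation_path, channels, try_channel):
    """Try each channel for each person in order until someone acknowledges.

    `try_channel(person, channel)` is a hypothetical callback that returns
    True when the person acknowledges the alert on that channel.
    Returns (acknowledging_person_or_None, list_of_attempts).
    """
    attempts = []
    for person in escalation_path:
        for channel in channels:
            attempts.append((person, channel))
            if try_channel(person, channel):
                return person, attempts
    # Nobody acknowledged on any channel.
    return None, attempts


# Simulated outcome: only "ben" answers, and only by phone.
acknowledged = {("ben", "phone")}
person, attempts = notify_with_escalation(
    ["ana", "ben"],
    ["email", "sms", "phone", "push"],
    lambda p, c: (p, c) in acknowledged,
)
print(person, len(attempts))
```

In this run, all four channels are tried for the first engineer without an acknowledgment, so the alert escalates to the second engineer, who acknowledges on the third channel.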

We do have a project in the pipes for probably the beginning of next year to go through and do another review to make sure that the solution has everything there. We also want to do comparisons for what other options are available, make sure the pricing is still competitive, what's on offer, and so on.

What other advice do I have?

For whatever solution you have for alerting, and it being such a critical role in incident management, you need to be able to rely on it. PagerDuty allows us to do that.

Ensure you sit down and identify what you want in any alerting platform, whether it's PagerDuty or OpsGenie. Sit down and define what you want, particularly around your scheduling, what alerts you want to be able to ingest or handle, who you want to be able to process or send those alerts to, and any other possible bits and pieces in there that you may need before you sit down and look at an alerting platform of any description. Because sometimes, depending on what it is, there may be another way of doing it when you actually go and talk to the salespeople or pre-sales engineers. They'll go, "Oh, well, you can do this, this, or this." This will avoid bright light problems where, "Oh, that's a nice, shiny light. Yeah, we need that." You actually have in front of you what you need, not necessarily what they're trying to sell you.

We have looked at the solution's analytics, but haven't gone much into them. At the time that we were looking at it, we didn't see any real benefit to it since we are only a small team. A larger organization would get more benefit out of it. However, because we're such a small team, everybody knows how many alerts are coming through. It's not as though we need to do a full-on detailed, analytical review of things.

I would rate this solution a nine out of 10. It is a reliable solution that works.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
NickYoung - PeerSpot reviewer
Director of Enterprise Reporting, Visualization & Analytics at a university with 1,001-5,000 employees
Real User
Top 20
Enabled us to meet our "lights out" goal and repurpose staff to do work of greater value
Pros and Cons
  • "The automatic logging that's built into xMatters, especially the timeline of events, is very helpful because we can figure out why a particular person got a call... Having that level of detail built-in makes it really easy for me or the managers to prove that's what happened, and we can self-serve that information. It gives people the autonomy to know why they got a call."
  • "We would like to see a greater variety of integrations with ServiceNow. It works fine as it is, but an enhancement would be the ability to interact with the major incident module in ServiceNow... The way our major incident process works, when an incident is elevated from a P1 to a major incident, that is an extra flag in ServiceNow. It would be awesome to have xMatters get notification when something goes from a P1 to a major and then have it go through a different workflow, rather than our regular P1."

What is our primary use case?

We use xMatters as our automated on-call engagement system. We use ServiceNow for major incident management and processing for the university's IT services. When there is an incident of sufficient priority, impact, or urgency, we make use of the integration between ServiceNow and xMatters. xMatters contacts our staff members who are on call to make them aware that there's an issue going on. It gets them the information they need to log in and fix whatever might be happening. xMatters can do a lot of other things, but we use it primarily for our major incident response and automated on-call processes.

How has it helped my organization?

In 2019, we embarked on a "lights out" process. We had staff members sitting in our operations center, 24/7/365. They had to watch the screens and make sure, when something went "bump" in the night or something went down, to physically pick up the phone and call somebody. In December of 2019, we were able to bring those staff members back into a nine-to-five type of job, repurpose them, and move them into other roles. We let the machines do the hard work of notifying people if something goes wrong. xMatters was a big part of that because it allowed our managers to maintain their own rosters, and cell phones didn't have to be handed from one person to another. The process just worked really well. That was of benefit for our central IT.

We also onboarded our institution's public safety/police department. Before, if they had an issue where everything went down and they couldn't do anything from their office, they would either call or walk over to the IT building and find somebody in the operations center, and then the operation center would call somebody from networks. Now, we have onboarded several select people from the police station. They have the ability to use the xMatters mobile app to hit a big red button that contacts our major incident managers directly, without them having to do much else. That means they don't have to physically come work with us or find us. We were able to replace that physical process that existed prior to 2019 with a fully automated process now.

The automation provided by xMatters has helped us respond to incidents. It puts the responsibility for responding on the groups and the people who are responsible for providing service. They're getting a notification when something happens that meets a certain threshold. That's in contrast to the subjective process we had in place previously where the person who was in the operations center decided not to call somebody for whatever reason. Now that it's automated and everybody is playing by the same rules, there have been improvements on the monitoring side of things and in how things are architected. They know that if something goes down, they're going to get a call. Having the managers and the people closer to the process, with the ability to manage their own rosters, results in a little bit more responsibility, rather than just passing it off to the person who's sitting in the operations center.

The automated notification process has made people understand that they have to fix things before they go "bump" in the night. They know there is no longer a person sitting in our operations center who might decide not to wake somebody up. The machines are going to detect that something has gone wrong and they're going to notify xMatters, and xMatters is going to notify the group. Tangentially, that results in people proactively fixing things ahead of time. In turn, with people being a little bit more proactive in handling things, issues don't get up to a priority-one level as much. But when it happens, xMatters does its job and gets out of the way really quickly. It helps us deal with incidents when they happen.

In addition, the targeted notifications have helped reduce response times to IT incidents. It doesn't require a person in the operations center to call five people five times. It handles things synchronously. I would absolutely posit that our response time is quicker than it used to be.

What is most valuable?

In terms of its flexibility, we've been using it for close to two years, and we have yet to encounter a situation where somebody hasn't been enabled to configure it to work the way we want. We can configure groups to be members of other groups, enabling us to nest sequences of rosters, and that has been super-helpful in a number of scenarios. We provided a little bit of training and a little bit of documentation for the managers who had to manage their rosters and the sequence of calls, and since then, we really haven't had to do a lot, other than some reminders. But we just tell them the URL and that they should log in. They can figure it out from there. The UI is understandable. It's fairly straightforward to understand how you add a user or add a member to the roster or add a device. It doesn't take a lot of administrative overhead and that's important for us. We don't have a lot of people to manage every little thing, so people being able to do it themselves is pretty important.
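
The idea of groups nested inside other groups can be sketched as a recursive expansion into a single call order. The group and person names are invented, and this models the concept only. It is not how xMatters stores or resolves rosters.

```python
def expand_group(name, groups, _seen=None):
    """Flatten a group whose members may be people or other groups.

    Returns people in call order, listing each person once.
    Groups that were already visited are skipped, which guards against cycles.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        return []
    seen.add(name)

    order = []
    for member in groups.get(name, []):
        if member in groups:
            candidates = expand_group(member, groups, seen)  # nested group
        else:
            candidates = [member]                            # individual person
        for person in candidates:
            if person not in order:
                order.append(person)
    return order


# Hypothetical rosters: one group contains two other groups plus one person.
groups = {
    "major_incident": ["network", "database", "ivy"],
    "network": ["raj", "mei"],
    "database": ["mei", "tom"],
}
print(expand_group("major_incident", groups))
```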

And because we use it primarily for our major incident response and automated on-call processes, the automatic logging that's built into xMatters, especially the timeline of events, is very helpful because we can figure out why a particular person got a call. We can see, for instance, that it was because an incident showed up in that person's group and it went to the first person on-call and that person hit skip or ignore. It then went to the next person, called all of their devices, but they never acknowledged anything. Then it went to the next person and that's who actually picked up. Having that level of detail built-in makes it really easy for me or the managers to prove that's what happened, and we can self-serve that information. It gives people the autonomy to know why they got a call. Just click here and you'll see exactly why the fourth person in the roster got the call instead of the first.

The integration of xMatters with ServiceNow worked pretty easily. There was a little bit of configuration and coordination with our ServiceNow, but once it was set up it just worked. It does the right thing for us. We don't want every single instance that ServiceNow handles to generate an on-call notification. We only want priority-one and priority-two to result in notifications, for certain groups, via xMatters. It does that really well. That integration part was super-easy. I have also done some work with the xMatters API to pull out information about users and groups and rosters into a Google sheet. I used a Google Apps Script to interact with xMatters and pull information out for reporting purposes. That was also really easy. We use that information to see how many people are in xMatters, who's licensed, and if people have left the university we can make sure we kill off their accounts.
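
The filtering rule described here (only priority-one and priority-two incidents, and only for certain groups) can be sketched as a simple predicate. The field names and group names are hypothetical. They are not taken from the ServiceNow schema or the xMatters integration settings.

```python
# Assumed configuration for this sketch.
NOTIFY_PRIORITIES = {1, 2}
ON_CALL_GROUPS = {"network", "database"}


def should_notify(incident):
    """Return True when an incident should trigger an on-call notification."""
    return (
        incident["priority"] in NOTIFY_PRIORITIES
        and incident["assignment_group"] in ON_CALL_GROUPS
    )


# Made-up incidents for illustration.
incidents = [
    {"id": "INC001", "priority": 1, "assignment_group": "network"},
    {"id": "INC002", "priority": 3, "assignment_group": "network"},
    {"id": "INC003", "priority": 2, "assignment_group": "helpdesk"},
]
to_page = [inc["id"] for inc in incidents if should_notify(inc)]
print(to_page)
```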

xMatters has also helped us build workflows that meet our needs. In comparison to all of the organizations that use xMatters, our workflows are not complex, but it does what it does well and easily. Our simple workflows consist of an incident coming in and the right group being contacted. Within that group it goes through the sequence of people in the roster, in the right order. That was super-easy to set up. It was also very easy to set up another simple workflow where we use Zoom and Google Meet for our bridge process. If somebody isn't sure about something that is going on they can send out a "Please jump on the bridge line real quick" message. We can use either the xMatters bridge or the Zoom or Google Meet bridges that we have set up. That helps us control access and costs because we're already using Zoom and Google.

What needs improvement?

We would like to see a greater variety of integrations with ServiceNow. It works fine as it is, but an enhancement would be the ability to interact with the major incident module in ServiceNow. In ServiceNow, you can create an incident which is priority-1, 2, 3 or 4. The existing xMatters integration allows you to filter on just P1s and P2s, or on all priorities, or on just select ones. The way our major incident process works, when an incident is elevated from a P1 to a major incident, that is an extra flag in ServiceNow. It would be awesome to have xMatters get notification when something goes from a P1 to a major and then have it go through a different workflow, rather than our regular P1. 

For how long have I used the solution?

We purchased it in the latter half of 2019, so we've been using xMatters IT Management for about two years.

What do I think about the stability of the solution?

The stability has been great. I can't think of a time in the last two years that it's been down when we've needed it. They've done upgrades, but I can't remember it ever being down.

What do I think about the scalability of the solution?

The pricing was good, from our perspective, for scaling. It hits the mark. If we had to add hundreds of users we'd take a look at what kinds of bulk discount rates they may have.

As far as the technology goes, it seems to me that scaling is pretty easy to manage. You start with the ability to put groups inside of groups and have nested rosters. There are workflows that are specific to groups or to particular processes and that makes it fairly easy to configure. I would expect it to be a pretty scalable solution if we decided to roll it out in a significant way.

Currently, we have 105 people licensed, and 102 of them are in central IT. The other three are in the police department. Everybody in IT who is licensed is an active user because they are on-call in whatever rotation has been defined.

It's yet to be decided if we will increase our usage. In higher education there have been some budget cuts and position losses. It's always a moving target regarding whether we're going to expand or contract. At this point, I don't think we'll expand the use of xMatters because we've already licensed it to everybody in IT who needs to be licensed. If we had to roll it out to other departments around the university, I don't see it being an issue. But we are a heavily centralized IT operation here. We don't have a lot of distributed IT infrastructure or staff. Pretty much everything has to flow through IT.

How are customer service and support?

Their support is quick. They literally react within minutes, at times, after you put a ticket in. They've been great with any support issue we have had. That was especially true early on. We haven't had one in a while, but when we had questions that weren't bugs but just our not understanding something, they were getting back to us within minutes.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We did not have something that was similar to xMatters. What we had was an old-fashioned analog method of on-call management, in which people would share a cell phone. The cell phone would be handed from person to person as they went off-call. We had staff who sat in our operations center, 24/7/365. They had the list of phone numbers in a document on their machines that gave them the cell phone numbers to call for each group. So there was a system, but it wasn't a modern solution.

How was the initial setup?

We did a couple of walkthrough training sessions with xMatters staff. It involved a core group from our side, people who were going to be the admins or the main people using and configuring xMatters. I then did a handful of walkthroughs with different groups in our IT department. Those were about 45 minutes to an hour in length and I showed them the interface and how to add their devices. We did a little bit of documentation, but not much, about our process as it relates to xMatters. We then rolled it out. We did all of the training within a few weeks, once we got close to that "lights out" deadline at the end of December of 2019.

In terms of our infrastructure, we just added the module for ServiceNow, filled in some details according to the documentation, and hit save. That was it.

As for maintenance, the only thing we've had to do is add users and remove users. It's a set-it-and-forget-it solution.

What was our ROI?

There have been savings in process and overhead that we have been able to realize. We no longer need to have our staff looking at a screen overnight, on weekends, and during the day, every day of the year. We repurposed those staff members to work of higher value.

What's my experience with pricing, setup cost, and licensing?

It's billed per user license.

The way we approached it was to look at who actually needed to be on-call and licensed people accordingly. The pricing is tiered so we took that into account. If we were to license 10 or 20 people, that would be a certain price. And if we were to license 50 or 100, there would be a little bit of discounting. But the per-user license was right in line with what we were expecting.

Which other solutions did I evaluate?

We looked at PagerDuty, Opsgenie, and VictorOps. We considered all of them and looked at some demos, but we didn't get as far as doing a full proof of concept. The main reason we ended up going with xMatters was that it seemed that a lot of the alternatives I mentioned were built on the premise of being the actual incident management tool, and not just an on-call management tool. We were very clear that we needed a tool to do on-call management, and that ServiceNow was going to be our incident management tool. We just needed something to bring people together by notifying their mobile devices or by making a phone call to alert them in the middle of the night. xMatters fit that perfectly.

What other advice do I have?

I don't think I've ever had a complaint about it. xMatters just works.

Disclosure: I am a real user, and this review is based on my own experience and opinions.