What is our primary use case?
We have deployed the tool. It’s an event correlation tool. Unlike tools like Splunk, which have their agents running on servers to send telemetry data and logs, Moogsoft operates differently. It collects data published from various telemetry tools and then handles the correlation. For example, if a web server goes down, Moogsoft will generate an alert indicating that the server is down. However, this alert alone may not provide meaningful insights. If the web server is down, there might be other related issues, such as database connection problems or API failures. This can overwhelm users with numerous alerts.
Moogsoft addresses this by filtering the noise and consolidating alerts into a single situation report. This way, users won’t be bombarded with alerts. Instead, they’ll receive a clear overview of the problem, identifying where it lies and what components are affected. This allows the Site Reliability Engineer to understand the situation better. Once a situation is identified, you can define workflows. A workflow might involve an automated repair plan that outlines specific actions if certain conditions are met. If action A is performed, then feedback will dictate the next steps. We have deployed Moogsoft for an automobile client and a bank in Singapore.
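To make the automated repair plan mentioned above more concrete, here is a minimal sketch in Python. It is not Moogsoft's workflow engine or API. The function `run_repair_plan`, the step names, and the idea that each action returns `True` when the problem is resolved are all assumptions made for illustration.

```python
def run_repair_plan(situation, steps):
    """Run repair steps in order, using each step's feedback to decide the next move.

    `steps` is a list of (name, action) pairs. Each action takes the situation
    and returns True if the problem is resolved (hypothetical convention).
    """
    log = []
    for name, action in steps:
        resolved = action(situation)
        log.append((name, resolved))
        if resolved:
            # Feedback says the issue is fixed, so skip the remaining steps.
            break
    return log


# Hypothetical plan: try the cheapest fix first, escalate only if it fails.
steps = [
    ("restart_service", lambda s: False),
    ("failover", lambda s: True),
    ("page_oncall", lambda s: True),
]
log = run_repair_plan({"host": "web-01"}, steps)
```

In this example the restart fails, the failover succeeds, and the on-call engineer is never paged.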
How has it helped my organization?
We have observed improvements since integrating Moogsoft into our system. Initially, we had various monitoring tools, including Datadog, OpenSearch, CloudWatch alerts, and Prometheus. Each team was working in silos, using different tools without a cohesive view from a business perspective. Once we integrated these tools and allowed data to flow into Moogsoft, we could establish workflows correlating events, clarifying our monitoring efforts.
Moogsoft excels as an event correlation tool but requires substantial data to provide effective insights. On the first day, it may not offer comprehensive visibility. Dynatrace, by contrast, provides immediate insights upon deployment, making it superior in terms of initial usability. While it captures essential data, it may lack deeper visibility into interactions between servers, only showing which IPs data is sent to.
Moogsoft becomes much more functional as more data is onboarded. Its machine-learning capabilities take about one to two weeks to identify similar past situations.
What is most valuable?
Moogsoft has a feature called cookbooks. A cookbook is a concept introduced by Moogsoft that allows you to define actions based on specific conditions. For example, if an alert or event comes up from a server, you can set a sixty-minute time window so that any events sharing that host, or matching custom fields, are grouped together. This means you can organize alerts based not only on host values but also on the timing of the events.
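The grouping idea can be sketched in a few lines of Python. This is only an illustration of time-window grouping by host, not Moogsoft's cookbook implementation. The `Alert` and `Situation` classes and the `group_alerts` function are hypothetical names, and the window here is measured from the first alert in each group, which is one of several possible choices.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    host: str
    message: str
    timestamp: datetime


@dataclass
class Situation:
    host: str
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window=timedelta(minutes=60)):
    """Group alerts that share a host and fall within `window` of the group's first alert."""
    situations = []
    latest_by_host = {}  # host -> most recently opened Situation for that host
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest_by_host.get(alert.host)
        if current and alert.timestamp - current.alerts[0].timestamp <= window:
            current.alerts.append(alert)
        else:
            # No open group for this host, or the window has expired: start a new one.
            current = Situation(host=alert.host, alerts=[alert])
            latest_by_host[alert.host] = current
            situations.append(current)
    return situations


t0 = datetime(2024, 1, 1, 12, 0)
alerts = [
    Alert("web-01", "HTTP 503", t0),
    Alert("web-01", "DB connection refused", t0 + timedelta(minutes=10)),
    Alert("db-01", "Disk full", t0 + timedelta(minutes=5)),
    Alert("web-01", "HTTP 503", t0 + timedelta(minutes=90)),
]
situations = group_alerts(alerts)
```

Here the first two `web-01` alerts land in one group, while the `web-01` alert ninety minutes later starts a new one because it falls outside the window.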
In my experience with production and infrastructure cases, one issue often leads to another, creating a domino effect. Moogsoft helps visualize this by providing a clear view of how the problems began and where they ultimately led, giving you a comprehensive understanding of the current situation.
Another key feature is the journaling capability for SREs. When someone resolves a ticket by following certain steps, Moogsoft’s AI or machine learning can detect similar situations in the future. It then shows past incidents and the solutions that have been implemented, which have received positive feedback from my clients. It reduces the mean time to resolution, making the overall process more efficient.
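I don't know how Moogsoft matches a new situation to past ones internally, so the sketch below swaps in a simple, well-known technique, Jaccard similarity over sets of alert signatures, just to show the idea of surfacing past incidents and their resolutions. The function names, the signature strings, and the incident records are all made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(current_signatures, past_incidents, threshold=0.5):
    """Return (score, incident) pairs at or above `threshold`, best match first."""
    scored = [
        (jaccard(current_signatures, incident["signatures"]), incident)
        for incident in past_incidents
    ]
    matches = [(score, incident) for score, incident in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Hypothetical journal of resolved incidents.
past_incidents = [
    {"id": "INC-1", "signatures": ["web:503", "db:conn_refused"],
     "resolution": "Restarted the DB connection pool"},
    {"id": "INC-2", "signatures": ["disk:full"],
     "resolution": "Rotated and cleared logs"},
]
matches = find_similar(["web:503", "db:conn_refused", "api:timeout"], past_incidents)
```

Only `INC-1` is returned, since it shares two of the three signatures with the current situation, and its recorded resolution can then be shown to the engineer.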
What needs improvement?
Moogsoft's integration options are somewhat limited. It primarily relies on webhooks and APIs for data input, meaning external systems must push data to Moogsoft; it cannot pull data independently. If I want to connect CloudWatch or Prometheus to Moogsoft, I have to write custom code on the Prometheus side to send data through Moogsoft’s APIs or webhooks.
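The custom code on the sending side tends to look something like the following sketch, which uses only the Python standard library. The webhook URL and the payload field names (`source`, `host`, `severity`, `description`) are placeholders I made up, not Moogsoft's actual schema, so check the integration documentation for your deployment before using anything like this.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the webhook URL of your own deployment.
WEBHOOK_URL = "https://moogsoft.example.com/events/webhook"


def build_event_request(source, host, severity, description, url=WEBHOOK_URL):
    """Build a JSON POST request for one event (field names are assumptions)."""
    payload = {
        "source": source,
        "host": host,
        "severity": severity,
        "description": description,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(request, timeout=10):
    """Push the event to the webhook and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


request = build_event_request("prometheus", "web-01", "critical", "Instance down")
```

Building the request separately from sending it makes the payload easy to inspect or test without touching the network. Calling `send_event(request)` performs the actual push.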
Other monitoring tools like Splunk and Dynatrace typically have agents that reside on the system, automatically collecting and sending data. This difference can create challenges for teams needing seamless integration. Although Moogsoft has developed some plugins, such as those for Grafana and Zabbix, that are ready to use, they don’t have comprehensive support for every tool.
Splunk has developed an AI assistant that can answer your queries, which Moogsoft lacks. Dynatrace has an AI component that identifies issues on its own. With Moogsoft, on the other hand, you must set up the workflows and cookbooks yourself.
ServiceNow also has good event correlation, and Splunk is fantastic at event correlation.
For how long have I used the solution?
I have been using Moogsoft for one and a half years.
What do I think about the scalability of the solution?
Moogsoft is highly scalable, thanks to its API and webhooks, which allow various software to send data. This means it can collect data from multiple sources. However, a downside is that you must write integration code for each data source. For example, integrating with tools like PingID requires writing specific code to send the data.
While this adds extra work for the deployment team, the scalability remains strong. Unlike some platforms, such as Splunk, where knowing Python is necessary for certain integrations, Moogsoft allows users familiar with Java or other languages to write code and send data easily.
How are customer service and support?
Prior to the acquisition, support was good. After the acquisition, things became a bit slower, which seems to be common in many organizations. For instance, we experienced similar delays in the first few months when Cisco acquired the platform. Issues like unpatched vulnerabilities in new packages created challenges from a security perspective. Although it was a managed environment, access was restricted.
Which solution did I use previously and why did I switch?
When comparing Moogsoft to ServiceNow, one major issue is that Moogsoft lacks widespread understanding and effective marketing, leading to a perception of it being somewhat buggy and different. ServiceNow has implemented event correlation in a user-friendly way, adhering to industry-standard guidelines. Its terminology, such as workflow, definition, and knowledge base, is straightforward, making it easier for users to navigate. Moogsoft, however, uses terms like cookbooks and recipes, which can create confusion for new users.
From a deployment perspective, Splunk offers a straightforward installation process. Deploying Splunk is as simple as a few clicks in the UI, even on Windows Server. Moogsoft requires very specific system configurations, such as a particular Red Hat and Java version, which can be outdated and pose security risks. This makes deployment challenging for Moogsoft admins. Other tools have adopted more flexible deployment options, including Docker images, allowing for easier integration and reducing dependency concerns.
How was the initial setup?
The deployment of Moogsoft was quite challenging, primarily due to its Java dependency. It relies on an older version of Java, which created compatibility issues with our servers that are set up with the latest version. Moogsoft utilizes several services, including MySQL, NGINX, Apache Tomcat, and potentially Jenkins for scheduling. To ensure proper functionality, you must have the correct versions of all these services running on your server, complicating the setup process.
The documentation can be extensive, and it can be tedious to ensure everything is correctly configured. If Moogsoft had implemented their solution using Docker, packaging all components together, it would have significantly simplified the installation process. A Docker image would allow for easier deployment and management, reducing the complexity of handling multiple dependencies.
We could deploy Moogsoft on day one, but several functionalities did not work as expected. To address these issues, we had to raise support tickets. It took about seven to eight working days to resolve everything and fully operationalize the system.
We were the consultants deploying for this automobile client. We needed help from Moogsoft support.
What's my experience with pricing, setup cost, and licensing?
When comparing pricing, Moogsoft is significantly more affordable than Splunk and ServiceNow. Moogsoft offers an on-premises solution with a one-time licensing cost, eliminating the recurring fees based on data ingestion that tools like Splunk impose. This structure is advantageous, as it allows businesses to avoid skyrocketing costs associated with the amount of data processed.
Splunk and similar tools charge based on the volume of data ingested, which can lead to increased expenses, especially as data volumes grow. This has led to the rise of tools like Cribl, which helps deduplicate data to manage licensing costs.
Moogsoft's SaaS solution, which has been promoted more recently, operates on a pay-as-you-go model based on data volume, which can be costly for customers. In contrast, with the on-premises version, once you own the software, the ongoing costs are primarily for support, much like owning a Dell laptop, where you pay for the hardware and only incur additional costs for maintenance or support. It provides a clearer understanding of expenses and is generally more favorable from a customer perspective.
What other advice do I have?
If you manage anything else on the managed server, you might have to update the OS separately from Moogsoft.
Overall, I rate the solution a seven out of ten.
Which deployment model are you using for this solution?
On-premises