Use for log aggregation, alerting, and monitoring in a container environment
- Searching errors
- Alerting through Slack and OpsGenie using their plugins.
We run a containerized microservices environment. Being able to set up streams and search for errors and anomalies across hundreds of containers is why a log aggregation platform like Graylog is valuable to us.
Allowing us to set up alerts and integrate with platforms we already use, such as Slack and OpsGenie to alert users of these errors proactively, is also a very useful feature.
Elasticsearch recommendations for tuning could be better. Graylog doesn't have direct support for running the system inside of Kubernetes, so it can be challenging to fill in the gaps and set up containers in a way that is both performant and stable.
We ran into problems with Elasticsearch throwing a circuit-breaking exception due to field data size being too large. It turned out that the heap size directly impacted this size in a high-throughput environment, causing unexplained instability in Graylog. We were able to troubleshoot on the Elasticsearch size, but we should have been able to reference some minimum requirements for Graylog to know that our settings weren't sufficient.
Otherwise, the documentation is great and there are a lot of options for configuration. Since container orchestration systems are popular and Graylog fits the niche well, perhaps they could officially support running in docker containers on Kubernetes as a StatefulSet as a use case. That way, the declarative nature of Kubernetes config files would document their best-case deployment scenario.
One to three years.
No issues with scalability.
Splunk, Logstash, and Elasticsearch.
Set up in Kubernetes; not complex once the configuration is right.
Splunk, Logstash, and Elasticsearch.
Make sure your Elasticsearch cluster is sized right, memory-wise.
FROM GRAYLOG: Thank you for the review, and wanted to point you to our new 3.0 version of Graylog. In 3.0 we have the ability to export content packs, which you can then migrate your processing pipelines, alerts, dashboards, and lookup tables, so they can be moved to a different system or be shared with the community. Also, in 3.0 Enterprise side, we have implemented Views, which allows for much greater flexibility on searches as well as creating interactive dashboards. Also in views, we have added a parameter option, to build workflows all based on one input (i.e. IP address, User name).
If you have a chance, give the new version a try!