Our primary use case for Threat Stack comes from the fact that we are a smaller company and we don't have dedicated security resources. The biggest things we need are audit logs and the agent on all of our machines, and then their SecOps program to help us filter and analyze that data, because we obviously don't have the manpower to do that. So our biggest use case is collecting all the data, having them there to watch our backs and give us recommendations on what we can fix, and having them help us track everything in case there is an incident.
One way they've improved how our organization functions goes back to when we first signed up with Threat Stack: we were just using password authentication, and managing 70 servers with passwords is terrible. One of the first things they noticed, and that we collaborated on, was that we needed to start automating some of these logins and actually knowing who was on the box, so it wouldn't always show up as the same user. We needed to disable password authentication, which makes it a lot easier to deal with SSH certificates. It was basic stuff that I already knew.
Then they pushed us in the direction of automating that and removing the human element from it. We've been moving forward with Chef to automate all of that so I can get rid of password authentication on all of our servers. It's just one code push, all the servers update, and I don't have to worry about it.
If I lose an employee, I remove their key from all the servers via Chef and they can't get into any of our boxes. If I need to redeploy an SSL certificate to our boxes, it's all done through Chef. That's been the biggest thing: their partnering with us and helping us identify ways to automate and improve that.
They've also helped us through a couple of security audits, which has been very helpful.
Another positive change is better insight into security, so when something happens it's not just, "Oh, we had a security incident," or, "I saw a tech blog saying there's a vulnerability in this version of Nginx. Which of my servers have that?" They take care of all that for me.
Also, the alerts are right in our face. We've integrated with Slack, so the dev team gets a notification every time a vulnerability is found or something is off. We can then check: did someone really do that in our AWS account? Did somebody really mean to do that on our server? And then we can address the issue that way.
Rules give us more visibility into and control over what's being triggered, and that's been super helpful. I don't have the time to go in there and create those rules myself. So instead, if we do something that's out of the norm (something we're allowing security-wise that we probably shouldn't, but that we're going to address in the future), they'll reach out to us as soon as they see it as an anomaly and say, "Hey, did you mean to do this?" We can then say, "Yeah, we did," and they'll help us configure rules to suppress those alerts for a limited amount of time until we can resolve the issue, so we're not inundated with alerts that aren't useful.
In terms of cloud infrastructure, the biggest insight they've given us is that they connect to our AWS account and let us know, with details, which boxes are and are not running the agent. That lets me see which servers have the agent and which ones don't, so I can get a quick glance at my weak points and at the servers that I need to either migrate over or get rid of.
We have also seen a decrease in our mean time to remediation, in the sense that before, we wouldn't even have been able to detect an issue, let alone get to remediation. The remediation wasn't even happening. Now, we're actually alerted to security issues and can start working on them. Before, we never would have known, so that's quite the improvement. It's really hard to quantify because we didn't have a good process; we were oblivious to vulnerabilities.
It has absolutely cut down the time it takes to investigate potential attacks, because it tells us immediately via Slack. We get a link, we click it, Threat Stack opens, and it takes us right to the events we need to know about. That's been just awesome.

In terms of time saved, compared with going in, digging through the servers, and finding all the logs ourselves, it probably saves 45 minutes to two hours per incident, depending on how impactful it is. We get a handful of alerts a week that we have to deal with, so we're saving a couple of hours a couple of times a week. Obviously, partnering with Threat Stack and implementing Chef makes all of that a lot faster. Taking all of that into account, we're saving oodles of time. Think about having to patch every box manually without Chef, which we adopted because of Threat Stack. Their recommendation, and going through those security measures with them, is saving us a boatload of time.