What is our primary use case?
A lot of what Foglight does could be derived from DMVs and extended events. I'm going to sound like a salesperson, but I have to be a salesperson to sell the value of a product to my company. They need to understand that without a tool like this, my answers to some of the questions they ask will be very speculative.
As a DBA, you have to be able to answer three questions. The first is: What's happening right now? Why is the system slow, or why are things not responding? That is probably the most trivial question for an experienced DBA, and it is where the tool's value might not be as obvious: you can look at sp_who or the DMVs and pretty much tell what's going on without having to pay for a license for a product like Foglight.
The second question is: What just happened? There could be just a couple of seconds difference between the first and second questions. But the effort to answer the second question is significantly higher because it is water under the bridge. You need some kind of monitoring solution implemented, even if it's just a basic solution where you capture a certain timeframe, so you can roll back and review what just happened. However, there will still be a significant amount of speculation because, usually, you can't afford to monitor every single metric, and there are hundreds of them. The issue could be the OS, it could be infrastructure-related, or it could be that the SQL code is not performing well because it's not written well. So the second question is significantly harder to answer, and that's where a tool like this will become very helpful.
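The "capture a certain timeframe so you can roll back and review what just happened" idea above can be sketched as a bounded retention buffer. This is an illustrative Python sketch, not Foglight's actual mechanism; the class name, window size, and metric names are all hypothetical:

```python
from collections import deque
import time

class MetricRingBuffer:
    """Keep only the most recent window of samples so you can look back
    at 'what just happened' without unbounded storage."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, metric_name, value), oldest first

    def record(self, name, value, ts=None):
        ts = time.time() if ts is None else ts
        self.samples.append((ts, name, value))
        self._evict(ts)

    def _evict(self, now):
        # Drop samples that have aged out of the retention window.
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def replay(self, name):
        """Return the retained history for one metric, oldest first."""
        return [(ts, v) for ts, n, v in self.samples if n == name]
```

Anything older than the window is gone for good, which is exactly why the second question is harder than the first: without some capture in place before the incident, there is nothing to replay.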
The third question is: What has been going on? That is by far the hardest question to answer without this type of tool. This is the type of question a manager might ask for the purposes of resource planning. Or a senior VP might say, "Hey, how are we doing? Can we bring on another customer? Can we sustain a 20 percent increase in workload?" I don't know how I would answer that question without having this type of solution. I work in the industry quite a bit and there is, unfortunately, a lot of misunderstanding due to a lack of a comprehensive view of infrastructure.
There's no way to answer that question without some kind of baselining tool. Unfortunately, in most of the shops I have worked for, only one question is usually answered relatively accurately, and that is "What is going on?" Even that is a luxury, because by the time a customer escalates an issue and it gets interpreted by support people, there's a gap. That gap could be a couple of minutes, a couple of hours, or a couple of days.
How has it helped my organization?
Ultimately, when you negotiate how many licenses you need, you always find the most problematic instances. You also have to evaluate the culture and maturity of the organization. Unfortunately, there is often a lot of legacy code to maintain, and it's not always easy to identify those things quickly.
In that context, Foglight has been pretty spectacular in terms of the number of times I have been able to answer questions that nobody could answer before. I used the tool and showed my team how you use the tool to answer a lot of those questions, and some of those questions were pretty complex. We'll have deadlocks, we'll have locking conflicts, we will have blocking, and we'll have unexpected CPU spikes. Obviously, there is some complexity involved with the architecture and that is not always clear, but the tool is phenomenally helpful in enabling us to change and repair things.
We have also been able to predict a problem. A lot of times you can see a particular process starting to misbehave. Visually, you can see the spike, and it is something that could potentially lead to a bigger problem because the process will not scale. It gives you an opportunity to address things before they become real problems.
When it comes to displaying intensive database queries, Foglight is the best tool. Spotlight does not do that very well and Foglight is fantastic. It enables what they call a multidimensional analysis. You have a visual presentation of query resource utilization and you can slice it by the type of resource. You can also slice it by the number of executions.
For example, a few times I've seen a server running very hot, with the CPU at 80-plus percent, and people starting to freak out. But in reality, the box is very healthy. It has no locks or blocking. Rather, it's utilizing the CPU because that's what you want it to do. You always need to juxtapose multiple metrics simultaneously, and Foglight is really good for that. It has a dashboard where you can look at multiple parameters and components at the same time. If I see the CPU go up and I also see the number of connections go up and the number of batches per second go up, to me it just means that SQL Server is working hard because we are processing fast and we are able to get more work done in a particular time frame.
A lot of times, when you do have problems, you actually see the CPU go down. People say, "Well, what's the problem?" The problem is that you have some internal blocking or locking, or some kind of resource contention, and the CPU cannot process as many batches per second.
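The juxtaposition described in the two paragraphs above can be sketched as a toy rule of thumb. This is not Foglight's logic; the thresholds and metric names are illustrative assumptions:

```python
def interpret_cpu(cpu_pct, batches_per_sec, baseline_batches, blocked_sessions):
    """Toy heuristic: high CPU alone is not a problem. High CPU with
    throughput at or above baseline and no blocking means the server is
    simply working hard; low throughput with blocked sessions suggests
    resource contention, even when CPU looks low."""
    if cpu_pct >= 80 and batches_per_sec >= baseline_batches and blocked_sessions == 0:
        return "healthy: high CPU, high throughput, no blocking"
    if blocked_sessions > 0 and batches_per_sec < baseline_batches:
        return "contention: throughput down, sessions blocked"
    return "inconclusive: inspect more metrics"
```

The point of the sketch is the shape of the reasoning, not the numbers: no single metric, CPU included, tells you whether the box is healthy.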
When it comes to identifying the least performant queries, or queries that perform well individually but are called very frequently, that is where the tool really shines. It allows you to identify those things quickly.
What is most valuable?
Thinking about my favorite feature in Foglight is like thinking about my favorite food. One of the hardest things to do in database management is to evaluate performance deviation across time. The adaptive baseline that Quest is using is by far the most helpful. That doesn't necessarily mean I use it on a daily basis, but it is something that I have not been able to find in any other tool, at the same level. Microsoft and Query Store do provide performance monitoring. There are a lot of legacy, built-in, free-of-charge tools that do part of the job, but not as comprehensively. Foglight's adaptive baseline monitoring and measuring of deviation is one of the best features of the solution.
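The idea behind adaptive baselining, evaluating deviation against what is normal for a given time slot rather than against a fixed threshold, can be sketched in a few lines. This is a crude stand-in for illustration only, not Quest's algorithm:

```python
import statistics
from collections import defaultdict

def build_baseline(history):
    """history: iterable of (time_slot, value), e.g. slot = hour of week.
    Returns per-slot (mean, stdev), so a Tuesday-morning spike is
    compared to other Tuesday mornings, not to a global average."""
    slots = defaultdict(list)
    for slot, value in history:
        slots[slot].append(value)
    return {s: (statistics.mean(v), statistics.pstdev(v)) for s, v in slots.items()}

def deviates(baseline, slot, value, k=2.0):
    """Flag a value outside mean +/- k standard deviations for its slot."""
    mean, sd = baseline[slot]
    if sd == 0:
        return value != mean
    return abs(value - mean) > k * sd
```

A real product refines this in many ways (seasonality, trend, outlier rejection), but the core appeal the review describes is exactly this: "normal" is learned per time period instead of being a single static threshold.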
What needs improvement?
Foglight does have a component that allows you to look at things in real time, but it's not as friendly or as responsive as Quest Spotlight. Foglight might be lacking in this department. It could be just the nature of the beast; it could be the fact that it's web-based, as opposed to Spotlight being a fat client written in C#. I use both tools in conjunction with each other; they are part of the toolset. For real-time analysis, Foglight might not be as helpful.
It's also possible that part of the issue is how Foglight is deployed. We always try to save on cost and because it requires a SQL license, you don't necessarily have the luxury of putting it on a super fast server. It could be related to that. But I have noticed that it's not as responsive for determining, in real time, what's going on.
The way I have understood things is that there was an attempt to merge Spotlight functionality with Foglight. They have somewhat done that, even though I still feel that they're not going to be able to completely kill Spotlight. That tool is done so well and it's really serving a purpose in terms of a real-time, very fast analysis of multiple metrics.
I don't like what they did with Foglight because it's an attempt to merge the two. It's like a sports car versus a heavy-duty truck. They are both fantastic, but when you try to tune a truck to be fast, it doesn't work very well.
For how long have I used the solution?
I've been using Quest Foglight for Databases since 2010, approximately.
What do I think about the scalability of the solution?
Our company is growing. It has almost doubled in size over the past few years. We do get more servers now and we are considering upgrading to the enterprise model.
How are customer service and support?
I had some issues with support. It's possible that they were because I wasn't able to respond to some of their requests.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
There was a prior version of the product called PASS, Performance Analysis for SQL Server. While Foglight is an eight out of 10, that product would be a nine. From what I understand, they had to change the architecture of the product because the previous implementation was very difficult to maintain. It used a completely different architecture, with a memory-resident agent that was scrubbing memory up to 50 times per second. That made the tool much more responsive and provided a lot of live information. But because the agent depended on SQL Server internals, every time Microsoft changed its product, which now happens a lot, they needed a team of experts constantly working with Microsoft to modify the in-memory agent, and that became very difficult.
They had to move on to a different architecture, and that is the architecture we're discussing today, which is not as responsive. And while it retained a lot of functionality and added some, it lost the feel that the previous one had.
I still rate Foglight pretty high because I can't think of any other tool that goes above five or six out of 10.
What's my experience with pricing, setup cost, and licensing?
This is like the "Cadillac" of performance monitoring software. It's not cheap. I work for different companies and some were not able to afford it, so I can't say that I have consistently used it for over the last 10 years. But I have used it at different places and I am currently with a company that I was able to convince that the product is worth the cost.
Which other solutions did I evaluate?
I have had a look at other tools including Redgate SQL Toolbelt and Idera. They are monitoring solutions and they each have something that I like very much, but none of them came close to the comprehensiveness of Foglight. It's a very mature product and has been around for a very long time.
I am a little bit biased and I told my team, "Look guys, I have used a lot of tools but I like this the best. I'm very biased because I have used Quest tools so much. I am very proficient with them, so you need to check me on this." We had multiple meetings and presentations and there was agreement that Foglight, while being the most expensive tool, was the most comprehensive. The value was there.
There are tools out there these days to build something like Foglight on a budget. But if you go that route, what you don't have is a team of engineers who are subject matter experts who are working directly with the database vendor so that they're ahead of all the new features. When new features come out, if you have a "homemade" system, you have to stop what you're doing and concentrate on improving your solution.
I go to presentations often and there are some brilliant DBAs who have built their own dashboards, and I'm very impressed. These guys are unbelievable. But if I ask them, "So how much time does it take you to maintain it?" they usually won't tell me. A lot of the time, they don't maintain it because they build it for what they think they need. But eventually, if you stop maintaining a product, it becomes less accurate.
You have to evaluate your labor and how much time you're spending on maintaining a product as opposed to providing value for your company. If a product like Foglight is updated and functions well, it's kind of cliche, but it's like having a full-time DBA on staff for which you're only paying the cost of a license. The couple of hundred dollars for a license is basically a DBA that you have hired because you don't have to do those things. The tool does them for you.
It's not even the time involved, it's also a matter of staying focused on something. You can only juggle so many balls. If you constantly have to concentrate on tinkering with performance monitoring, you're not spending as much time developing your solutions.
What other advice do I have?
If I get a ticket that says, "We had an outage a couple of hours ago," I'm lucky. Most of the time a ticket will say, "Let's evaluate this outage we had a month ago." Even with a tool like Foglight, that becomes significantly more difficult. The tools are very granular but the farther in time you go, the less granular they become. That's just common sense to save on storage. Once you lose the granularity, some of the intermittent issues might be lost.
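The granularity loss described above comes from rollups: fine-grained samples get averaged into larger buckets to save storage, and a brief spike can disappear into the average. A minimal illustrative sketch (not Foglight's actual rollup scheme):

```python
def downsample(samples, bucket_seconds):
    """samples: list of (timestamp, value) at fine granularity.
    Average them into buckets of bucket_seconds, as retention rollups do.
    A short-lived spike is flattened once it is averaged into a bucket."""
    buckets = {}
    for t, v in samples:
        buckets.setdefault(t // bucket_seconds, []).append(v)
    return {b * bucket_seconds: sum(vs) / len(vs) for b, vs in sorted(buckets.items())}

# Per-second CPU samples with a one-second spike to 100%:
fine = [(0, 10), (1, 10), (2, 100), (3, 10), (4, 10), (5, 10)]
# After a 6-second rollup, the spike is gone; only a mild average remains.
coarse = downsample(fine, 6)
```

In this example the raw data shows a spike of 100, but the rolled-up series only shows an average of 25, which is exactly why an intermittent issue investigated a month later may no longer be visible.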
That is why I always tell folks that if something happens, or they're suspicious of something, "Before you file a formal request, give me a heads-up right away." If I look at it quickly, I might be able to pinpoint the exact root cause. If we wait for the formal request to escalate to me, the answer could be much less accurate. A lot of times it requires a lot of domain knowledge to ascertain whether it's related to the infrastructure, the syntax, or both, or just some weird thing that we usually attribute to hiccups with the cloud.
There's a companion product called Quest Spotlight that has some functionality in common with Foglight. But I'm glad that they will never really collapse into one. I believe this has been their strategy for at least the past five years. Spotlight is something that I have used longer than Foglight because it's a cheaper tool. I wouldn't say it's less sophisticated, but it's targeting less senior people. In other words, it's very easy to navigate and could be used by executives and people who are not necessarily IT-savvy, whereas Foglight is a lot more in-depth and requires significant expertise to derive the information you're looking for.
I often find that an initial estimate about the root cause is wrong. You're not working with a static environment, especially if you have mixed workloads such as online transaction processing with a lot of in and out, as well as decision support systems with long-running reporting queries. They're not easily separable these days. People just assume a database is supposed to do both. While it does do both, it's hard to fine-tune it for both. One is a race car and the other is a truck. How do you make a race car haul heavy loads, and how do you make a truck super nimble and fast? So you're constantly adjusting things.
Sometimes I have to go back and look at the baseline. The answer might be that it's Tuesday, and on Tuesday we usually have a bigger workload. Sometimes the answer is that nothing is going on; it's just the nature of the beast.
I have to be able to separate the tool's capabilities from the inconsistency in performance due to the fact that a lot of stuff is going on and things are not always consistent. It's not always easy to pinpoint what is really causing an issue, but Foglight certainly helps to identify actual resource contention.
The solution helps in both ways because you're able to look at the baseline and see that on Tuesday this spike is acceptable. But you can also look at what it is about Tuesday that is causing us to run so much slower.
I've been hearing for the past 10 years that my job is obsolete and that AI is going to take over. At first, I was nervous about that. Now, I'm just laughing because, with every year and more functionality, it is becoming much harder to make tuning decisions. SQL Server claims to have auto tuning but it's very limited in scope. Any experienced engineer will tell you that when SQL Server comes up with any kind of advisories or any kind of suggestions on the index build, you have to take it with a grain of salt because its view is very limited. There are a tremendous number of dimensions you need to be able to evaluate. I feel we're still far away from a self-healing, self-tuning system.
*Disclosure: My company does not have a business relationship with this vendor other than being a customer.