We use Turbonomic to evaluate all of our virtualized clusters. Initially, we were only using Turbonomic for our long-term VMware stacks. Now we are monitoring VMware ESXi 7 and Nutanix AHV stacks. On the server side, we have 400 VMs. We don't evaluate the VDI side because we have 1,100 seats, so it's too expensive. We made a special contract on ELA for vRealize Ops for VDI on that side. It wasn't horrible. It's just the bare minimum to show us if there's a problem in the stack.
We mainly use Turbonomic as a heat map. We aren't drilling down into the performance of individual applications or containers, such as those running on Kubernetes, Docker, or Swarm. We use other tools instead of Turbonomic to monitor transaction levels and the like.
Turbonomic is our overall heat map in our NOC. We fire it up, and when we see a red flag, we dig into it, and off we go, but the basic application components do not have our Dockers linked to them. It's just mainly working on the surface of the virtualization stack itself.
Our infrastructure is solid enough that I get a VDI call about every three weeks. My server farms are built like tanks, so it takes a lot to take them down. We can sleep well at night. Everything is on-prem. The only cloud solutions we use at the college are SaaS systems. We don't put much in the cloud because cloud environments are too vulnerable to hacks and exploits.
We're going from a silo system to HCI, and from 450 hard disks to hybrid flash. While we undergo this significant infrastructure change, we're using Turbonomic to watch VMware because it has aged and our migration isn't happening the way we want. We will probably reevaluate when the next Turbonomic contract is up.
Once we switch to pure Nutanix, we will reevaluate Turbonomic. I will probably keep it because management is used to Turbonomic's reporting. That saves me a lot of OPEX time that I would otherwise spend building those reports out of Nutanix by hand. I've been here for 16 years, and my CIO has been here for 17 years. Management is used to the reports we've been developing over the last decade. We developed them using VMTurbo. We set the standard with that first tool for reporting.
We use Nutanix Prism Central to manage everything on the Nutanix side, but Turbonomic provides ancillary information that gives me a holistic view of reporting and more features that Prism Central doesn't cover. Turbonomic provides linkages, visual aids, graphs, charts, etc.
I'm the one who uses it. It's up on my NOC screen. We log and monitor it pretty much every day. Then, once a month, it generates reports on its own. In that sense, it's used daily or monitored daily. We watch what it's reporting every day on the heat map. Regarding issues and such, maybe every couple of weeks we have something pop up that we look at.
Turbonomic helped us with cluster projections. We have different-sized hosts in a single cluster. I have two-socket and four-socket hosts sitting in a cluster, so the impacts aren't easy to understand in aggregate. Turbonomic helps to evaluate what will happen in hypothetical configurations. I can forecast the effect of dropping one server and adding another. If I drop a pair of 48 cores and add a single 96 at a different gigahertz, will that be adequate? It can tell you if you need to add more cores to manage your server hardware purchasing.
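The kind of what-if question described above can be sketched in a few lines. This is only a rough illustration and not how Turbonomic computes its plans: it treats capacity as cores times clock speed and ignores per-core efficiency differences between CPU generations, memory, and storage. The host names, clock speeds, demand figure, and 20% headroom are all made-up assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cores: int
    ghz: float  # nominal clock speed per core (hypothetical values below)

    @property
    def capacity_ghz(self) -> float:
        # Crude capacity estimate: cores times clock speed.
        return self.cores * self.ghz


def cluster_capacity(hosts):
    """Total CPU capacity of the cluster in GHz."""
    return sum(h.capacity_ghz for h in hosts)


def swap_hosts(hosts, drop, add):
    """Return a new host list with the named hosts removed and new ones added."""
    return [h for h in hosts if h.name not in drop] + list(add)


def is_adequate(hosts, demand_ghz, headroom=0.2):
    """True if capacity covers peak demand plus a reserve fraction."""
    return cluster_capacity(hosts) >= demand_ghz * (1 + headroom)


# Hypothetical cluster: two 48-core hosts and one 96-core host.
current = [
    Host("small-1", 48, 2.6),
    Host("small-2", 48, 2.6),
    Host("big-1", 96, 2.4),
]

# What if the pair of 48-core hosts is replaced by a single 96-core host
# running at a lower clock speed?
proposed = swap_hosts(current, {"small-1", "small-2"}, [Host("big-2", 96, 2.2)])

peak_demand_ghz = 380.0  # assumed peak demand, not a measured figure
print("current adequate: ", is_adequate(current, peak_demand_ghz))
print("proposed adequate:", is_adequate(proposed, peak_demand_ghz))
```

Even with the same core count, the lower clock speed shrinks total capacity, which is why a swap like this is hard to judge by eye.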
It also assists us in evaluating performance risks. The dashboards show what current risks are happening, and we use the planning features to see the what-ifs. We check the heat map daily. If something pops up there, we check it out to prevent issues from happening down the chain. It's mainly on the VMware side with the older VMXs. We haven't found anything on the Nutanix side to be worried about.
Turbonomic has helped us address performance degradation under VMware. It identifies when there's a bottleneck in the storage line, so we can start moving some virtual disks to different storage. It helps in the older silo structure. The performance degradation is on the VMware or the Fibre Channel SAN side. Some of the SANs are nine years old.
It is able to identify problem points before we even notice them. We're meeting all our SLAs because it never gets to the point where users catch something. They might say, "Oh, it seems a little slow," and then they'll return from lunch, saying, "Oh, it's okay again."
We log into it in the morning and let it sit up in the NOC. We take a peek when it shows something. We'll check it out if it's red, but it'll usually clear up if it's yellow. For example, all systems might run at 110% immediately before registration closes while students try to get their last class for their senior year registered before the other students. We'll return to our normal 20-30% usage in about an hour and a half.
Users won't notice a thing as we move to more of a Kubernetes, Docker-style system with Nutanix Karbon. I will probably try to integrate that with Turbonomic. We will probably connect Turbonomic deeper into that stack because it will be able to spin up new containers automatically, directly on the hardware rather than inside anything else, wherever a server has room for another container. Theoretically, I've got room for about 600 more containers, and we currently use 15.
We're centralized IT. I use Turbonomic mainly for showback because we don't charge our different departments. Technically, there is no charge to them for things like our current Red Hat licenses. We pick that up, and the requests come in to us. There is no self-service here.
The instructor says, "I need 400 cores and two terabytes of RAM to run my analysis." I'm like, "That's how they run it on a supercomputer. We don't have those here. Now, if your research grant wants to buy us one, sure, we'll set it up. Tell us where the half-million dollars is, and we'll set it up for you." There's no self-service here, but we use it for a showback.
We had Turbonomic load-balancing all our clusters, and we did not let VMware load-balance our clusters because of the algorithms. Turbonomic's market and share algorithms were much more precise than VMware's because I had disparate-sized servers.
VMware liked to put a heavy load on my little boxes and leave my big boxes alone, or it stuffed the big boxes full and left the little boxes alone. Turbonomic keeps everything about even. Until ESXi 7, its load-balancing algorithm was much cleaner than VMware's. That made the hardware more cost-effective because I didn't have little guys sleeping in a corner someplace sucking up hardware, power, and cooling while not doing any work all day.
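To illustrate what "about even" means across disparate-sized hosts, here is a toy placement sketch. It is not Turbonomic's market-based algorithm or VMware DRS. It is a simple greedy heuristic, assumed for illustration, that puts each VM on the host that would end up with the lowest fraction of its capacity in use. The host sizes and VM demands are invented.

```python
def place_vms(host_capacity, vm_demand):
    """Greedy placement that evens out utilization as a fraction of capacity.

    host_capacity: dict of host name -> capacity (any consistent unit)
    vm_demand:     dict of VM name -> demand in the same unit
    Returns (placement, utilization), where utilization is used / capacity.
    """
    used = {host: 0.0 for host in host_capacity}
    placement = {}

    # Place the largest VMs first so the small ones can fill the gaps.
    for vm, demand in sorted(vm_demand.items(), key=lambda kv: kv[1], reverse=True):
        # Choose the host whose utilization would be lowest after adding this VM.
        target = min(
            host_capacity,
            key=lambda host: (used[host] + demand) / host_capacity[host],
        )
        used[target] += demand
        placement[vm] = target

    utilization = {host: used[host] / host_capacity[host] for host in host_capacity}
    return placement, utilization


# Hypothetical cluster: one small box and one box four times its size.
hosts = {"small": 100.0, "big": 400.0}
vms = {f"vm{i}": 25.0 for i in range(10)}

placement, utilization = place_vms(hosts, vms)
for host, fraction in utilization.items():
    print(f"{host}: {fraction:.0%} utilized")
```

Balancing on the fraction of capacity used, rather than on VM count, is what keeps a small box from being overloaded while a big box sits idle.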
Resource starvation has never been an issue for us. We run different resource pools, and we've never had any service hit 100%. I have the redundancies and reserve capacities needed to weather any storm. We use Turbonomic primarily to monitor and maintain equal resources on all servers.
We've never had a server hit 100%. I might have one hit 80% periodically before Turbonomic moved something around. We've been in the VMware game since 2.X, back when a monster server had four cores and 32 gigs of RAM. We've been over 80% virtualized for the last 12 years. We knew virtualization was the way it was going and went for it.
It reduced our operational expenditures because I have reclaimed some of the time typically spent generating reports. It's part of our system, and we just use it. Turbonomic is part of our network operations center. The dashboard is on my screen, so I can see if the indicators turn yellow or red. I can address the issue before it gets to the point where I'm getting calls from the service desk.
Great review. I've been a user for over two years and highly recommend the product. It is valuable, especially the planning module. The automation is great too.