What is our primary use case?
We work as SREs, so you can think of us as being in an admin or consultant role, because we are the ones who manage the complete cloud infrastructure. To introduce myself: I'm Tarun, and I work at Infosys, an India-based company. I have 10.5 years of experience in the IT industry and have worked with ITSM tools such as ServiceNow and BMC Remedy, as well as monitoring tools such as Grafana, Splunk, and Prometheus. At Infosys, I help manage the company's own recently developed cloud platform.
We are responsible for managing the complete infrastructure using the CloudStack platform. Our basic day-to-day activities include managing users, dealing with VM creation issues, and checking hypervisors when users encounter problems. We have various regions, including Reno, Mesa, and Maiden, where our servers run. If there is an issue with a particular hypervisor, we receive tickets from users, check the services on the hypervisor, and troubleshoot accordingly.
We also perform patching activities on weekends, including DB patching, which requires us to drain hypervisors and schedule downtime for the management servers. Alongside these activities, we monitor everything using Grafana, which provides dashboards and alerts for issues such as service disruptions or spikes in resource utilization.
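As an illustration of the drain step mentioned above, here is a minimal Python sketch of planning where the VMs on a hypervisor could go before patching. It is not CloudStack's own logic or API: the function `plan_drain`, the host and VM names, and the "largest VM first, onto the host with the most free memory" rule are all assumptions made for the example.

```python
from typing import Dict


def plan_drain(vms: Dict[str, int], free_mem: Dict[str, int]) -> Dict[str, str]:
    """Return a hypothetical VM -> target host plan for draining one hypervisor.

    vms:      VM name -> memory needed in GB (VMs on the host being drained)
    free_mem: target host name -> free memory in GB
    Raises RuntimeError if some VM does not fit on any target host.
    """
    remaining = dict(free_mem)  # copy so the caller's data is not modified
    plan: Dict[str, str] = {}

    # Place the largest VMs first; they are the hardest to fit.
    for vm, need in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the target host that currently has the most free memory.
        target = max(remaining, key=remaining.get, default=None)
        if target is None or remaining[target] < need:
            raise RuntimeError(f"no capacity left for {vm}")
        plan[vm] = target
        remaining[target] -= need

    return plan


if __name__ == "__main__":
    # Made-up example data, not taken from a real environment.
    print(plan_drain({"vm-a": 8, "vm-b": 4, "vm-c": 2}, {"hv-2": 10, "hv-3": 7}))
```

In practice a plan like this would only be a pre-check; the actual migration would still be carried out through CloudStack itself.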
What is most valuable?
The CloudStack features I have found most valuable are the scaling and load-balancing functions, which automatically assign VMs to an appropriate host based on current CPU or memory usage. Another great feature is its ease of use: it requires no prior cloud knowledge, and setting up the management server along with zones, pods, and clusters is straightforward. The VM migration feature is also exceptional. It lets us drain VMs to other hypervisors efficiently if a host goes down, which ensures business continuity.
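To make the placement idea concrete, the following is a small Python sketch of choosing a host by current CPU and memory usage. It is only an illustration under stated assumptions, not CloudStack's actual allocator: the `Host` class, the `pick_host` function, the usage fractions, and the "least loaded on its busiest resource" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    cpu_used: float  # fraction of CPU in use, 0.0 to 1.0
    mem_used: float  # fraction of memory in use, 0.0 to 1.0


def pick_host(hosts: List[Host], cpu_needed: float, mem_needed: float) -> Optional[Host]:
    """Pick the least-loaded host that can still fit the new VM, or None."""
    # Keep only hosts where the VM would fit on both CPU and memory.
    candidates = [
        h for h in hosts
        if h.cpu_used + cpu_needed <= 1.0 and h.mem_used + mem_needed <= 1.0
    ]
    if not candidates:
        return None
    # Score each host by its busiest resource and take the lowest score.
    return min(candidates, key=lambda h: max(h.cpu_used, h.mem_used))


if __name__ == "__main__":
    # Made-up example hosts.
    hosts = [Host("hv-a", 0.9, 0.5), Host("hv-b", 0.3, 0.4), Host("hv-c", 0.2, 0.8)]
    chosen = pick_host(hosts, cpu_needed=0.1, mem_needed=0.1)
    print(chosen.name if chosen else "no host available")
```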
CloudStack has greatly helped us integrate our existing infrastructure, because we manage a massive cloud platform with thousands of VMs and around 1,200 to 1,500 hypervisors. We are planning to expand to over 5,000 hosts. Given this extensive landscape, we appreciate how smoothly CloudStack handles operations without disrupting business. We maintain four management servers along with the hypervisors, distributed across locations such as Maiden, Reno, and Mesa, which lets us manage our vast VM landscape effectively.
What needs improvement?
I believe CloudStack could be improved, particularly in the management servers. I had the opportunity to collaborate with one of the Apache CloudStack developers, which led to us installing additional agents to retrieve better information from the hypervisors. We implemented configurations that ensure consistency across hosts and VMs. We also used Ansible and YAML files to automate password updates during patching activities and database management. These are examples of the customization we have done ourselves to fill the gaps.
For future releases, it would be beneficial if Apache integrated alerting directly into CloudStack. We've had instances where hypervisors became overloaded without any prior alert, so proactive alerts within the tool would help us address issues quickly and prevent disruption to business operations. A portal that summarizes system metrics alongside alerting capabilities would greatly improve our operational efficiency.
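The kind of proactive alert described above could look something like the following Python sketch. It is a hypothetical illustration, not an existing CloudStack feature: the function `find_overloaded`, the 85% threshold, and the rule of flagging a host only after three consecutive high samples are assumptions chosen for the example.

```python
from typing import Dict, List


def find_overloaded(
    samples: Dict[str, List[float]],
    threshold: float = 0.85,
    consecutive: int = 3,
) -> List[str]:
    """Return hosts whose last `consecutive` usage samples are all at or above `threshold`.

    samples: host name -> usage readings (fractions, oldest first)
    """
    alerts = []
    for host, series in samples.items():
        recent = series[-consecutive:]
        # Require a full window so a single spike does not trigger an alert.
        if len(recent) == consecutive and all(v >= threshold for v in recent):
            alerts.append(host)
    return sorted(alerts)


if __name__ == "__main__":
    # Made-up readings for three hypothetical hypervisors.
    readings = {
        "hv-1": [0.50, 0.90, 0.90, 0.95],  # sustained high usage
        "hv-2": [0.90, 0.90, 0.40],        # recovered on the last sample
        "hv-3": [0.99],                    # not enough samples yet
    }
    print(find_overloaded(readings))
```

Requiring several consecutive samples is one simple way to warn before a host is fully saturated while avoiding alerts on short spikes.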
For how long have I used the solution?
I have been working with CloudStack for around two and a half years now.
What do I think about the stability of the solution?
In terms of stability and reliability, we haven't encountered stability problems with CloudStack itself; the problems we have seen came from the networking side. We did face a high-priority networking issue once, but even then, CloudStack allowed us to migrate hypervisors and VMs to other zones smoothly, which demonstrates its overall reliability.
What do I think about the scalability of the solution?
CloudStack is highly scalable. Our previous scaling efforts went well, and we plan to scale further. It's straightforward to add hypervisors to the CloudStack environment, and we manage configurations using Puppet for seamless integration, which makes it easy to get services up and running promptly.
How are customer service and support?
I often communicate with CloudStack's technical support and customer service. We also have a Slack channel where users raise tickets or issues, and we connect with them directly to resolve problems. We log in, share screens, and troubleshoot from the CloudStack UI or via the CLI, then close the tickets and inform users of the resolution.
How was the initial setup?
I did not participate in the initial setup and deployment of CloudStack. By the time I arrived, the setup was already in place; however, I contributed to expanding the environment from 1,000 hypervisors to around 5,000, with plans to expand further.
What other advice do I have?
We use CloudStack's support for multiple hypervisors. As for the network management capabilities, I find them good, particularly the network offering that distributes IP addresses within clusters. When a VM is created, it is assigned an IP address, and network traffic is managed effectively. If there is excessive traffic in one region, it automatically migrates to another. We have monitoring configured, so we are informed of network issues beforehand, which gives us time to act and use the networking capabilities to manage traffic. On a scale of 1 to 10, I rate CloudStack a 9.
Which deployment model are you using for this solution?
On-premises
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclosure: My company has a business relationship with this vendor other than being a customer: partner.