What is our primary use case?
We used Chef as an Infrastructure as Code tool for configuration deployment, such as rolling out patches across many servers, first on development, then on QA, and then on production. That was our main use case in the initial stage of using Chef. We then started using Chef for configuration changes. We wrote the recipes on the workstation, pushed them from the workstation to the Chef server, and the Chef client on each node pulled them from the Chef server. That is how all the configurations were kept in line.
Before using Chef, we patched our servers manually or with shell scripts. Suppose we had 100 servers in our development area and needed to patch them all in a day or a week. We would run those shell scripts and wait for their output while logging in to each server to find out which patches had already been applied and whether the server had been rebooted. That was a very long-running task for us. With Chef, we set up all those nodes as Chef clients and registered them with the Chef server. We described the configuration change in a recipe and pushed the recipe from the workstation to the Chef server. The Chef clients then compared their local configuration with what the Chef server held. That is how they started to patch themselves and update their configuration files. This was the main use case for us when we started to use this automation tool.
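To make the compare-and-pull cycle concrete, here is a minimal sketch in plain Ruby. It is a toy model and not how Chef is implemented internally; the method names, setting names, and version numbers are made up for illustration.

```ruby
# Toy model of one pull-based run: the node compares its current state
# with the desired state held by the server and applies only the differences.

# Returns the desired settings that differ from the node's current state.
def pending_changes(desired, current)
  desired.reject { |key, value| current[key] == value }
end

# Applies the pending changes to the node and returns the keys that changed.
def converge!(desired, current)
  changes = pending_changes(desired, current)
  current.merge!(changes)
  changes.keys
end

# Hypothetical desired state (server side) and current state (node side).
desired = { "openssl" => "3.0.13", "sshd_port" => 22, "ntp" => "enabled" }
node    = { "openssl" => "3.0.9",  "sshd_port" => 22 }

first_run  = converge!(desired, node) # applies the two differing settings
second_run = converge!(desired, node) # nothing is left to change

puts "first run changed:  #{first_run.inspect}"
puts "second run changed: #{second_run.inspect}"
```

The second run changes nothing, which is the property that made repeated runs safe for us: a node that already matches the server is left alone.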
After that, suppose we had 10 existing servers and 10 new servers. We had an application to deploy, and the application team requested 10 servers with a given set of tools, users, configurations, and server hardening compliance. We simply wrote those recipes on the workstation and pushed them to the Chef server. Chef Client and Ohai were already installed on all 10 servers, and they simply compared their local configuration with the Chef server. As soon as there was a difference in the configuration, the Chef client pulled the new configuration and automatically applied it on those nodes. That is how easy it was for us to deliver the servers or the configurations faster with the help of Chef.
For SAP servers, we needed to make many changes in files such as sysctl.conf on the Linux server. Suppose we had a brand new server delivered to us from a cloud platform, and we needed to make all those changes so that it had all the required security-related settings. Instead of making those changes manually on every SAP server, we could simply write a Chef recipe once and it would be applied on all of them. That is how it benefited us.
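The idea of writing the sysctl.conf change once and applying it safely to every server can be sketched in plain Ruby as below. In Chef this would be expressed in a recipe instead; the sketch only models the idempotent edit. The helper name and the kernel values are placeholders, not SAP recommendations.

```ruby
# Returns sysctl.conf content with every desired key set to the desired value.
# Comments and unrelated settings are kept; missing keys are appended.
def apply_sysctl(content, desired)
  seen = []
  lines = content.lines.map(&:chomp).map do |line|
    key = line.split("=", 2).first.to_s.strip
    if line.include?("=") && !line.lstrip.start_with?("#") && desired.key?(key)
      seen << key
      "#{key} = #{desired[key]}"
    else
      line
    end
  end
  (desired.keys - seen).each { |key| lines << "#{key} = #{desired[key]}" }
  lines.join("\n") + "\n"
end

# Hypothetical settings and a hypothetical existing file.
desired_settings = { "vm.swappiness" => "10", "kernel.sem" => "1250 256000 100 1024" }
original = "# managed by ops\nvm.swappiness = 60\nnet.ipv4.ip_forward = 0\n"

once  = apply_sysctl(original, desired_settings)
twice = apply_sysctl(once, desired_settings)

puts once
puts "idempotent: #{once == twice}"
```

Applying the function a second time returns the same text, so the same definition can be run against a fresh server and an already configured one without any special handling.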
When I used Chef in my previous organization five years ago, it ran on on-prem servers.
What is most valuable?
The most valuable part is Infrastructure as Code. We can change any system configuration, whether it is a port number in a file or a package installation, and we can handle user and group management. When we need to deploy an application, security hardening is the main thing we use Chef for. For the SAP servers, we need to update many files, programs, and features on the server. With Chef, we just change those things in a recipe, and the recipe is pulled by all those systems, which update themselves within minutes. Security and compliance hardening are also features we find very useful.
The manual work has been reduced with the help of this automation. We do not need 15 or 20 engineers to do those tasks manually. We only need two or three people to write the recipes and upload them to the Chef server. Once the Chef client pulls the changes from the Chef server, it deploys them automatically. It has reduced our manpower and our costs.
For example, suppose we have 20 servers to patch in this cycle and we need to deliver them in a week. Earlier, we needed six or seven engineers to patch those 20 servers because it was a manual task. They had to log in to each server and check that the patches had been deployed correctly. They also needed to validate the changes before and after the reboot of the servers. That was a lengthy task, and they had to manage screenshots and compare them. With Chef, we simply wrote the recipe on the workstation and uploaded it to the Chef server, and all those servers pulled the patches from the Chef server. With the help of Chef Client and Ohai, they patched themselves, and we simply ran another recipe to reboot the servers. Now only two or three engineers are required to patch those 20 servers. The number of engineers has been reduced, and the time has also been reduced by 10 to 12 hours for each server.
What needs improvement?
There are other automation and configuration management tools on the market that offer better functionality than Chef. With Chef, we need to install an agent, the Chef client, on every node. That is another onerous task to perform on those nodes. Other tools do not require any agent; they simply push configurations to all the clients. Chef needs to improve on this agent installation.
On top of the agent configuration, we need to manage the workstation, the Chef server, and then the Chef client. Managing these three components is very difficult. It is time-consuming compared with other configuration management tools.
Chef needs to compete with other tools, such as Ansible or Terraform. They should work on the agent part. If they can remove the agent installation on the nodes and combine the Chef server and workstation into one server, that will provide a significant cost benefit for clients. They should aim for an agentless architecture rather than an agent-based one, which will help other customers.
That is a difficult question for me to answer because I have stopped using Chef. If you have very good developers who are skilled in Ruby and can write Chef recipes, then those developers should start using Chef.
For how long have I used the solution?
I have been working in my current field for more than 10 years.
What do I think about the stability of the solution?
Chef is stable. If we have a large number of servers, it is stable.
What do I think about the scalability of the solution?
Chef scales well. We just need to configure the new client nodes, and they are added and start pulling their configuration from the Chef server. It is highly scalable, and we have not faced any scalability issues when using Chef.
How are customer service and support?
Since we used the free version, the license cost was zero. We have not used the paid version, so we have not had the option to create a support ticket. However, the community support for Chef is good, and Chef code, which is written in Ruby, is easily available on Chef Supermarket, so we found that very useful.
Which solution did I use previously and why did I switch?
We had not used any other solution before using Chef. We handled all those tasks with shell scripts only. After using Chef, we found other tools, such as Ansible and Terraform, which we found better, and then we switched from Chef to Ansible.
There was another tool called Puppet, but Puppet and Chef are both pull-based configuration tools, and Puppet had a limit of 25 nodes for us. That is why we went with Chef.
How was the initial setup?
We simply downloaded Chef from their website, chef.io, and deployed it on the workstation to start using it.
What's my experience with pricing, setup cost, and licensing?
The licensing cost is zero for Chef if you are using the free version. They have developed other versions, such as SaaS-based and self-managed. The SaaS-based version costs $59 per node per year. I think that can be costly considering the advantages and disadvantages of Chef. Anyone can start with the free version, with no upfront license cost, but the setup itself has a cost. We need to set up a workstation and a Chef server, so there will be costs for those two servers regardless of whether they are deployed on-prem or in the cloud.
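As a rough worked example of what the SaaS price means at the fleet sizes mentioned in this review, here is a small Ruby calculation. It uses only the $59 per node per year figure quoted above; the node counts of 20 and 100 are taken from my earlier examples, and the cost of the workstation and Chef server is not included.

```ruby
# Annual license cost in US dollars for a given number of nodes.
def annual_license_cost(nodes, price_per_node_per_year)
  nodes * price_per_node_per_year
end

price = 59 # USD per node per year, the SaaS figure quoted in this review

[20, 100].each do |nodes|
  puts "#{nodes} nodes: $#{annual_license_cost(nodes, price)} per year"
end
```

At that price, 20 nodes come to $1,180 per year and 100 nodes to $5,900 per year, before any infrastructure costs.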
Which other solutions did I evaluate?
There are other automation and configuration management tools on the market that offer better functionality than Chef. With Chef, we need to install an agent, the Chef client, on every node. That is another onerous task to perform on those nodes. Other tools do not require any agent; they simply push configurations to all the clients. Chef needs to improve on this agent installation.
What other advice do I have?
I would rate Chef five out of ten. I chose that number because the other automation and configuration tools are far better. I would give them nine or ten, for example, Ansible. They are not agent-based tools; they simply push configurations, and there is no need to install a client on the nodes. Other tools are much better than Chef these days.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.