What is our primary use case?
I was previously an implementation engineer at Solo.io, which offers an enterprise version of Istio. Essentially, the product uses higher-level CRDs to manage the data plane and control plane of Istio. Solo also provides custom images of Istio, like FIPS-enabled versions.
Most of the use cases I worked on at Solo involved multi-cluster service mesh environments, which are traditionally difficult to configure. Solo's product made it easier to manage large multi-cluster environments. For some customers, we were dealing with a hundred thousand apps in the mesh, or even more.
A lot of the large-scale use cases included outlier detection and failover of backend services across clusters. Mutual TLS (mTLS) was also a very popular use case. Istio simplifies enabling mTLS connections for front-end to back-end services in a microservice environment.
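As a rough illustration of the outlier detection and failover pattern mentioned above, the sketch below builds an Istio DestinationRule as a plain Python dictionary and prints it as JSON, which kubectl accepts. The rule name, namespace, host, regions, and thresholds are hypothetical placeholders rather than values from any customer environment, and cross-cluster failover additionally assumes a working multi-cluster mesh.

```python
import json


def failover_destination_rule(name, namespace, host, from_region, to_region):
    """Build a DestinationRule with outlier detection and locality failover.

    All argument values are illustrative; tune the thresholds per service.
    """
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "host": host,
            "trafficPolicy": {
                # Eject a backend after 5 consecutive 5xx responses.
                # Locality failover only takes effect when outlier
                # detection is configured.
                "outlierDetection": {
                    "consecutive5xxErrors": 5,
                    "interval": "30s",
                    "baseEjectionTime": "60s",
                    "maxEjectionPercent": 100,
                },
                # When endpoints in from_region are unhealthy, send
                # traffic to to_region instead.
                "loadBalancer": {
                    "localityLbSetting": {
                        "enabled": True,
                        "failover": [{"from": from_region, "to": to_region}],
                    }
                },
            },
        },
    }


# Hypothetical service and regions, for illustration only.
rule = failover_destination_rule(
    name="backend-failover",
    namespace="apps",
    host="backend.apps.svc.cluster.local",
    from_region="us-east",
    to_region="us-west",
)
print(json.dumps(rule, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl. Setting maxEjectionPercent to 100 lets every local endpoint be ejected so traffic can fail over completely; a lower value keeps some local capacity in rotation.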
Typically, customers used the Istio Ingress Gateway as their primary API gateway. We also had many customers who wrote their own Envoy filters for the data plane. A very common implementation was integrating an external authorization (ext-auth) gRPC service that redirects front-end services to the customer's OIDC provider.
How has it helped my organization?
What needs improvement?
There's a new product being worked on, with contributions from the team at Solo.io, called Ambient Mesh.
This is being built into the open-source project. The purpose of Ambient Mesh is to make Istio more scalable, possibly getting rid of the sidecars on services to lower latency. Does it make Istio simpler or easier to use? I don't think so. It might actually increase complexity, but it could work well for specialized use cases. However, you'd need knowledgeable engineers to implement it properly.
So, the main area where I think Istio needs improvement is performance and scalability.
I recently worked with a customer who was running a performance test with Istio. Their test involved creating 500 to 1,000 namespaces with a large number of sample services that were all being rolled out together. What we found was that Pilot, one of Istio's components, was crashing, causing the performance test to fail.
Spinning up a thousand services all at once isn't a very common use case, but it was overloading the control plane. We also found that increasing the replica count on the control plane sometimes led to issues with leader election. Pilot hasn't handled leader election very well, but a recent bug fix by the Solo.io team has improved this. So, in terms of scalability, it's improving.
For how long have I used the solution?
I have been using this solution for three years now.
What do I think about the stability of the solution?
It is a stable product, but my use case is a bit unique because I was working with some of the largest Istio service mesh implementations. Because of this, I encountered more potential issues or bugs than the average consumer.
I was fortunate to work with a lot of the Istio upstream engineers, so I had direct access to get those issues remediated quickly. But I've encountered a fair number of bugs, particularly in larger environments.
So, I would rate the stability a nine out of ten. It's a production-level open-source tool and a CNCF graduated project. I wouldn't think twice about using Istio in production. In fact, I'm currently working on a project that will implement Istio.
What do I think about the scalability of the solution?
If you spin up a thousand services at once with Istio and again without it, you won't encounter the same problem without Istio.
So, it scales decently well, but there’s room for improvement. I would rate the scalability an eight out of ten.
In my current role, there are not many end users yet, because I now work with the military, implementing Istio use cases. Istio is very new to the Department of Defense, so I serve as a subject matter expert for these implementations.
For the project I'm working on, this will likely be the first time Istio is deployed within our organization, and we’ll be supporting a couple of hundred users.
How are customer service and support?
I used to work directly with those engineers, so my experience was a bit different. I worked with them to file issues in the open-source upstream repo.
Additionally, we would submit tickets internally at Solo.io, where Istio contributors were working. Those contributors would then work on the tickets and integrate them back into Istio upstream. So, I've worked both ways.
Those guys are world-class engineers. They are some of the most impressive people I've worked with.
I haven't encountered more knowledgeable support elsewhere.
How would you rate customer service and support?
How was the initial setup?
The initial setup is pretty straightforward, though it depends on your use case and what you're trying to achieve. Istio is highly configurable.
It has a very easy quick start where you can provide a set of default values that get you up and running quickly. However, if you’re working with larger use cases, you’ll need a more extensive Helm configuration for the control plane. So, it varies.
For a demo with a sample use case or a smaller environment, it’s very easy. If you're just looking to enable mTLS between services, that's also very simple.
Not just anyone can use it; some prior experience is needed. Solo.io built its product around Istio with the intent of making it easier to use, but there's still a steep learning curve.
It's not like other products where you can just buy a license, deploy it in your environment, and be good to go. Istio sits at a foundational level in the deployment stack, meaning much of your infrastructure will be built on top of it.
When I did implementations, these projects would typically take at least a year and consist of three phases: design, a pretty intensive implementation phase where services are onboarded to the mesh, and finally, optimizing the mesh and integrating more features.
What other advice do I have?
If someone is looking for a service mesh, I would recommend Istio over any other options.
I would recommend not implementing all the features at once. Istio can quickly become difficult if you try to use every feature right from the start. You're likely to get frustrated and might even consider giving up, thinking that Istio isn't necessary.
I suggest starting with the simplest and most basic features that Istio offers, like mTLS, to help implement your zero-trust service environment. From there, you can gradually build out authorization, Envoy filters, and other extensibility features as needed.
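To make the "start with mTLS" advice concrete, here is a minimal sketch that generates an Istio PeerAuthentication policy as a Python dictionary. The helper name and the namespace are hypothetical. The phased approach shown (PERMISSIVE first, then STRICT) is one common way to roll out mTLS, not the only one.

```python
import json

# mTLS modes defined by the PeerAuthentication API.
VALID_MODES = {"UNSET", "DISABLE", "PERMISSIVE", "STRICT"}


def peer_authentication(namespace, mode="STRICT", name="default"):
    """Build a namespace-wide PeerAuthentication policy.

    Applying the policy in Istio's root namespace (istio-system by
    default) makes it mesh-wide instead.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mTLS mode: {mode}")
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"mtls": {"mode": mode}},
    }


# Phase 1: accept both plaintext and mTLS while services are onboarded.
onboarding = peer_authentication("apps", mode="PERMISSIVE")

# Phase 2: once every workload has a sidecar, require mTLS.
enforced = peer_authentication("apps", mode="STRICT")

print(json.dumps(enforced, indent=2))
```

Starting in PERMISSIVE mode avoids breaking workloads that have not joined the mesh yet. Authorization policies and Envoy filters can then be layered on once STRICT mode is stable.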
Overall, I would rate it an eight out of ten. There’s a lot of new technology coming out that challenges Istio's usability. For instance, eBPF-based CNIs like Cilium are developing their own mesh solutions, which may or may not use sidecars and work very quickly.
Istio is facing some competition. Also, due to its complexity, Istio hasn’t taken off as much as it should have. It’s not meant for every use case, but in many, it works very well. So, I’d rate it an eight.
*Disclosure: My company does not have a business relationship with this vendor other than being a customer.