What is our primary use case?
We use this solution for ETL, which includes data summation and cleaning.
This solution used in an on-premise deployment, for now.
What is most valuable?
I have used most of the standard SQL features, but the ones that stand out are the Data Flows and Bulk Import.
What needs improvement?
The synchronous processing needs to be improved. For batch processing, it works fine, but when you start to do real-time processing, I find that this solution is not strong, depending on how you use it. If you use it for short batches, micro batching, that might work, but it is not as good for queuing real-time processing. This solution needs full support for real-time processing.
The solution needs better support for XML and JSON.
For how long have I used the solution?
I've been using this solution for the past ten years.
What do I think about the stability of the solution?
I use this product extensively, on a daily basis. It is stable.
Over time, I think we'll most likely start to decrease usage. This will happen as we move to more real-time processing, and we will most likely start to do more queue processing. In terms of batch processing, it will scale down considerably.
What do I think about the scalability of the solution?
In the catalog, it's supposed to be scalable. I think that it has support for an SQL cluster.
In my opinion, I think it's a bit more limited in terms of scalability, although it scales with the database. I would say that the scalability is intermediate in terms of being able to launch multiple instances, or it could do load balancing as well. I think that would be a bit more challenging.
How are customer service and technical support?
We haven't needed to contact Microsoft technical support. When we have trouble we usually use Google to search for what we need to find out. Also, in terms of issues, there's a lot of information on SQLServerCentral and Stack Overflow.
Which solution did I use previously and why did I switch?
For the most part, we started with this solution.
How was the initial setup?
The initial setup of this solution is pretty much straightforward.
If you want to develop with Visual Studio then you have to install the data service add-ons afterward, so it is a bit cumbersome. Then, if you want to use the catalog on the database, with deployment there are often security issues and you have to get an SQL catalog up and running. This can also be a bit cumbersome.
I would say that it takes a day or two to deploy this solution in a new environment, and it can be completed by one or two people. A single developer may be sufficient to deploy and maintain the system.
What about the implementation team?
I implemented some of this myself and had help in terms of setting up the security. There are often security settings that require the assistance of a DBA.
Which other solutions did I evaluate?
This solution was already in place. That's what is available and it's what people know. Going forward, this will most likely change.
What other advice do I have?
My advice to anybody implementing this solution is to look into whether to use it on a catalog in a database, versus using package deployment. There are pros and cons to both approaches in terms of deployment and security. I would say that's something that needs to be evaluated quite early. There are lots of benefits to the catalog, but also a bit more admin attached to it.
Another consideration is real-time processing needs. If this is a requirement then I would recommend against using this solution, unless the next version has a new set of features specifically for real-time processing.
I would rate this solution an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Curious people's experiences when they mention "fail a lot" or scalability. I've used many ETL tools in my career - most of them very expensive and I'd put SSIS up against any of them for reliability and performance (within limits).
Scalability is largely comparing to expectations and it depends on your expectations. I think people too often compare completely different architectures and are surprised when they scale differently. SSIS is single server. No MPP going on here folks. You get a lot more than you've paid for (which is really nothing is you already own SqlServer). SSIS sure beats the open source stuff I've seen out there that really sucks. Try Pentaho written in Java if you want slow. I've read where people have custom coded front ends to fire multiple SSIS servers and there's ways of partitioning data flows but if you are getting into that you might be on the wrong tool. Consider the opposite - most people are running SSIS on the target database box so its competing with the database server as well as not utilizing more than one server. I'm doing that and actually getting quite great performance (again - its all about expectations).
So yes if you need millions per second SSIS is not the tool you want. My benchmark with SSIS is @10,000 rows per second to stage large rows through a data flow. I'm guessing if you need a lot faster than that you have significant volumes and big pockets so why would you look at a free tool that's designed to be installed on a database server?
As for failing, it would only fail due to buffers if you did something with altering buffers that you should not have done. That would be your bad sorry. Or you're doing something else silly like running on 4gb VM and didn't set a max memory on the Sql Server so basically everything crawls or fails. Hey - some of you are laughing but some are probably scratching their heads and asking, so what's wrong with that?