What is our primary use case?
We use SSIS to copy production data to Reporting Repositories. We use it for data movement from and to SQL or flat files such as CSV or fixed data. This is primarily to build and update a data lake and data warehouse.
More custom use cases are to do DQ validation comparing source to destination, and to extract data from bat files and SSIS files in order to build a metadata repository for documenting and mapping all the ETL packages. We use this to find the packages responsible for any particular table or file, or vice versa. This system, along with statistics logging, also gives us summary reporting capability.
How has it helped my organization?
I like that SSIS lets us build programs top-down. Starting with the control section, then drilling into the details of the steps make more sense than what Informatica requires, where you have to build each block first, then use a different tool to build the control structure.
In production, SSIS does a very good job of distributing work to many threads. It is robust and does a good job of handling overload conditions. It is simple for basic jobs, yet able to do very complex tasks. When all else fails, code can be written to make your own custom task or even tool.
I believe SSIS is not given the credit it deserves. SSIS does what most other new vendors tout as great features. But it is a very mature and stable product, unlike those others. It is supported by a larger company than many of those other products.
The error messages are pretty clear and point you to the problem well.
I am disappointed that Azure Data Factory did not stay closer to SSIS.
What is most valuable?
Data Flows are the main component we use. These can range from a simple source to sink ETL, to many source to many sink dataflows.
For flexibility and to make reading the logs in case of errors, I prefer not to put multiple copy operations in the same dataflow, although that is possible. I try to keep in mind how to make production operations as easy as possible in case of failures. I want to give the night crew the best opportunity for success when an error does occur.
We have about one error per 400 processes. About 3/4 of these were transitory issues with issues like slow connections and temporary outages. By making the code rerunnable, we can auto-restart on any error, and that resolves 3/4 without human intervention. This leaves any failure after two retries the only issue that the night crew has to do Root Cause Analysis (RCA) on.
What needs improvement?
We purchase an add-on called task factory primarily to allow bulk delete, update, and upsert capability. I'd like to see this be part of the standard package.
I believe there are ways to build a model and set variables so that it can be a generic process.
In my next system, I would like to have a generic process that would handle all the logging and processing in a model that can be modified and enhanced as the need for a better process, or different statistics to be logged is discovered. I'd want this in a way that the model can be changed and all the processes, with their unique parameters, could all be changed with the model upgraded.
I believe they should add some features that help to create the code using a model. This would allow for continuous improvement of the model uses and easy replication of all the different programs that use the model.
For how long have I used the solution?
What do I think about the stability of the solution?
Very mature and stable, and already has great performance features.
What do I think about the scalability of the solution?
It can be very scalable both by adding memory or cores, and by running the SSIS on a farm of servers.
How are customer service and support?
I did not really need much. SSIS has a lot of excellent help available for a web search.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
Prior solutions used SQL triggers
We have more control over the ETL with SSIS.
How was the initial setup?
Installation is quite easy. There are some drivers and setting you may need to make, based on the sources or sinks you need. I am hoping these are more embedded in the install process, as I have not set up a server since 2015.
What about the implementation team?
What was our ROI?
What's my experience with pricing, setup cost, and licensing?
SSIS is part of SQL server, in itself a very solid product.
You can run SSIS on the same SQL server instance where your data warehouse is, using the compute power at night to update, and during the day for reporting and data analysis.
Which other solutions did I evaluate?
I did not evaluate other options. SSIS was available, and it worked well.
I later changed jobs to another company that used SSIS, and worked to improve and grow their data warehouse solution.
After using SSIS, we have tried talend, informatica, and spark solutions. I would go back to SSIS, even for a non Microsoft shop, if I were allowed to.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.