IBM InfoSphere DataStage vs StreamSets comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

IBM InfoSphere DataStage
Ranking in Data Integration
7th
Average Rating
7.8
Number of Reviews
37
Ranking in other categories
No ranking in other categories
StreamSets
Ranking in Data Integration
8th
Average Rating
8.4
Number of Reviews
24
Ranking in other categories
No ranking in other categories
 

Mindshare comparison

As of June 2024, in the Data Integration category, the mindshare of IBM InfoSphere DataStage is 6.6%, up from 6.2% compared to the previous year. The mindshare of StreamSets is 1.6%, up from 1.2% compared to the previous year. It is calculated based on PeerSpot user engagement data.
Data Integration
Unique Categories:
No other categories found
No other categories found
 

Featured Reviews

Murali B - PeerSpot reviewer
Mar 28, 2024
Facilitated our peak data integration projects, offers good GUI and availability of connectors is strong
DataStage facilitated our peak data integration projects. For example, big data integrations have happened, particularly when we worked with BigQuery files... that integration server. DataStage parallel processing capabilities have improved data tasks. When I worked with DataStage, it could handle around two terabytes of data. We have other appliances as well, but we're processing data concurrently. It was good. My team supported it well, and everything worked fine. The GUI was good. Compared to Cloud Pak for Data, we have some enhanced connectors in the standard InfoSphere DataStage version. That standard version is really good; it's easy to use. When we want to find out the absolute quality of data, the governance features really helped. For example, when we tried to identify discrepancies between systems, it worked well.
Nantabo Jackie - PeerSpot reviewer
Mar 24, 2023
Simplified pipelines and helped us break down data silos within our organization
The design experience when implementing batch streaming or ECL pipelines is very easy and straightforward. When we initially attempted to integrate StreamSets with Kafka, it was somewhat challenging until we consulted the documentation, after which it became straightforward. We use StreamSets to move data into modern analytics platforms. Moving the data into modern analytics platforms is still complex. It requires a lot of understanding of logic. StreamSets enables us to build data pipelines without knowing how to code. StreamSets' ability to build data pipelines without requiring us to know complex programming is very important, as it allows us to focus on our projects without spending time writing code. StreamSets' Transformer for Snowflake is simple to use for designing both simple and complex transformation logic. StreamSets' Transformer for Snowflake is extremely important to me as it helps me to connect external data sources and keep my internal workflow organized. Transformer for Snowflake's functionality is a perfect ten out of ten. It is important and cost-effective that Transformer for Snowflake is a serverless engine embedded within the platform, as without this feature, it would be very expensive. This feature helps us to sell at lower budget costs, which would otherwise be at a high cost with other servers. StreamSets has helped improve our organization. StreamSets simplified pipelines for our organization. It is easier to complete a project when we know where and how to start, and working with the team remotely makes it more efficient. This helps us to save time and be more organized when creating data pipelines. Being a structured company that produces reliable resources for our application benefits both our clients and contacts. StreamSets' built-in data drift resilience plays a part in our ETL operations. With prior knowledge, the built-in data drift resilience is very effective, but it can be challenging to implement without the preexisting knowledge. The built-in data drift resilience reduced the time it takes us to fix data drift breakages by 45 percent. StreamSets helped us break down data silos within our organization. The use of StreamSets to break down data silos enabled us to be confident in the services and products we provide, as well as the real-time streaming we offer. This has had a positive impact on our business, as it allowed us to accurately determine the analytics we need to present to stakeholders, clients, and our sources while ensuring that the process is secure and transparent. StreamSets saved us time because anyone can use StreamSets not just developers. We can save around 40 percent of our time. StreamSets' reusable assets helped us reduce workload by around 25 percent. StreamSets saved us money by not having to hire developers with specialized skills. We saved around $2,000 US. StreamSets helped us scale our data operations. Since StreamSets makes it easy to scale our data operations, it enabled us to know exactly where to start at any time. We are aware of the timeline for completing the project, and depending on our familiarity with the software, we can come up with a solution quickly.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The most valuable feature for our data processing needs is IBM InfoSphere DataStage's capability to handle ETL tasks with large record volumes."
"The most valuable feature is the product's versatility to inject data."
"The most valuable feature is the data integration for data warehousing."
"We are mostly using transmission rules. It has a lot of functions and logic related to transmission. It is a user-friendly tool with in-built functions."
"As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."
"The product is easy to deploy."
"It works with multiple servers and offers high availability."
"IBM is stable and accurate to monitor. It's easy to understand to monitor the data lineage from source to target."
"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
"It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one day data ops solution."
"The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too"
"The most valuable features are the option of integration with a variety of protocols, languages, and origins."
"The scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy."
 

Cons

"There could be more customization options for the product."
"Currently lacking virtualization ability."
"Working with some of the big data components is good, but I can see improvements are needed."
"What needs improvement in IBM InfoSphere DataStage is its pricing. The pricing for the solution is higher than its competitors, so a lot of the clients my company has worked with prefer other tools over IBM InfoSphere DataStage because of the high price tag. Another area for improvement in the solution stems from a lot of new types of databases, for example, databases in the cloud and big data have become available, and IBM InfoSphere DataStage is working on various connectors for different data sources, but that still isn't up-to-date, meaning that some connectors are missing for modern data sources. The latest version of IBM InfoSphere DataStage also has a complex architecture, so my team faced frequent outages and that should be improved as well."
"It takes a lot of time to actually trigger your job and then go into the logs and other stuff. So all of this is really time-consuming."
"The interface needs improvement."
"The response time from support is slow and needs to be improved."
"DataStage is quite expensive. It is too hard to find a consultant using DataStage in Turkey."
"The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best."
"Visualization and monitoring need to be improved and refined."
"We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered."
"StreamSet works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds."
"I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions."
"The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
"They need to improve their customer care services. Sometimes it has taken more than 48 hours to resolve an issue. That should be reduced. They are aware of small or generic issues, but not the more technical or deep issues. For those, they require some time, generally 48 to 72 hours to respond. That should be improved."
"StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
 

Pricing and Cost Advice

"It's quite expensive."
"It's very expensive."
"The pricing depends on the setup. However, we paid $100,000 as a one-time cost for an on-premises setup."
"High-cost of ownership: They could take a page from open source software."
"The product is expensive."
"Small and medium-sized companies cannot afford to pay for this solution."
"It is quite expensive."
"Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."
"StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
"The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data."
"There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced."
"StreamSets is expensive, especially for small businesses."
"It has a CPU core-based licensing, which works for us and is quite good."
"We are running the community version right now, which can be used free of charge."
"The licensing is expensive, and there are other costs involved too. I know from using the software that you have to buy new features whenever there are new updates, which I don't really like. But initially, it was very good."
"It's not so favorable for small companies."
report
Use our free recommendation engine to learn which Data Integration solutions are best for your needs.
787,817 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm
26%
Manufacturing Company
11%
Computer Software Company
10%
Insurance Company
7%
Financial Services Firm
17%
Computer Software Company
13%
Manufacturing Company
8%
Government
7%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

Would you upgrade to more premium versions of IBM InfoSphere DataStage?
My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For ...
Is IBM InfoSphere DataStage more difficult to use compared to other tools in the field?
I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work ...
Do you rely on IBM Cloud Paks for your data? Have you utilized this product, or do you use IBM InfoSphere DataStage without it?
IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands...
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which ...
What is your primary use case for StreamSets?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data...
 

Learn More

Video not available
 

Overview

 

Sample Customers

Dubai Statistics Center, Etisalat Egypt
Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about IBM InfoSphere DataStage vs. StreamSets and other solutions. Updated: June 2024.
787,817 professionals have used our research since 2012.