Try our new research platform with insights from 80,000+ expert users

IBM Cloud Pak for Data vs StreamSets comparison

 

Comparison Buyer's Guide

Executive SummaryUpdated on Dec 19, 2024

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

IBM Cloud Pak for Data
Ranking in Data Integration
25th
Average Rating
7.8
Reviews Sentiment
6.5
Number of Reviews
13
Ranking in other categories
Data Virtualization (3rd)
StreamSets
Ranking in Data Integration
16th
Average Rating
8.4
Reviews Sentiment
7.0
Number of Reviews
21
Ranking in other categories
No ranking in other categories
 

Mindshare comparison

As of May 2025, in the Data Integration category, the mindshare of IBM Cloud Pak for Data is 1.8%, up from 1.7% compared to the previous year. The mindshare of StreamSets is 1.6%, up from 1.4% compared to the previous year. It is calculated based on PeerSpot user engagement data.
Data Integration
 

Featured Reviews

Michelle Leslie - PeerSpot reviewer
Starts strong with data management capabilities but needs a demo database
What I would love to see is an end-to-end, almost a training demo database of some sort, where one of the biggest problems with data management is demonstrated. There are so many components to data management, and more often than not, people understand one thing really well. They may understand DataStage and how to move data around, but they do not see the impact of moving data incorrectly. They also do not see the impact of everyone understanding a piece of data in the same way. I would love Cloud Pak to come with a demo database that illustrates the different components of data management in a logical way, so I can see the whole picture instead of just the area I'm specializing in. It would be great if Cloud Pak, from a data modeling point of view, allowed us to import our PDMs, for example. It would be ideal to import and create business terms in Cloud Pak. The PEA would be great to create the technical data. The association between the business and the technical metadata could then be automated by pulling it through from your ACE models. The data modeling component is available in Cloud Pak. Additionally, when it comes to Cloud Pak, even though it has the NextGen DataStage built into it, there is Cloud Pak for data integration as well. Currently, I do not think we have a full enough understanding of how CP4D and CP4I can enhance each other.
Nantabo Jackie - PeerSpot reviewer
Simplified pipelines and helped us break down data silos within our organization
The design experience when implementing batch streaming or ECL pipelines is very easy and straightforward. When we initially attempted to integrate StreamSets with Kafka, it was somewhat challenging until we consulted the documentation, after which it became straightforward. We use StreamSets to move data into modern analytics platforms. Moving the data into modern analytics platforms is still complex. It requires a lot of understanding of logic. StreamSets enables us to build data pipelines without knowing how to code. StreamSets' ability to build data pipelines without requiring us to know complex programming is very important, as it allows us to focus on our projects without spending time writing code. StreamSets' Transformer for Snowflake is simple to use for designing both simple and complex transformation logic. StreamSets' Transformer for Snowflake is extremely important to me as it helps me to connect external data sources and keep my internal workflow organized. Transformer for Snowflake's functionality is a perfect ten out of ten. It is important and cost-effective that Transformer for Snowflake is a serverless engine embedded within the platform, as without this feature, it would be very expensive. This feature helps us to sell at lower budget costs, which would otherwise be at a high cost with other servers. StreamSets has helped improve our organization. StreamSets simplified pipelines for our organization. It is easier to complete a project when we know where and how to start, and working with the team remotely makes it more efficient. This helps us to save time and be more organized when creating data pipelines. Being a structured company that produces reliable resources for our application benefits both our clients and contacts. StreamSets' built-in data drift resilience plays a part in our ETL operations. With prior knowledge, the built-in data drift resilience is very effective, but it can be challenging to implement without the preexisting knowledge. The built-in data drift resilience reduced the time it takes us to fix data drift breakages by 45 percent. StreamSets helped us break down data silos within our organization. The use of StreamSets to break down data silos enabled us to be confident in the services and products we provide, as well as the real-time streaming we offer. This has had a positive impact on our business, as it allowed us to accurately determine the analytics we need to present to stakeholders, clients, and our sources while ensuring that the process is secure and transparent. StreamSets saved us time because anyone can use StreamSets not just developers. We can save around 40 percent of our time. StreamSets' reusable assets helped us reduce workload by around 25 percent. StreamSets saved us money by not having to hire developers with specialized skills. We saved around $2,000 US. StreamSets helped us scale our data operations. Since StreamSets makes it easy to scale our data operations, it enabled us to know exactly where to start at any time. We are aware of the timeline for completing the project, and depending on our familiarity with the software, we can come up with a solution quickly.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"One of Cloud Pak's best features is the Watson Knowledge Catalog, which helps you implement data governance."
"Cloud Pak is a very, very, very good system."
"I love the way that I can start at a very basic level with my data management journey by capturing my policies, justifying my data, and putting them into different categories to say this is data relating to individuals, for example, or data relating to geography."
"Scalability-wise, I rate the solution a nine or ten out of ten."
"Its data preparation capabilities are highly valuable."
"The most valuable features are data virtualization and reporting."
"The most valuable features of IBM Cloud Pak for Data are the Watson Studio, where we can initiate more groups and write code. Additionally, Watson Machine Learning is available with many other services, such as APIs which you can plug the machine learning models."
"The most valuable feature of IBM Cloud Pak for Data is the Modeler flows. The ability to develop models using a graphical approach and the capability to connect to various sources, as well as the data virtualization capabilities, allow me to easily access and utilize data that is dispersed across different sources."
"The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too"
"One of the things I like is the data pipelines. They have a very good design. Implementing pipelines is very straightforward. It doesn't require any technical skill."
"Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now."
"The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
"What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes."
"The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."
"The most valuable features are the option of integration with a variety of protocols, languages, and origins."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
 

Cons

"The interface could improve because sometimes it becomes slow. Sometimes there is a delay between clicks when using the software, which can make the development process slow. It can take a few seconds to complete one action, and then a few more seconds to do the next one."
"The technical support could be a little better."
"What I would love to see is an end-to-end, almost a training demo database of some sort, where one of the biggest problems with data management is demonstrated."
"The solution's user experience is an area that has room for improvement."
"One thing that bugs me is how much infrastructure Cloud Pak requires for the initial deployment. It doesn't allow you to start small. The smallest permitted deployment is too big. It's a huge problem that prevents us from implementing the solution in many scenarios."
"One challenge I'm facing with IBM Cloud Pak for Data is native features have been decommissioned, such as XML input and output. Too many changes have been made, and my company has around one hundred thousand mappings, so my team has been putting more effort into alternative ways to do things. Another area for improvement in IBM Cloud Pak for Data is that it's more complicated to shift from on-premise to the cloud. Other vendors provide secure agents that easily connect with your existing setup. Still, with IBM Cloud Pak for Data, you have to perform connection migration steps, upgrade to the latest version, etc., which makes it more complicated, especially as my company has XML-based mappings. Still, the XML input and output capabilities of IBM Cloud Pak for Data have been discontinued, so I'd like IBM to bring that back."
"Cloud Pak would be improved with integration with cloud service providers like Cloudera."
"There is a solution that is part of IBM Cloud Pak for Data called Watson OpenScale. It is used to monitor the deployed models for the quality and fairness of the results. This is one area that needs a lot of improvement."
"We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that."
"Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
"StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
 

Pricing and Cost Advice

"I think that this product is too expensive for smaller companies."
"Cloud Pak's cost is a little high."
"It's quite expensive."
"For the licensing of the solution, there is a yearly payment that needs to be made. Also, since it is expensive, cost-wise, I rate the solution an eight or nine out of ten."
"IBM Cloud Pak for Data is expensive. If we include the training time and the machine learning, it's expensive. The cost of the execution is more reasonable."
"I don't have the exact licensing cost for IBM Cloud Pak for Data, as my company is still finalizing requirements, including monthly, yearly, and three-year licensing fees. Still, on a scale of one to five, I'd rate it a three because, compared to other vendors, it's more complicated."
"The solution's pricing is competitive with that of other vendors."
"The solution is expensive."
"We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
"StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
"Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Seed B or Seed C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled."
"The overall cost is very flexible so it is not a burden for our organization... However, the cost should be improved. For small and mid-size organizations it might be a challenge."
"The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data."
"The licensing is expensive, and there are other costs involved too. I know from using the software that you have to buy new features whenever there are new updates, which I don't really like. But initially, it was very good."
"We are running the community version right now, which can be used free of charge."
"I believe the pricing is not equitable."
report
Use our free recommendation engine to learn which Data Integration solutions are best for your needs.
852,764 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm
29%
Computer Software Company
10%
Manufacturing Company
10%
Government
6%
Financial Services Firm
13%
Computer Software Company
12%
Manufacturing Company
10%
Insurance Company
9%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about IBM Cloud Pak for Data?
DataStage allows me to connect to different data sources.
What is your experience regarding pricing and costs for IBM Cloud Pak for Data?
The setup cost is very expensive. The cost depends on the pieces of the solution I'm using, how much data I have, and whether it's on the cloud or on-prem.
What needs improvement with IBM Cloud Pak for Data?
What I would love to see is an end-to-end, almost a training demo database of some sort, where one of the biggest problems with data management is demonstrated. There are so many components to data...
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
One issue I observed with StreamSets is that the memory runs out quickly when processing large volumes of data. Because of this memory issue, we have to upgrade our EC2 boxes in the Amazon AWS infr...
What is your primary use case for StreamSets?
We are using StreamSets for batch loading.
 

Also Known As

Cloud Pak for Data
No data available
 

Overview

 

Sample Customers

Qatar Development Bank, GuideWell, Skanderborg Music Festival
Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about IBM Cloud Pak for Data vs. StreamSets and other solutions. Updated: April 2025.
852,764 professionals have used our research since 2012.