Try our new research platform with insights from 80,000+ expert users

IBM Cloud Pak for Data vs StreamSets comparison

 

Comparison Buyer's Guide

Executive SummaryUpdated on Dec 19, 2024

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

IBM Cloud Pak for Data
Ranking in Data Integration
24th
Average Rating
7.8
Reviews Sentiment
6.5
Number of Reviews
13
Ranking in other categories
Data Virtualization (3rd)
StreamSets
Ranking in Data Integration
23rd
Average Rating
8.4
Reviews Sentiment
7.0
Number of Reviews
21
Ranking in other categories
No ranking in other categories
 

Mindshare comparison

As of July 2025, in the Data Integration category, the mindshare of IBM Cloud Pak for Data is 2.0%, up from 1.7% compared to the previous year. The mindshare of StreamSets is 1.6%, up from 1.4% compared to the previous year. It is calculated based on PeerSpot user engagement data.
Data Integration
 

Featured Reviews

Michelle Leslie - PeerSpot reviewer
Starts strong with data management capabilities but needs a demo database
What I would love to see is an end-to-end, almost a training demo database of some sort, where one of the biggest problems with data management is demonstrated. There are so many components to data management, and more often than not, people understand one thing really well. They may understand DataStage and how to move data around, but they do not see the impact of moving data incorrectly. They also do not see the impact of everyone understanding a piece of data in the same way. I would love Cloud Pak to come with a demo database that illustrates the different components of data management in a logical way, so I can see the whole picture instead of just the area I'm specializing in. It would be great if Cloud Pak, from a data modeling point of view, allowed us to import our PDMs, for example. It would be ideal to import and create business terms in Cloud Pak. The PEA would be great to create the technical data. The association between the business and the technical metadata could then be automated by pulling it through from your ACE models. The data modeling component is available in Cloud Pak. Additionally, when it comes to Cloud Pak, even though it has the NextGen DataStage built into it, there is Cloud Pak for data integration as well. Currently, I do not think we have a full enough understanding of how CP4D and CP4I can enhance each other.
Ved Prakash Yadav - PeerSpot reviewer
Useful for data transformation and helps with column encryption
We use various tools and alerting systems to notify us of pipeline errors or failures. StreamSets supports data governance and compliance by allowing us to encrypt incoming data based on specified rules. We can easily encrypt columns by providing the column name and hash key. If you're considering using StreamSets for the first time, I would advise first understanding why you want to use it and how it will benefit you. If you're dealing with change tracking or handling large amounts of data, it could be cost-effective compared to services like Amazon. It's easy to schedule and manage tasks with the tool, and you can enhance your skills as an ETL developer. You can easily migrate traditional pipelines built on platforms like Informatica or Talend to StreamSets. I rate the overall solution an eight out of ten.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"Its data preparation capabilities are highly valuable."
"DataStage allows me to connect to different data sources."
"The most valuable features are data virtualization and reporting."
"One of Cloud Pak's best features is the Watson Knowledge Catalog, which helps you implement data governance."
"Cloud Pak's most valuable features are IBM MQ, IBM App Connect, IBM API Connect, and ISPF."
"Cloud Pak is a very, very, very good system."
"The most valuable features of IBM Cloud Pak for Data are the Watson Studio, where we can initiate more groups and write code. Additionally, Watson Machine Learning is available with many other services, such as APIs which you can plug the machine learning models."
"You can model the data there, connect the data models with the business processes and create data lineage processes."
"The best feature that I really like is the integration."
"StreamSets is the leader in the market."
"The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too"
"The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"One of the things I like is the data pipelines. They have a very good design. Implementing pipelines is very straightforward. It doesn't require any technical skill."
"It is really easy to set up and the interface is easy to use."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
 

Cons

"One challenge I'm facing with IBM Cloud Pak for Data is native features have been decommissioned, such as XML input and output. Too many changes have been made, and my company has around one hundred thousand mappings, so my team has been putting more effort into alternative ways to do things. Another area for improvement in IBM Cloud Pak for Data is that it's more complicated to shift from on-premise to the cloud. Other vendors provide secure agents that easily connect with your existing setup. Still, with IBM Cloud Pak for Data, you have to perform connection migration steps, upgrade to the latest version, etc., which makes it more complicated, especially as my company has XML-based mappings. Still, the XML input and output capabilities of IBM Cloud Pak for Data have been discontinued, so I'd like IBM to bring that back."
"One thing that bugs me is how much infrastructure Cloud Pak requires for the initial deployment. It doesn't allow you to start small. The smallest permitted deployment is too big. It's a huge problem that prevents us from implementing the solution in many scenarios."
"The product is trying to be more maturity in terms of connectors. That, I believe, is an area where Cloud Pak can improve."
"The interface could improve because sometimes it becomes slow. Sometimes there is a delay between clicks when using the software, which can make the development process slow. It can take a few seconds to complete one action, and then a few more seconds to do the next one."
"Cloud Pak would be improved with integration with cloud service providers like Cloudera."
"The setup cost is very expensive. The cost depends on the pieces of the solution I'm using, how much data I have, and whether it's on the cloud or on-prem."
"The solution could have more connectors."
"The solution's user experience is an area that has room for improvement."
"The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
"I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful."
"The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base."
"One issue I observed with StreamSets is that the memory runs out quickly when processing large volumes of data. Because of this memory issue, we have to upgrade our EC2 boxes in the Amazon AWS infrastructure."
"In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
"StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
 

Pricing and Cost Advice

"For the licensing of the solution, there is a yearly payment that needs to be made. Also, since it is expensive, cost-wise, I rate the solution an eight or nine out of ten."
"The solution's pricing is competitive with that of other vendors."
"I don't have the exact licensing cost for IBM Cloud Pak for Data, as my company is still finalizing requirements, including monthly, yearly, and three-year licensing fees. Still, on a scale of one to five, I'd rate it a three because, compared to other vendors, it's more complicated."
"The solution is expensive."
"Cloud Pak's cost is a little high."
"It's quite expensive."
"IBM Cloud Pak for Data is expensive. If we include the training time and the machine learning, it's expensive. The cost of the execution is more reasonable."
"I think that this product is too expensive for smaller companies."
"There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced."
"The licensing is expensive, and there are other costs involved too. I know from using the software that you have to buy new features whenever there are new updates, which I don't really like. But initially, it was very good."
"Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Seed B or Seed C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled."
"It has a CPU core-based licensing, which works for us and is quite good."
"There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
"The overall cost is very flexible so it is not a burden for our organization... However, the cost should be improved. For small and mid-size organizations it might be a challenge."
"We are running the community version right now, which can be used free of charge."
"We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
report
Use our free recommendation engine to learn which Data Integration solutions are best for your needs.
861,803 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm
30%
Computer Software Company
10%
Manufacturing Company
10%
Government
6%
Financial Services Firm
11%
Computer Software Company
11%
Manufacturing Company
10%
Insurance Company
9%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about IBM Cloud Pak for Data?
DataStage allows me to connect to different data sources.
What is your experience regarding pricing and costs for IBM Cloud Pak for Data?
The setup cost is very expensive. The cost depends on the pieces of the solution I'm using, how much data I have, and whether it's on the cloud or on-prem.
What needs improvement with IBM Cloud Pak for Data?
What I would love to see is an end-to-end, almost a training demo database of some sort, where one of the biggest problems with data management is demonstrated. There are so many components to data...
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
One issue I observed with StreamSets is that the memory runs out quickly when processing large volumes of data. Because of this memory issue, we have to upgrade our EC2 boxes in the Amazon AWS infr...
What is your primary use case for StreamSets?
We are using StreamSets for batch loading.
 

Also Known As

Cloud Pak for Data
No data available
 

Overview

 

Sample Customers

Qatar Development Bank, GuideWell, Skanderborg Music Festival
Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about IBM Cloud Pak for Data vs. StreamSets and other solutions. Updated: July 2025.
861,803 professionals have used our research since 2012.