IBM Cloud Pak for Data vs StreamSets comparison

IBM Cloud Pak for Data
4,032 views | 2,639 comparisons
84% willing to recommend
StreamSets
4,200 views | 2,349 comparisons
100% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between IBM Cloud Pak for Data and StreamSets based on real PeerSpot user reviews.

Find out in this report how the two Data Integration solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed IBM Cloud Pak for Data vs. StreamSets Report (Updated: May 2024).
771,170 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
  • "The most valuable features are data virtualization and reporting."
  • "Cloud Pak's most valuable features are IBM MQ, IBM App Connect, IBM API Connect, and ISPF."
  • "DataStage allows me to connect to different data sources."
  • "The most valuable features of IBM Cloud Pak for Data are the Watson Studio, where we can initiate more groups and write code. Additionally, Watson Machine Learning is available with many other services, such as APIs which you can plug the machine learning models."
  • "Its data preparation capabilities are highly valuable."
  • "You can model the data there, connect the data models with the business processes and create data lineage processes."
  • "One of Cloud Pak's best features is the Watson Knowledge Catalog, which helps you implement data governance."
  • "The most valuable feature of IBM Cloud Pak for Data is the Modeler flows. The ability to develop models using a graphical approach and the capability to connect to various sources, as well as the data virtualization capabilities, allow me to easily access and utilize data that is dispersed across different sources."


  • "The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too"
  • "The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."
  • "The ability to have a good bifurcation rate and fewer mistakes is valuable."
  • "The UI is user-friendly, it doesn't require any technical know-how and we can navigate to social media or use it more easily."
  • "The scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy."
  • "The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth."
  • "The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
  • "One of the things I like is the data pipelines. They have a very good design. Implementing pipelines is very straightforward. It doesn't require any technical skill."


Cons
  • "The product must improve its performance."
  • "The tool depends on the control plane, an OpenShift container platform utilized as an orchestration layer...So, we have communicated this issue to IBM and asked if it is feasible to adapt the solution to work on a Kubernetes platform that we support."
  • "One challenge I'm facing with IBM Cloud Pak for Data is native features have been decommissioned, such as XML input and output. Too many changes have been made, and my company has around one hundred thousand mappings, so my team has been putting more effort into alternative ways to do things. Another area for improvement in IBM Cloud Pak for Data is that it's more complicated to shift from on-premise to the cloud. Other vendors provide secure agents that easily connect with your existing setup. Still, with IBM Cloud Pak for Data, you have to perform connection migration steps, upgrade to the latest version, etc., which makes it more complicated, especially as my company has XML-based mappings. Still, the XML input and output capabilities of IBM Cloud Pak for Data have been discontinued, so I'd like IBM to bring that back."
  • "The interface could improve because sometimes it becomes slow. Sometimes there is a delay between clicks when using the software, which can make the development process slow. It can take a few seconds to complete one action, and then a few more seconds to do the next one."
  • "There is a solution that is part of IBM Cloud Pak for Data called Watson OpenScale. It is used to monitor the deployed models for the quality and fairness of the results. This is one area that needs a lot of improvement."
  • "One thing that bugs me is how much infrastructure Cloud Pak requires for the initial deployment. It doesn't allow you to start small. The smallest permitted deployment is too big. It's a huge problem that prevents us from implementing the solution in many scenarios."
  • "The technical support could be a little better."
  • "The solution's user experience is an area that has room for improvement."


  • "Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
  • "StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
  • "Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
  • "StreamSet works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds."
  • "In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
  • "If you use JDBC Lookup, for example, it generally takes a long time to process data."
  • "Visualization and monitoring need to be improved and refined."
  • "There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline."


Pricing and Cost Advice
  • "I think that this product is too expensive for smaller companies."
  • "I don't have the exact licensing cost for IBM Cloud Pak for Data, as my company is still finalizing requirements, including monthly, yearly, and three-year licensing fees. Still, on a scale of one to five, I'd rate it a three because, compared to other vendors, it's more complicated."
  • "Cloud Pak's cost is a little high."
  • "IBM Cloud Pak for Data is expensive. If we include the training time and the machine learning, it's expensive. The cost of the execution is more reasonable."
  • "For the licensing of the solution, there is a yearly payment that needs to be made. Also, since it is expensive, cost-wise, I rate the solution an eight or nine out of ten."
  • "It's quite expensive."
  • "The solution is expensive."

  • "We are running the community version right now, which can be used free of charge."
  • "StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
  • "It has a CPU core-based licensing, which works for us and is quite good."
  • "There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
  • "The pricing is good, but not the best. They have some customized plans you can opt for."
  • "We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
  • "The overall cost for small and mid-size organizations needs to be better."
  • "There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced."

    Questions from the Community
    Top Answer:DataStage allows me to connect to different data sources.
    Top Answer:The product must improve its performance. We see typical cloud-related issues in the solution. IBM can still focus more on keeping the performance up and keeping it 100% available all the time.
    Top Answer: The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize…
    Top Answer: We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was…
    Top Answer: StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data is…
    Ranking
    Metric                       IBM Cloud Pak for Data    StreamSets
    Data Integration ranking     17th of 101               8th of 101
    Views                        4,032                     4,200
    Comparisons                  2,639                     2,349
    Reviews                      9                         22
    Average words per review     500                       1,306
    Rating                       8.4                       8.4
    Also Known As
    IBM Cloud Pak for Data: Cloud Pak for Data
    Overview

    IBM Cloud Pak® for Data is a fully-integrated data and AI platform that modernizes how businesses collect, organize and analyze data to infuse AI throughout their organizations. Cloud-native by design, the platform unifies market-leading services spanning the entire analytics lifecycle. From data management, DataOps, governance, business analytics and automated AI, IBM Cloud Pak for Data helps eliminate the need for costly, and often competing, point solutions while providing the information architecture you need to implement AI successfully.

    Building on the streamlined hybrid-cloud foundation of Red Hat® OpenShift®, IBM Cloud Pak for Data takes advantage of the underlying resource and infrastructure optimization and management. The solution fully supports multicloud environments such as Amazon Web Services (AWS), Azure, Google Cloud, IBM Cloud™ and private cloud deployments. Find out how IBM Cloud Pak for Data can lower your total cost of ownership and accelerate innovation.

    StreamSets is a data integration platform that enables organizations to efficiently move and process data across various systems. It offers a user-friendly interface for designing, deploying, and managing data pipelines, allowing users to easily connect to various data sources and destinations. StreamSets also provides real-time monitoring and alerting capabilities, ensuring that data is flowing smoothly and any issues are quickly addressed.

    Sample Customers
    IBM Cloud Pak for Data: Qatar Development Bank, GuideWell, Skanderborg Music Festival
    StreamSets: Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
    Top Industries
    VISITORS READING REVIEWS
    Financial Services Firm: 26%
    Computer Software Company: 11%
    Manufacturing Company: 8%
    Government: 8%
    REVIEWERS
    Financial Services Firm: 21%
    Energy/Utilities Company: 21%
    Comms Service Provider: 14%
    Computer Software Company: 14%
    VISITORS READING REVIEWS
    Financial Services Firm: 17%
    Computer Software Company: 13%
    Manufacturing Company: 8%
    Government: 7%
    Company Size
    REVIEWERS
    Small Business: 46%
    Large Enterprise: 54%
    VISITORS READING REVIEWS
    Small Business: 17%
    Midsize Enterprise: 8%
    Large Enterprise: 76%
    REVIEWERS
    Small Business: 40%
    Midsize Enterprise: 12%
    Large Enterprise: 48%
    VISITORS READING REVIEWS
    Small Business: 16%
    Midsize Enterprise: 11%
    Large Enterprise: 73%
    Buyer's Guide
    IBM Cloud Pak for Data vs. StreamSets
    May 2024
    Find out what your peers are saying about IBM Cloud Pak for Data vs. StreamSets and other solutions. Updated: May 2024.

    IBM Cloud Pak for Data is ranked 17th in Data Integration with 11 reviews, while StreamSets is ranked 8th in Data Integration with 24 reviews. IBM Cloud Pak for Data is rated 8.0, while StreamSets is rated 8.4. The top reviewer of IBM Cloud Pak for Data writes "A scalable data analytics and digital transformation tool that provides useful features and integrations". On the other hand, the top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". IBM Cloud Pak for Data is most compared with IBM InfoSphere DataStage, Azure Data Factory, Informatica Cloud Data Integration, Palantir Foundry and Denodo, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS and IBM InfoSphere DataStage. See our IBM Cloud Pak for Data vs. StreamSets report.


    We monitor all Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn and personal follow-up with the reviewer when necessary.