Try our new research platform with insights from 80,000+ expert users

IBM Cloud Pak for Data vs Pentaho Data Integration and Analytics comparison

 

Comparison Buyer's Guide

Executive SummaryUpdated on Dec 19, 2024

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

IBM Cloud Pak for Data
Ranking in Data Integration
16th
Average Rating
7.8
Reviews Sentiment
6.5
Number of Reviews
13
Ranking in other categories
Data Virtualization (3rd)
Pentaho Data Integration an...
Ranking in Data Integration
22nd
Average Rating
8.0
Reviews Sentiment
6.9
Number of Reviews
53
Ranking in other categories
No ranking in other categories
 

Mindshare comparison

As of May 2025, in the Data Integration category, the mindshare of IBM Cloud Pak for Data is 1.8%, up from 1.7% compared to the previous year. The mindshare of Pentaho Data Integration and Analytics is 1.6%, up from 0.6% compared to the previous year. It is calculated based on PeerSpot user engagement data.
Data Integration
 

Featured Reviews

Michelle Leslie - PeerSpot reviewer
Starts strong with data management capabilities but needs a demo database
What I would love to see is an end-to-end, almost a training demo database of some sort, where one of the biggest problems with data management is demonstrated. There are so many components to data management, and more often than not, people understand one thing really well. They may understand DataStage and how to move data around, but they do not see the impact of moving data incorrectly. They also do not see the impact of everyone understanding a piece of data in the same way. I would love Cloud Pak to come with a demo database that illustrates the different components of data management in a logical way, so I can see the whole picture instead of just the area I'm specializing in. It would be great if Cloud Pak, from a data modeling point of view, allowed us to import our PDMs, for example. It would be ideal to import and create business terms in Cloud Pak. The PEA would be great to create the technical data. The association between the business and the technical metadata could then be automated by pulling it through from your ACE models. The data modeling component is available in Cloud Pak. Additionally, when it comes to Cloud Pak, even though it has the NextGen DataStage built into it, there is Cloud Pak for data integration as well. Currently, I do not think we have a full enough understanding of how CP4D and CP4I can enhance each other.
Ryan Ferdon - PeerSpot reviewer
Low-code makes development faster than with Python, but there were caching issues
If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was. It was kind of buggy sometimes. And when we ran the flow, it didn't go from a perceived start to end, node by node. Everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail. That was not because the job was wrong, but because Pentaho decided to go at everything at once, and something would process before it was supposed to. There were nodes you could add to make sure that, before this node kicks off, all these others have processed, but it was a bit tedious. There were also caching issues, and we had to write code to clear the cache every time we opened the program, because the cache would fill up and it wouldn't run. I don't know how hard that would be for them to fix, or if it was fixed in version 10. Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks. One other thing that would have helped with Pentaho was documentation and support on the internet: how to do things, how to set up. I think there are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"IBM Watson Catalog and data pipelines are the most valuable features of the solution."
"Scalability-wise, I rate the solution a nine or ten out of ten."
"Cloud Pak is a very, very, very good system."
"I love the way that I can start at a very basic level with my data management journey by capturing my policies, justifying my data, and putting them into different categories to say this is data relating to individuals, for example, or data relating to geography."
"What I found most helpful in IBM Cloud Pak for Data is containerization, which means it's easy to shift and leave in terms of moving to other clouds. That's an advantage of IBM Cloud Pak for Data."
"One of Cloud Pak's best features is the Watson Knowledge Catalog, which helps you implement data governance."
"Cloud Pak's most valuable features are IBM MQ, IBM App Connect, IBM API Connect, and ISPF."
"Its data preparation capabilities are highly valuable."
"We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines."
"I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a station area where I can work with all the information that I have in my production databases, then I can work with the data that I created."
"I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source."
"It's my understanding that the product can scale."
"It is easy to use, install, and start working with."
"One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
"I find the drag and drop feature in Pentaho Data Integration very useful for integration."
"We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."
 

Cons

"The interface could improve because sometimes it becomes slow. Sometimes there is a delay between clicks when using the software, which can make the development process slow. It can take a few seconds to complete one action, and then a few more seconds to do the next one."
"There is a solution that is part of IBM Cloud Pak for Data called Watson OpenScale. It is used to monitor the deployed models for the quality and fairness of the results. This is one area that needs a lot of improvement."
"The solution could have more connectors."
"The product must improve its performance."
"Cloud Pak would be improved with integration with cloud service providers like Cloudera."
"One challenge I'm facing with IBM Cloud Pak for Data is native features have been decommissioned, such as XML input and output. Too many changes have been made, and my company has around one hundred thousand mappings, so my team has been putting more effort into alternative ways to do things. Another area for improvement in IBM Cloud Pak for Data is that it's more complicated to shift from on-premise to the cloud. Other vendors provide secure agents that easily connect with your existing setup. Still, with IBM Cloud Pak for Data, you have to perform connection migration steps, upgrade to the latest version, etc., which makes it more complicated, especially as my company has XML-based mappings. Still, the XML input and output capabilities of IBM Cloud Pak for Data have been discontinued, so I'd like IBM to bring that back."
"The setup cost is very expensive. The cost depends on the pieces of the solution I'm using, how much data I have, and whether it's on the cloud or on-prem."
"The tool depends on the control plane, an OpenShift container platform utilized as an orchestration layer...So, we have communicated this issue to IBM and asked if it is feasible to adapt the solution to work on a Kubernetes platform that we support."
"I would like to see support for some additional cloud sources. It doesn't support Azure, for example. I was trying to do a PoC with Azure the other day but it seems they don't support it."
"While Pentaho Data Integration is very friendly, it is not very useful when there isn't a lot of data to handle."
"As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
"I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing" i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking."
"Parallel execution could be better in Pentaho. It's very simple but I don't think it works well."
"The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."
"​There is not a data quality or MDM solution in the Pentaho DI suite.​"
"I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You have to search all these different places using a mouse, clicking everywhere... each report is coded in a binary file... You cannot search with a text search tool..."
 

Pricing and Cost Advice

"The solution's pricing is competitive with that of other vendors."
"IBM Cloud Pak for Data is expensive. If we include the training time and the machine learning, it's expensive. The cost of the execution is more reasonable."
"For the licensing of the solution, there is a yearly payment that needs to be made. Also, since it is expensive, cost-wise, I rate the solution an eight or nine out of ten."
"Cloud Pak's cost is a little high."
"The solution is expensive."
"I think that this product is too expensive for smaller companies."
"It's quite expensive."
"I don't have the exact licensing cost for IBM Cloud Pak for Data, as my company is still finalizing requirements, including monthly, yearly, and three-year licensing fees. Still, on a scale of one to five, I'd rate it a three because, compared to other vendors, it's more complicated."
"You don't need the Enterprise Edition, you can go with the Community Edition. That way you can use it for free and, for free, it's a pretty good tool to use."
"It does seem a bit expensive compared to the serverless product offering. Tools, such as Server Integration Services, are "almost" free with a database engine. It is comparable to products like Alteryx, which is also very expensive."
"I primarily work on the Community Version, which is available to use free of charge."
"I think Lumada's price is fair compared to some of the others, like BusinessObjects, which is was the other thing that I used at my previous job. BusinessObject's price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive."
"When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho."
"We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it."
"Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs."
"The solution reduced our ETL development time by a lot because a whole project used to take about a month to get done previously. After having Lumada, it took just a week. For a big company in Brazil, it saves a team at least $10,000 a month."
report
Use our free recommendation engine to learn which Data Integration solutions are best for your needs.
850,028 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm
27%
Computer Software Company
11%
Manufacturing Company
10%
Government
7%
Financial Services Firm
22%
Computer Software Company
15%
Government
8%
Manufacturing Company
5%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about IBM Cloud Pak for Data?
DataStage allows me to connect to different data sources.
What is your experience regarding pricing and costs for IBM Cloud Pak for Data?
The setup cost is very expensive. The cost depends on the pieces of the solution I'm using, how much data I have, and whether it's on the cloud or on-prem.
What needs improvement with IBM Cloud Pak for Data?
What I would love to see is an end-to-end, almost a training demo database of some sort, where one of the biggest problems with data management is demonstrated. There are so many components to data...
Which ETL tool would you recommend to populate data from OLTP to OLAP?
Hi Rajneesh, yes here is the feature comparison between the community and enterprise edition : https://www.hitachivantara.com/en-us/pdf/brochure/leverage-open-source-benefits-with-assurance-of-hita...
What do you think can be improved with Hitachi Lumada Data Integrations?
In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, ...
What do you use Hitachi Lumada Data Integrations for most frequently?
My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could us...
 

Also Known As

Cloud Pak for Data
Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
 

Overview

 

Sample Customers

Qatar Development Bank, GuideWell, Skanderborg Music Festival
66Controls, Providential Revenue Agency of Ro Negro, NOAA Information Systems, Swiss Real Estate Institute
Find out what your peers are saying about IBM Cloud Pak for Data vs. Pentaho Data Integration and Analytics and other solutions. Updated: April 2025.
850,028 professionals have used our research since 2012.