

Pentaho Data Integration and Analytics and IBM Cloud Pak for Data compete in the data integration and analytics category. IBM Cloud Pak for Data appears to have the upper hand in handling complex data projects due to its advanced analytics and machine learning capabilities.
Features: Pentaho Data Integration offers ease of use with a drag-and-drop interface, open-source model, and big data support. It is known for its scalability and compatibility with various data systems. IBM Cloud Pak for Data stands out with advanced analytics, machine learning capabilities, and robust data governance, making it suitable for complex projects.
Room for Improvement: Pentaho could improve its cloud integration and debugging feedback and expand its connectors for modern platforms. Setting it up on non-Windows platforms can be complex, and documentation could be enhanced. IBM Cloud Pak for Data might benefit from better cost-reduction strategies, more out-of-the-box integrations, and a more user-friendly interface. Installation and administration could be streamlined further.
Ease of Deployment and Customer Service: Pentaho offers flexible deployment across on-premises and hybrid setups, supported by a strong user community, though professional support is less effective. IBM Cloud Pak for Data supports hybrid and on-premises deployment but requires significant initial infrastructure setup. Its technical support is responsive but sometimes lacks depth for complex queries, and its pricing offers are adaptable.
Pricing and ROI: Pentaho is cost-effective, with its free Community Edition offering savings on ETL processes and yielding good ROI through reduced development time. IBM Cloud Pak for Data has high pricing but provides valuable functions for enterprises requiring advanced analytics and machine learning, though its cost is less accessible for smaller organizations.
We have been able to drive responsible, transparent, and explainable AI workflow to operationalize AI and mitigate risk and regulatory compliance easily.
It is easy to collect, organize, and analyze data no matter where it is, hence being able to make data-driven decisions.
I have seen a return on investment; my team was able to stay extremely small even though we had a lot of data integrations with many companies.
I can testify to the return on investment with metrics regarding time saved; we have increased our efficiency by about 20 to 30 percent due to the swift migration processes facilitated by the tool.
I have noticed a return on investment with Pentaho Data Integration and Analytics in terms of time savings and staff reduction.
I rate the technical support from IBM a nine out of ten because the support has been very top-notch, unparalleled, and also very professional.
Cloud Pak is a complicated system, and it's often difficult to find the right resource in IBM to help with specific issues.
The customer support for IBM Cloud Pak for Data is great and responsive.
24/7 assistance is available for the Enterprise Edition.
take the time to understand our business requirements, offering appropriate recommendations.
Communication with the vendor is challenging.
I have not noticed any downtime or lagging, especially when dealing with large data, so it is relatively very scalable.
IBM Cloud Pak for Data's scalability is very good; it can be used by any size of organization.
For scalability, I rate it a nine out of ten because it is a very scalable solution that has been able to handle my organization's growth efficiently.
It can be scaled well until you reach a point where you need to perform a lot of operations, and the issue arises when it runs out of memory to handle some data.
Its ability to scale horizontally in cloud-native architectures or for massive real-time processing is limited.
Pentaho Data Integration handles larger datasets better.
The overall performance of IBM Cloud Pak for Data, particularly with IBM DataStage for ETL processes, is very good.
Performance issues arise due to reliance on a flowchart-based mechanism instead of scripts, which can lead to longer execution times.
I find that version 3.1 is the most stable version I have ever used.
It's pretty stable, however, it struggles when dealing with smaller amounts of data.
Setting up the hybrid and multi-cloud environments is a long job and it takes time.
IBM Cloud Pak for Data can be improved because processing speeds are sometimes slow.
To improve IBM Cloud Pak for Data, I suggest more out-of-the-box integration.
We should also explore more effective partitioning for parallel processing and fine-tuning database connections to reduce load times and improve ETL speed.
Pentaho Data Integration and Analytics can be improved by working with different environments, specifically the possibility to change the variables, meaning I write my variables only once and can change them for different environments such as production or development.
Pentaho Data Integration and Analytics could have real-time processing and automatic alerting, having alerts or automatic notifications when a job fails or when certain data doesn't meet certain rules.
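The two requests above (variables written once and switched per environment, and automatic alerts when a job fails or data breaks a rule) can be sketched in plain Python. This is only an illustration of the idea, not Pentaho's API: the names `DEFAULTS`, `OVERRIDES`, `resolve_variables`, and `run_with_alerts`, as well as the host names and values, are hypothetical.

```python
# Hypothetical sketch: none of these names come from Pentaho or IBM tooling.

# Variables are written once, with only the differences listed per environment.
DEFAULTS = {"db_host": "localhost", "db_name": "warehouse", "batch_size": 500}
OVERRIDES = {
    "development": {},
    "production": {"db_host": "prod-db.internal", "batch_size": 5000},
}


def resolve_variables(environment):
    """Return the defaults merged with the overrides for one environment."""
    if environment not in OVERRIDES:
        raise ValueError(f"unknown environment: {environment}")
    return {**DEFAULTS, **OVERRIDES[environment]}


def run_with_alerts(job, rows, rule, notify):
    """Run job(rows) and call notify() on rule violations or job failure.

    Returns True only when the job succeeded and every row passed the rule.
    """
    bad_rows = [row for row in rows if not rule(row)]
    if bad_rows:
        notify(f"{len(bad_rows)} row(s) failed the data rule")
    try:
        job(rows)
    except Exception as exc:  # any job error becomes an alert
        notify(f"job failed: {exc}")
        return False
    return not bad_rows
```

In practice `notify` could be anything that delivers a message, such as an email or chat hook; here it is left as a caller-supplied function so the sketch stays self-contained.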
The setup cost is very expensive.
Regarding my experience with pricing, setup cost, and licensing, for a small organization, the price might be relatively high, but for huge enterprises such as ours, the price is relatively affordable.
The list price is high, but the flexibility in pricing is adequate.
I use the community version of Pentaho Data Integration and Analytics, and I do not need additional costs.
The setup cost was minimal, and the pricing experience was pretty good.
The company covered it and they had no problem paying for it because they saw that it was cost-effective in terms of performance afterwards.
From there, I can work my way into a more granular level, applying all of that information on top of my actual data to understand what my data looks like, where it came from, and where it went wrong, managing it throughout the cycle.
The benefits of choosing IBM Cognos, in addition to saving on cost, include having institutional knowledge about maintaining this infrastructure and enough people who have developed on Cognos in the past, which creates comfort in its use.
We have been able to save approximately 80 percent of our time. We are not doing data analysis manually, so this relieves our data department of dealing with data.
Pentaho Data Integration and Analytics has positively impacted my organization because it meant we didn't have to write a lot of custom API back-end processing logic; it did the majority of that heavy lifting for us.
It automates the data workflow, including extraction, cleansing, and loading into warehouses for BI reporting purposes, while also removing duplicates, validating data, and standardizing formats, enabling real-time decision-making.
Pentaho Data Integration and Analytics has positively impacted my organization because it is easier to use, and my knowledge about this work facilitates the translation from the source to my final system.
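The cleansing steps reviewers describe (removing duplicates, validating data, and standardizing formats before loading) can be illustrated with a minimal Python sketch. It is not how either product implements these steps; the field names, the `DD/MM/YYYY` input date format, and the functions `standardize`, `is_valid`, and `clean` are assumptions made for the example.

```python
from datetime import datetime


def standardize(record):
    """Trim whitespace, lowercase the email, and convert the date to ISO format.

    Assumes (for this sketch only) that input dates arrive as DD/MM/YYYY.
    """
    parsed = datetime.strptime(record["signup_date"].strip(), "%d/%m/%Y")
    return {
        "id": record["id"].strip(),
        "email": record["email"].strip().lower(),
        "signup_date": parsed.date().isoformat(),
    }


def is_valid(record):
    """A deliberately simple rule: non-empty id and an '@' in the email."""
    return bool(record["id"]) and "@" in record["email"]


def clean(records):
    """Return (cleaned, rejected): standardized unique rows and unusable raw rows."""
    seen_ids = set()
    cleaned, rejected = [], []
    for raw in records:
        try:
            record = standardize(raw)
        except (KeyError, ValueError):  # missing field or unparseable date
            rejected.append(raw)
            continue
        if not is_valid(record):
            rejected.append(raw)
            continue
        if record["id"] in seen_ids:  # duplicate of an earlier row: drop silently
            continue
        seen_ids.add(record["id"])
        cleaned.append(record)
    return cleaned, rejected


RAW_ROWS = [
    {"id": " 1 ", "email": "Ana@Example.com ", "signup_date": "03/02/2024"},
    {"id": "1", "email": "ana@example.com", "signup_date": "03/02/2024"},
    {"id": "2", "email": "not-an-email", "signup_date": "04/02/2024"},
    {"id": "3", "email": "bo@example.com", "signup_date": "2024-02-05"},
]

cleaned, rejected = clean(RAW_ROWS)
```

Keeping rejected rows separate, rather than discarding them, makes it possible to report on data quality after the load.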
| Product | Mindshare (%) |
|---|---|
| Pentaho Data Integration and Analytics | 1.7 |
| IBM Cloud Pak for Data | 1.1 |
| Other | 97.2 |


| Company Size | Count |
|---|---|
| Small Business | 10 |
| Large Enterprise | 20 |

| Company Size | Count |
|---|---|
| Small Business | 18 |
| Midsize Enterprise | 17 |
| Large Enterprise | 32 |
IBM Cloud Pak for Data is a comprehensive platform integrating data management, AI, and machine learning capabilities tailored for hybrid environments. It's renowned for enhancing productivity through efficient data analytics and management.
This platform offers data virtualization, robust analytics, and AI-driven processes. Its integration capabilities, including IBM MQ and App Connect, facilitate seamless data connections. Users benefit from containerization, data governance, and compatibility with hybrid systems, improving decision-making and management productivity. However, the requirement of extensive infrastructure and performance challenges can impact scalability for small businesses.
In the financial and banking sectors, IBM Cloud Pak for Data is used for data management tasks such as spend analytics and contract leakage analysis. In industries such as FinTech and consultancy, it is used for data integration, machine learning, and AI-driven analytics that turn data into insights.
Pentaho Data Integration and Analytics offers an intuitive platform for data workflows, enabling users to easily manage ETL processes across diverse data formats, ensuring seamless automation and development.
With its drag-and-drop interface, Pentaho allows for efficient ETL workflows without extensive coding. It supports a multitude of data formats and sources such as SQL, NoSQL, Hadoop, CSV, and JSON. Advanced features like metadata injection and API integration enable seamless automation. However, improvements in big data performance, better cloud service integration, and enhanced real-time processing capabilities can enhance user experience. Additional connectors and improved documentation are sought after by many. Providing support for more programming languages and optimizing memory usage also presents opportunities for enhancement.
Pentaho is employed across the finance, healthcare, and retail industries for ETL processes. It is instrumental in integrating data from ERP and SAP systems, Excel, and APIs to develop comprehensive reports and data models. Companies rely on its capabilities for both on-premises and cloud deployments, improving data transparency and management.