

Pentaho Data Integration and Azure Data Factory are prominent ETL tools in the data integration category. Azure Data Factory appears to have the upper hand, owing to its tight integration with the rest of the Azure suite and an architecture that scales to extensive data orchestration.
Features: Pentaho Data Integration provides robust ETL features with an intuitive drag-and-drop graphical interface. It supports multiple databases and data formats, aiding in rapid development cycles and easy data transformation. The inclusion of plugins for big data technologies such as HBase and Hadoop enhances its value. Azure Data Factory offers better integration within the Azure environment with a user-friendly drag-and-drop interface that simplifies complex data flows. It boasts numerous built-in connectors, making it seamless to integrate with other Azure services while providing strong data transformation capabilities.
Room for Improvement: Pentaho needs improvement in backward compatibility and in performance with large data sets. It also needs more native connectors and a better user interface for managing data transformations and reports, and more comprehensive documentation would improve the user experience. Azure Data Factory could improve pricing transparency and UI elements, expand connector availability, and deepen integration with Microsoft services. Users also suggest enhancements to real-time data processing and error management.
Ease of Deployment and Customer Service: Pentaho offers flexible deployment options like on-premises and hybrid cloud but may challenge users without in-house expertise. It benefits from a strong community, though official support can be limited, especially for the Community Edition. Azure Data Factory, used mainly in public and hybrid cloud environments, is known for straightforward setup but faces criticism over complex pricing and support issues. Its integration within the Microsoft ecosystem provides more comprehensive technical support.
Pricing and ROI: Pentaho's Community Edition is a cost-effective option for small to medium businesses, while the Enterprise Edition has higher prices post-Hitachi acquisition, though still offering good value. Azure Data Factory's pay-as-you-go model can result in unpredictable costs for extensive use but remains competitive. Both platforms promise significant ROI through reduced ETL development time and improved data handling efficiency, with Pentaho having initial cost advantages due to its open-source availability.
Our stakeholders and clients have expressed satisfaction with Azure Data Factory's efficiency and cost-effectiveness.
I have seen a return on investment; my team was able to stay extremely small even though we had a lot of data integrations with many companies.
I can testify to the return on investment with metrics regarding time saved; we have increased our efficiency by about 20 to 30 percent due to the swift migration processes facilitated by the tool.
The technical support from Microsoft is rated an eight out of ten.
The technical support is responsive and helpful.
The technical support for Azure Data Factory is generally acceptable.
24/7 assistance is available for the Enterprise Edition.
They take the time to understand our business requirements and offer appropriate recommendations.
Communication with the vendor is challenging.
Azure Data Factory is highly scalable.
It can be scaled well until you reach a point where you need to perform a lot of operations, and the issue arises when it runs out of memory to handle some data.
Pentaho Data Integration handles larger datasets better.
Pentaho Data Integration and Analytics' scalability is commendable, as it allows us to scale up according to our needs.
The solution has a high level of stability, roughly a nine out of ten.
Performance issues arise due to reliance on a flowchart-based mechanism instead of scripts, which can lead to longer execution times.
I find that version 3.1 is the most stable version I have ever used.
It's pretty stable; however, it struggles when dealing with smaller amounts of data.
Incorporating more dedicated API sources to specific services like HubSpot CRM or Salesforce would be beneficial.
Sometimes, the compute fails to process data if there is a heavy load suddenly, and it doesn't scale up automatically.
There is a problem with the integration with third-party solutions, particularly with SAP.
We should also explore more effective partitioning for parallel processing and fine-tuning database connections to reduce load times and improve ETL speed.
Pentaho Data Integration and Analytics could better support working across different environments. Specifically, I would like to define my variables only once and then switch their values per environment, such as production or development.
I also lack the option to use programming languages beyond Python and SQL, and a provision to incorporate Scala code in the scripting component would be beneficial.
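The request above to define variables once and switch them per environment can be illustrated with a minimal, tool-agnostic sketch. It is not how either product implements this; the variable names, hostnames, and the `resolve_variables` helper are all hypothetical and chosen only for illustration.

```python
# Shared defaults, written once. All names and values here are hypothetical.
DEFAULTS = {
    "DB_HOST": "localhost",
    "DB_NAME": "staging",
    "BATCH_SIZE": "500",
}

# Only the values that differ per environment are listed.
OVERRIDES = {
    "development": {},
    "production": {"DB_HOST": "db.prod.example.com", "DB_NAME": "warehouse"},
}


def resolve_variables(environment):
    """Merge the shared defaults with the overrides for one environment."""
    if environment not in OVERRIDES:
        raise ValueError(f"unknown environment: {environment}")
    # Later keys win, so overrides replace defaults of the same name.
    return {**DEFAULTS, **OVERRIDES[environment]}


if __name__ == "__main__":
    for name in ("development", "production"):
        print(name, resolve_variables(name))
```

The design choice is that a job reads every setting from the resolved dictionary, so moving from development to production changes one argument rather than every variable.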
The pricing is cost-effective.
It is considered cost-effective.
I use the community version of Pentaho Data Integration and Analytics, so I do not incur any additional costs.
The setup cost was minimal, and the pricing experience was pretty good.
It connects to different sources out-of-the-box, making integration much easier.
The platform excels in handling major datasets, particularly when working with Power BI for reporting purposes.
The integration feature in Azure Data Factory is excellent: it has connectors for the major sources, so we can integrate data from different data sources and also perform basic transformations while moving the data, which is a great feature.
Pentaho Data Integration and Analytics has positively impacted my organization because it meant we didn't have to write a lot of custom API back-end processing logic; it did the majority of that heavy lifting for us.
It automates the data workflow, including extraction, cleansing, and loading into warehouses for BI reporting purposes, while also removing duplicates, validating data, and standardizing formats, enabling real-time decision-making.
Pentaho Data Integration and Analytics has positively impacted my organization because it is easier to use, and my knowledge about this work facilitates the translation from the source to my final system.
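The cleansing steps reviewers mention (standardizing formats, validating data, and removing duplicates before loading) can be sketched in plain Python. This is only an illustration of the general pattern, not code from either product; the record fields (`email`, `signup_date`), the input date format, and the function names are assumptions made for the example.

```python
from datetime import datetime


def standardize(record):
    """Normalize one raw record: trim and lowercase the email, convert the date to ISO."""
    return {
        "email": record["email"].strip().lower(),
        # Assumes the source system emits dates as DD/MM/YYYY.
        "signup_date": datetime.strptime(
            record["signup_date"].strip(), "%d/%m/%Y"
        ).date().isoformat(),
    }


def is_valid(record):
    """Very rough email check: an '@' followed by a domain containing a dot."""
    email = record["email"]
    return "@" in email and "." in email.split("@")[-1]


def cleanse(records):
    """Standardize, validate, and de-duplicate records by email, keeping the first seen."""
    seen = set()
    cleaned = []
    for raw in records:
        try:
            record = standardize(raw)
        except (KeyError, ValueError):
            continue  # missing field or unparseable date: drop the record
        if not is_valid(record) or record["email"] in seen:
            continue
        seen.add(record["email"])
        cleaned.append(record)
    return cleaned
```

In a real pipeline the rejected records would normally be routed to an error table rather than silently dropped; the sketch skips them to stay short.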
| Product | Market Share (%) |
|---|---|
| Azure Data Factory | 3.2% |
| Pentaho Data Integration and Analytics | 1.5% |
| Other | 95.3% |


| Company Size | Count |
|---|---|
| Small Business | 31 |
| Midsize Enterprise | 19 |
| Large Enterprise | 57 |

| Company Size | Count |
|---|---|
| Small Business | 18 |
| Midsize Enterprise | 18 |
| Large Enterprise | 29 |
Azure Data Factory efficiently manages and integrates data from various sources, enabling seamless movement and transformation across platforms. Its valuable features include seamless integration with Azure services, handling large data volumes, flexible transformation, user-friendly interface, extensive connectors, and scalability. Users have experienced improved team performance, workflow simplification, enhanced collaboration, streamlined processes, and boosted productivity.
Pentaho Data Integration is a versatile platform for the data integration and analytics needs of organizations of any size. It integrates data from diverse sources, including databases, files, and applications, and handles the cleaning and transformation needed to prepare that data for analysis. It also provides tools for data mining, machine learning, and statistical analysis. Its maturity and its active community of users and developers make it a reliable and cost-effective option. Key features include a comprehensive ETL toolkit, data cleaning and transformation capabilities, data analysis tools, and flexible deployment options for data integration and analytics solutions.