Data transformation within Pentaho is a nice feature that they have and that I value.
Brazil IT Coordinator at a transportation company with 1,001-5,000 employees
Integration between databases and data import for a BI solution is valuable.
Pros and Cons
- "Data transformation within Pentaho is a nice feature that they have and that I value."
- "I would like to see more improvements with AS400 DB2."
What is most valuable?
How has it helped my organization?
Integration between databases and data import for a BI solution.
What needs improvement?
I would like to see more improvements with AS400 DB2. I journaled the tables/instance, and data migration is still too slow compared with other databases.
What was my experience with deployment of the solution?
There were no issues with the deployment.
Buyer's Guide
Pentaho Data Integration and Analytics
December 2025
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: December 2025.
879,371 professionals have used our research since 2012.
What do I think about the stability of the solution?
So far, Pentaho's stability has been great. I've tested various scenarios and haven't noticed any loss of performance.
What do I think about the scalability of the solution?
There have been no issues so far in scaling the product.
How was the initial setup?
I taught myself in order to implement it and found the tool very easy to understand. For some topics, I watched YouTube videos to get conceptual ideas during the planning phase.
What about the implementation team?
I did it myself.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Senior Consultant at a financial services firm with 10,001+ employees
The Hadoop and JMS plugins need improvement.
Valuable Features:
It allows for rapid prototyping of a wide array of ETL workloads.
Room for Improvement:
Support for common Hadoop utilities could be expanded, for example bulk loading with composite row keys for HBase, and Impala drivers could be included out of the box. A richer interface to Hive would also help: currently we have to go through a raw connection and execute SQL scripts, and some syntax is not respected.
Version 6 also introduced some new issues that are a bit of an annoyance:
1) log4j errors during Kettle's startup.
2) IBM WebSphere MQ Producer: variable substitution for the URL does not work, so you have to hardcode it.
3) shared.xml for DB connections: variable substitution for connection properties does not work, so you have to hardcode things like the Kerberos principal for a Hive/Impala connection.
Deployment Issues:
We had no issues deploying it.
Scalability Issues:
The robustness of this solution in a production cluster (>30 nodes) remains to be seen.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
DWH Specialist at a healthcare company with 1,001-5,000 employees
It is extremely flexible; it allows you to use variables/parameters for just about everything.
Valuable Features:
It is extremely flexible; it allows you to use variables/parameters for just about everything.
Improvements to My Organization:
It enables us to automate our reporting and ETL to a very large extent.
Room for Improvement:
The product itself is great; the biggest downside, in my opinion, is that it is hard to find (hire) people with expertise. In our experience, few people have the required Pentaho expertise, so hiring additional resources for projects can be tough.
Our solution is to train our own people. The tool is definitely not hard to learn: basically anyone with SQL knowledge and experience in another tool can learn Pentaho Data Integration very easily, but you might end up training them yourself.
Deployment Issues:
We had no issues with the deployment.
Stability Issues:
There were no issues with the stability.
Scalability Issues:
We had no issues scaling it for our needs.
Other Advice:
Train your own people!
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Global Consultant - Big Data, BI, Analytics, DWH & MDM at a tech consulting company with 1,001-5,000 employees
It helps to connect to various data sources including all available databases.
Valuable Features:
It's an ETL platform with Big Data enablement, and it is the easiest to use, extend, and deploy. It helps to connect to various data sources, including all available databases.
We also use Pentaho Analyzer, an ad hoc analytics tool built on the Mondrian OLAP server, which enables end users to slice and dice the data in various ways.
Improvements to My Organization:
We implement Pentaho data warehouses and BI features for our various customers. No other software can provide functionality as complete as Pentaho's for fulfilling end-user requirements. In addition, Pentaho offers a flexible platform that lets us extend the tool to meet any end-user requirement.
Another impressive feature is that Big Data implementation/integration is very quick and simple, with no need to write any code. This has enabled our clients to get maximum ROI within a short period.
Room for Improvement:
Pentaho Dashboard Designer: the dashboard features need improvement. CTools are available and help fill the gaps, but they require developer involvement. A full-fledged dashboard designer that can perform everything we do in CDE/CDF would be a great improvement for Pentaho.
Build Process: a built-in build process would make it easier to migrate between DEV, QA, UAT, and PROD; currently this is mostly done manually.
Data Profiling: including data profiling as part of PDI would be a great improvement to the platform and would help customers save a lot of data-quality effort and cost.
Use of Solution:
We are Pentaho service providers and have implemented more than 130 Pentaho projects. We are not direct customers of Pentaho, but we recommend it to our clients when it meets their requirements.
Deployment Issues:
We had no issues with the deployment.
Stability Issues:
There have been no stability issues.
Scalability Issues:
We have not had any issues scaling it for our customers.
Initial Setup:
It is quick and easy to implement.
Cost and Licensing Advice:
Pentaho is available as both a Community Edition (free) and an Enterprise Edition (subscription-based), so you can choose depending on your budget.
Other Advice:
One of the best features to look for in this platform is its flexibility in being enhanced or adapted to your requirements. Implementation can be very quick: you can roll out a few dashboards and analytics to your organization within a week.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Project Lead at a tech services company with 10,001+ employees
The best benefit of the product is that it is easy to use and to understand.
Valuable Features:
The best benefit of the product is that it is easy to use and to understand.
Improvements to My Organization:
We have a huge amount of data that needs to be cleaned and made more valuable for our organization. Pentaho Data Integration helps us achieve that goal.
Room for Improvement:
I have used multiple versions of this product. We started on v3.2 and had multiple issues, but currently we don't find any issue to be a blocker. In general, it would be good to get better performance from this product.
Deployment Issues:
We haven't had any issues with deployment.
Stability Issues:
We haven't had any issues with stability, except for those described under Room for Improvement.
Scalability Issues:
We haven't had any issues with scalability.
Other Advice:
There are other products out there, but I feel that this is the best one.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Senior Data Engineer at a tech company with 501-1,000 employees
It enables a technical product manager to be able to write ETL jobs themselves.
What is most valuable?
The most valuable thing for me is that it enables a technical product manager to be able to write ETL jobs themselves, which saves developers time so that they can do more important things.
How has it helped my organization?
Now developers focus on improving it as a tool (since it's open source) and on teaching project managers how to use it. The project managers are responsible for their own ETL jobs: they know what they want, so it's best for them to manage those jobs themselves.
What needs improvement?
Its performance could be improved so that it works better with Big Data. Also, it can sometimes be very buggy, which keeps some potential users away.
For how long have I used the solution?
I've used it for two years.
What was my experience with deployment of the solution?
We have had no issues with the deployment.
What do I think about the stability of the solution?
The performance for Big Data needs to be improved.
What do I think about the scalability of the solution?
We have had no issues scaling it for our needs.
How are customer service and technical support?
There is a community that can provide limited technical help. I'd give the community a 6 because it's not very active.
Which solution did I use previously and why did I switch?
It was already in place when I joined the company.
How was the initial setup?
It's very easy to install.
What about the implementation team?
We did it in-house. It's worth having someone in your company who knows Pentaho really well.
What was our ROI?
ROI is pretty good, since the tool plays a fairly major role in our company.
What's my experience with pricing, setup cost, and licensing?
The only cost is the time it takes for the developer to get to know it.
What other advice do I have?
If your ETL jobs are small and straightforward, then this solution is definitely worth it.
Disclosure: My company has a business relationship with this vendor other than being a customer. The company is also contributing back to the open source project.
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees
It can't produce crosstab reports with formatting. It connects seamlessly to most commonly used data sources.
Valuable Features
It is a lightweight ETL tool that's easy to get started on. It connects seamlessly to most commonly used data sources.
Improvements to My Organization
The organization chose Pentaho's ETL and Reporting solutions because they were cost-effective compared with competitors' products. The ETL part certainly met those objectives while also serving its purpose.
Room for Improvement
Newer versions have since been released, so some of these issues may already be fixed. The most troublesome missing feature was the ability to produce formatted crosstab reports in the BI Reporting product. One annoyance that troubled us a lot was that every step in a transformation that needed data created its own database connection. With some data sources, such as Greenplum, this was a problem because they limit the number of available connections.
Use of Solution
I used it for three years, from 2012 to 2015, and only stopped because I left the organization.
Deployment Issues
One issue we encountered constantly with PDI deployments was that the environment parameters for jobs had to be updated manually through the designer module, Spoon. Although the product lets you keep environment variables outside Spoon, that didn't work for us because we used one development server for Dev, QA, and UAT.
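For readers unfamiliar with the mechanism mentioned above: PDI jobs and connections can reference variables written as `${NAME}`, and those variables are commonly defined in a `kettle.properties` file rather than inside Spoon. The Python sketch below is not Pentaho code; under simplifying assumptions, it only illustrates how one job setting could resolve differently per environment when each environment has its own property set. The variable names (`DB_HOST`, `DB_NAME`), the host names, and the helpers `load_properties` and `resolve` are all hypothetical.

```python
import re

# Matches ${NAME} placeholders, the variable syntax PDI uses in job/connection settings.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def load_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Simplified on purpose: no line continuations or escape sequences.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve(template: str, props: dict[str, str]) -> str:
    """Replace every ${NAME} with its value; raise KeyError for undefined names."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in props:
            raise KeyError(f"undefined variable: {name}")
        return props[name]

    return _PLACEHOLDER.sub(lookup, template)


# Hypothetical per-environment property sets; names and hosts are illustrative only.
ENVIRONMENTS = {
    "dev": "DB_HOST=dev-db.example.internal\nDB_NAME=staging_dev\n",
    "qa": "DB_HOST=qa-db.example.internal\nDB_NAME=staging_qa\n",
}

if __name__ == "__main__":
    # The same connection string template resolves differently per environment.
    url = "jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}"
    for env, text in ENVIRONMENTS.items():
        print(env, resolve(url, load_properties(text)))
```

The reviewer's constraint was a single server shared by Dev, QA, and UAT, so one global property file could not hold three sets of values; keeping a separate property set per environment, as sketched here, is one possible workaround, not something the reviewer reports using.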
Stability Issues
There were no issues with the stability.
Scalability Issues
We had no issues scaling it across the company as needed.
Customer Service and Technical Support
It's about average. Most of the help we got came from Google searches and wiki pages. Once we had an issue with a feature: our version of PDI could not handle microseconds. The product owner came up with a solution but, instead of applying the patch, wanted to sell it to us for a fee.
Initial Setup
I am only aware of the client-side setup, which was simple enough: pretty much a one-step installation process.
Implementation Team
It was done by an in-house team. A couple of issues we discovered later concerned the environment's memory configuration. This needs to be evaluated and fine-tuned; otherwise, you can run into job failures with large amounts of data. We ran into this with 'Commit' points and 'Sort' steps.
Other Solutions Considered
There was an evaluation performed, however I was not involved in it.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Graduate Teaching Assistant with 1,001-5,000 employees
We can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool.
Valuable Features:
The most valuable feature is that it can take input in all formats, e.g., CSV, text, Excel, JSON, and Hadoop. It can produce output in whatever format we require, and we can also use many database connections. The available transformations are also very useful and self-explanatory.
Also, the data mining feature that comes with the Pentaho Business Analytics suite, especially the Weka plugin, was very useful for our project. We could score the records in the data warehouse, which helped with predicting values.
Lastly, the GUI is very easy to use, so we can transform data very quickly and create reports showing KPIs in the reporting tool. I don't think a company would need to spend extra money hiring an experienced person to use this tool; all you need is a balance of experienced users and new trainees to get going. You can also start using the business analytics tool once your data is integrated. Coaching staff and applying this technology enterprise-wide will enable your business to make data-driven decisions.
Improvements to My Organization:
It makes it possible for senior staff to train new employees and junior staff very quickly. All that is needed to use this software is a strong knowledge of ETL and BI/Big Data concepts.
Room for Improvement:
I would like to see the data visualization tool combined with BI so I can see how data progresses through the various stages. I believe they are already working on this. I also found that, in my case, statistical data input (.sas7bdat files) wasn't working.
Deployment Issues:
There have been no issues with the deployment.
Stability Issues:
It may have been that I wasn't doing it the right way.
Scalability Issues:
We have had no issues scaling it.
Cost and Licensing Advice:
I would say it is one of the most affordable tools to use for business intelligence.
Other Advice:
You should go for this tool to manage your data warehouse, but I would suggest looking at other reporting tools, such as Tableau, which are more user-friendly and provide great insights into the data.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Buyer's Guide
Download our free Pentaho Data Integration and Analytics Report and get advice and tips from experienced pros sharing their opinions.
Updated: December 2025
Product Categories
Data Integration
Popular Comparisons
Informatica Intelligent Data Management Cloud (IDMC)
Azure Data Factory
Informatica PowerCenter
Palantir Foundry
Oracle Data Integrator (ODI)
Qlik Talend Cloud
IBM InfoSphere DataStage
Oracle GoldenGate
SAP Data Services
Spring Cloud Data Flow
Alteryx Designer
Quick Links
Questions:
- Which ETL tool would you recommend to populate data from OLTP to OLAP?
- What do you think can be improved with Hitachi Lumada Data Integrations?
- What do you use Hitachi Lumada Data Integrations for most frequently?
- Is using Hitachi Lumada Data Integrations cost-effective? Did this solution save money for your company compared to other products?
- When evaluating Data Integration, what aspect do you think is the most important to look for?
- Microsoft SSIS vs. Informatica PowerCenter - which solution has better features?
- What are the best on-prem ETL tools?
- Which integration solution is best for a company that wants to integrate systems between sales, marketing, and project development operations systems?
- Experiences with Oracle GoldenGate vs. Oracle Data Integrator?
- What are the must-have features for a Data integration system?
- You could try Pentaho's multithreading feature to increase performance. There are also many other options available for improving the performance of Pentaho jobs and transformations.