It is a lightweight ETL tool that's easy to get started on. It connects seamlessly to most commonly used data sources.
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees
It doesn't have the capability to produce crosstab reports with formatting capabilities. It connects seamlessly to most commonly used data sources.
What is most valuable?
How has it helped my organization?
The organization went with Pentaho ETL and Reporting solutions as cost effective products, as compared to competitors. The ETL part certainly met those objectives, along with serving the purpose.
What needs improvement?
Newer versions have been released since, so some of these issues may already be fixed. The most troublesome missing feature was the ability to produce formatted crosstab reports in the BI Reporting product. The annoyance that troubled us most was that every step in a transformation that needed data created its own data connection. With some data sources, such as Greenplum, this was a problem because they limit the number of available connections.
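To make the connection-limit concern concrete, here is a rough sketch of the arithmetic, assuming one connection per running copy of every step that touches the database. The `Step` class, the step names, and the limit of 50 are hypothetical illustrations, not PDI or Greenplum settings.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """A transformation step (hypothetical model, not a PDI class)."""
    name: str
    uses_database: bool
    copies: int = 1  # number of parallel copies of the step


def connections_needed(steps):
    """Assume one connection per copy of every step that uses the database."""
    return sum(step.copies for step in steps if step.uses_database)


def fits_limit(steps, concurrent_runs, limit):
    """Check whether running the transformation N times at once stays under the limit."""
    return connections_needed(steps) * concurrent_runs <= limit


# Illustrative transformation: three of the four steps read or write data.
steps = [
    Step("table_input", uses_database=True),
    Step("lookup", uses_database=True, copies=4),
    Step("calculator", uses_database=False),
    Step("table_output", uses_database=True, copies=2),
]

per_run = connections_needed(steps)
ten_runs_ok = fits_limit(steps, concurrent_runs=10, limit=50)
five_runs_ok = fits_limit(steps, concurrent_runs=5, limit=50)
```

Under these assumptions a single run needs seven connections, so a handful of concurrent jobs is enough to approach a modest server-side cap.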
For how long have I used the solution?
I used it for three years, from 2012 to 2015, and only stopped as I left the organization.
Buyer's Guide
Pentaho Data Integration and Analytics
May 2025

Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: May 2025.
856,873 professionals have used our research since 2012.
What was my experience with deployment of the solution?
One issue we encountered constantly with PDI deployments was that the environment parameters for jobs had to be updated manually through the designer module, 'Spoon'. Although the product has a feature for keeping environment variables outside Spoon, that didn't work for us, as we had one Development server used for Dev, QA and UAT.
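One possible way around the shared-server limitation, which the reviewer does not describe and which may not have been practical in their setup, is to keep a separate `kettle.properties` file per environment and select it at launch through the `KETTLE_HOME` variable. The sketch below only generates the files; the variable names, host names, and directory layout are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical variable names and values, for illustration only.
ENVIRONMENTS = {
    "dev": {"DB_HOST": "dev-db.example.internal", "DB_NAME": "dw_dev"},
    "qa": {"DB_HOST": "qa-db.example.internal", "DB_NAME": "dw_qa"},
    "uat": {"DB_HOST": "uat-db.example.internal", "DB_NAME": "dw_uat"},
}


def write_kettle_properties(base_dir, env_name, variables):
    """Write <base_dir>/<env_name>/.kettle/kettle.properties and return its path."""
    kettle_dir = Path(base_dir) / env_name / ".kettle"
    kettle_dir.mkdir(parents=True, exist_ok=True)
    path = kettle_dir / "kettle.properties"
    body = "".join(f"{key}={value}\n" for key, value in sorted(variables.items()))
    path.write_text(body, encoding="utf-8")
    return path


def read_kettle_properties(path):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    result = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            result[key.strip()] = value.strip()
    return result


# A temporary directory stands in for a real location on the shared server.
base = tempfile.mkdtemp(prefix="kettle_envs_")
written = {
    env: write_kettle_properties(base, env, variables)
    for env, variables in ENVIRONMENTS.items()
}
```

Each job would then be started with `KETTLE_HOME` pointing at the matching directory, for example `KETTLE_HOME=/opt/kettle_envs/qa ./kitchen.sh -file=job.kjb`, where the path and job file name are placeholders.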
What do I think about the stability of the solution?
There were no issues with the stability.
What do I think about the scalability of the solution?
We had no issues scaling it across the company as needed.
How are customer service and support?
It's about average. Most of the help we got was through Google searches and Wiki pages. One time we had an issue with a feature - our version of PDI could not handle microseconds. The product owner came up with a solution, but instead of applying the patch, wanted to sell it to us for a fee.
How was the initial setup?
I am only aware of the client side setup which was simple enough. It was pretty much a one step installation process.
What about the implementation team?
It was done by an in-house team. A couple of issues we realized later concerned memory configuration for the environment. This needs to be evaluated and fine-tuned; otherwise you can run into job failures with large amounts of data. We ran into this issue with 'Commit' points and 'Sort' steps.
Which other solutions did I evaluate?
An evaluation was performed, but I was not involved in it.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Graduate Teaching Assistant with 1,001-5,000 employees
We can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool.
Valuable Features:
The most valuable feature is that it can take inputs from all formats, e.g. CSV, text, Excel, JSON, Hadoop, etc. It has the potential to provide the output in the format we require, and we can also use many database connections. The transformations listed are also very useful and are very self-explanatory.
Also, the data mining feature which comes with the Pentaho business analytics suite was very useful to our project, especially the Weka plugin. We could score the records in the data warehouse, which helped in predicting the values.
Lastly, the GUI is very easy to use, so we can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool. I think that a company wouldn't need to spend more money on getting an experienced person to use this tool. All you need is a balance of experienced users and new trainees to get going. You can also start using the business analytics tool once you have integrated data. Coaching and applying this technology enterprise wide will enable your business to take data driven decisions.
Improvements to My Organization:
It makes it possible for the seniors to train new employees and junior staff very quickly. All that is needed is strong knowledge of ETL and BI/Big Data concepts to use this software.
Room for Improvement:
I would like to see the data visualization tool combined with BI so I can see how data is progressing through various stages. I do think that they are working on this already. I also found, in my case, that the statistical data input wasn't working (.sas7bdat input wasn't working).
Deployment Issues:
There have been no issues with the deployment.
Stability Issues:
It could have been the case that I may not have been doing it the right way.
Scalability Issues:
We have had no issues scaling it.
Cost and Licensing Advice:
I would say it is one of the most affordable tools to use for business intelligence.
Other Advice:
You should go for this tool to manage your data warehouse, but I would suggest that you look for other reporting tools, such as Tableau, which are more user friendly and provide great insights in the data.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Business Intelligence Consultant at Sanmargar Team
We use it almost everywhere, for creating data marts, data warehouses, and implementing BI reporting tools.
Valuable Features:
First of all, the ease of deployment. I’m pretty sure that almost anyone could do simple transformations without having any knowledge of IT. Thanks to its graphical interface, this tool is just drag and click. Another advantage is that it fits everywhere. You can connect it to Big Data sources, relational databases, and all types of files. If the developers missed something, you can try finding it in the marketplace or quickly develop it yourself, because it is open source.
Improvements to My Organization:
We use it almost everywhere, for creating data marts, data warehouses, and implementing BI reporting tools. We also built our Customer Centralized File and Data Quality Studio using it. What’s more, we use it for small solutions too, e.g. if we want to quickly export data from a database to .xlsx. We also develop our own plugins for PDI and put them into the marketplace.
Room for Improvement:
A big advantage, but also a problem, is that it is open source. Almost anyone can develop their own Pentaho code and release it. As a result, Pentaho is a little messy: some parts of it are brand new and some look as if they were developed at the very beginning. I think the developers should stop inventing new parts and take some time to clean up the code and optimize the older parts. Some old plugins, after a long time, still don’t work properly.
Use of Solution:
I've been using it for four years, and when I started using it I was in college. I quickly found that PDI with my text search analytic plug-in was useful for preparing notes for classes. When I was bored I came up with a funny tool. It collected data from all my roommates about what they needed from the shop and sent notifications to the phones of whoever was going to the shop.
Deployment Issues:
We have never had any problems with deployment.
Stability Issues:
There are some issues with stability. As I said before, there are some small bugs, but with Pentaho you can always find a workaround.
Scalability Issues:
With the Pentaho Community version you just download it and unpack it, and it should be running. If not, you should also install Java.
Customer Service:
Customer service isn’t needed. The solution to every problem is on the internet. If not, you can post it to the community forum and you will get an immediate answer, but I have never had to post a new topic.
Initial Setup:
Straightforward. You just need to unzip the file and you can already run it. There is also some setup available if you need it. It’s very simple: you just need to edit three files in Notepad.
Implementation Team:
I did this myself and we do it for other companies. All installations are easy, and you do not need to be an IT magician.
Cost and Licensing Advice:
There is a Community Edition which is free. There is also an Enterprise licence but the price varies depending on the server hardware configuration and the purpose of use (BigData, Hadoop, etc.).
Other Solutions Considered:
I had the chance to test SAS Data Integration but I didn’t fall in love with it like I did with PDI. I think that PDI is easier to use and you can do much more with PDI than with SAS.
Other Advice:
The tool is excellent, and almost everyone can use it. You just need to take it out of the box and run it. There is no limit to the application – you can do everything with it. However, it still has a lot of faults. Not every component runs as you would wish. Always look for solutions on the Internet. Many problems have already been fixed, and there are ready-built transformations/jobs.
Disclosure: My company has a business relationship with this vendor other than being a customer: Company where I work Sanmargar Team is a reseller of this solution and a Pentaho partner in Poland.
BI developer - (Jaspersoft/Pentaho/Pentaho C-Tools/Kettle/Talend/Data warehouse) at a tech services company with 501-1,000 employees
You can get ETL, reporting, analysis, and analytics in a single shop.
Valuable Features:
- Best in performance in both hosted and local environments
- Best open source warehouse solution using the Kimball method
- Best Big Data discovery components and BI
- Simple and easy to understand and work with
- Complete cost effective solutions
- Best support in forums
- Best visualizations in the market - Protovis & D3
- Best custom interactivity features
- Best product for embedded BI
- Best for mobile responsive technology integrated, i.e. bootstrap
- Best documentation - Open APIs
Improvements to My Organization:
- It's reduced our costs
- With self-service we can save time
- Open plug-ins contributors
Room for Improvement:
- Searching repository for reports or dashboards
- Repository UI
- Loading of percentage reports and dashboards
Other Advice:
It has a fancy look, the best visualization libraries and is open source. You can get ETL, reporting, analysis, and analytics in a single shop. Small and mid-sized companies, as well as enterprises such as CA, have been implementing Pentaho.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Business Intelligence Supervisor at a manufacturing company with 501-1,000 employees
We have performed a lot of setups since we started using it, and have had no issues.
Valuable Features
- Fast
- Easy to learn and then teach to our team
- It integrates with everything on market
Improvements to My Organization
We had never used a data integration or BI platform before, and struggled with lots of Excel spreadsheets and CSV files. So when we first used Pentaho to automate a data-integration flow, we were stunned by how fast and how easy it was. We are very productive today thanks to that piece of software integrating our data and the platform serving the processed data to our users.
Room for Improvement
An easier upgrade process for community tools would be nice. They also need to update the ad-hoc reports tool, as the one available is outdated. To get round this, we are using Excel as the output for some reports.
Use of Solution
For more than four years.
Deployment Issues
Upgrading the BI platform is a bit of a pain, but the rest is easy to use and to set up.
Stability Issues
There have been no issues with the stability.
Scalability Issues
There have been no issues with the scalability.
Customer Service and Technical Support
We use only the community edition, so we only consult the internet for help. There is a strong community of users all over the world. Here in Brazil, the e-mail list is very helpful.
Initial Setup
We have performed a lot of setups since we started using Pentaho, and there have been no issues there.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Research Assistant at a university with 1,001-5,000 employees
The user-defined class operator is currently very valuable to me.
Valuable Features:
I would say that the user-defined class operator is currently very valuable to me. Other than that, native connectivity to Hadoop (MapR), analytical databases and enterprise systems is really important to me these days.
Improvements to My Organization:
I am a researcher in the field of data integration, and I am using this tool as a sandbox. I would say that its being open source, together with the wide availability of forums and support, has made my work really easy. Also, the reporting and analysis functionality gives me more freedom to run my test cases and check the results.
Room for Improvement:
I would like to have more languages/scripts supported in user-defined classes. Right now the options are very limited. I know, if I want to do core programming I can always import my classes/jars into it, but it would be really nice to have more functionality in terms of programming language and support in UD classes/operator. Besides that, different parallel algorithms/skeletons would be great. For example, it could suggest which parallel algorithm I should use on a particular operator or a set of operators. It would be really cool to have such a functionality.
Other Advice:
If you are looking to integrate unstructured or semi-structured datasets with some parallelization, choose this tool. The parallelization supported by Pentaho Data Integration is really nice to have. You choose which activities you want to parallelize and that's it. You do not have to write parallel code yourself, as it does this job for you, which is awesome for a not-so-good programmer such as myself.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Datawarehouse Administrator at a tech services company with 501-1,000 employees
We have been able to expose data services through the use of CDA relying on the same database as the reporting tools.
What is most valuable?
Its ability to blend data, and dashboarding with C*TOOLS for creating responsive single-page apps.
How has it helped my organization?
We have been able to expose data services through the use of CDA relying on the same database as the reporting tools, thus avoiding inconsistencies among the data shown by reports and data acquired by external systems.
What needs improvement?
The User Console, aka workspace, and the development of dashboards. They work, but they require some programming skills. This means continuous application management on the part of the IT department.
For how long have I used the solution?
I've used it for six years.
What was my experience with deployment of the solution?
There were issues, but they were solved with help from tech support.
What do I think about the stability of the solution?
There were issues, but they were solved with help from tech support.
What do I think about the scalability of the solution?
There were issues, but they were solved with help from tech support.
How are customer service and technical support?
It depends. It usually takes a long time, some answers seem to be just a way to buy time, and the commitment seems poor. However, when you finally get to an engineer, you are likely to have your problem solved in a few days.
Which solution did I use previously and why did I switch?
We used MicroStrategy, Cognos, and Business Objects. Pricing was the key driver, but so was the open source licensing, which made us think we would be able to develop our own improvements. This didn't happen, primarily because of the few resources we actually put into development.
How was the initial setup?
It's complex because of the lack of documentation and the absence of an installer for Linux.
What about the implementation team?
We did it in-house, and we had to hire some developers with Java skills for a few months.
What other advice do I have?
Have a vision, and do not let yourself be guided by the technology.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Sr BI Administrator at a healthcare company with 1,001-5,000 employees
It gave ‘out-of-the-box’ widgets for reading XML and JSON interfaces which would otherwise have to be built from scratch.
What is most valuable?
It allows for very quick development due to the intuitive interface. Compared to other ETL tools like PowerCenter, SSIS and SAS DI Studio, it excels in rapid development cycles.
How has it helped my organization?
It gave ‘out-of-the-box’ widgets for reading XML and JSON interfaces which would otherwise have to be built from scratch.
What needs improvement?
PDI excels at the development part. Administration and monitoring are pretty weak and basic. But I must say I have been spoiled by the great capabilities that PowerCenter offers ‘out-of-the-box’. The Pentaho development team seems to rely very heavily on Linux/Unix for the admin part. Debugging could be enhanced with better feedback.
For how long have I used the solution?
We used PDI 4.3 in a pilot against SSIS for a couple of months during 2013. In 2014 I used version 4.4 on a daily basis within a production environment for exactly one year. We also looked into the commercial front-end solution and found it to be too much of a collection of loosely connected applications.
What was my experience with deployment of the solution?
There have been no deployment issues.
What do I think about the stability of the solution?
Stability is a bit of an issue. The GUI quite often ‘freezes’ and there is no alternative to killing the session. Very frequent saving is in order.
What do I think about the scalability of the solution?
There have been no issues with scalability.
How are customer service and technical support?
The community site is pretty brilliant. Every technical component is handled on its own Wiki page. You can even look into the scrum backlog of the dev. team. Absolutely amazing.
Which solution did I use previously and why did I switch?
Heavy ETL solutions were simply too expensive and the SSIS alternative is simply too hideous to consider. It took at least three times as long to develop the same ETL process with SSIS as with Pentaho (and that is before having to deal with the abject Microsoft ‘debugging’).
How was the initial setup?
Incredibly easy. Just unpack, make sure you have the right drivers installed, and beware of other Java applications running.
What about the implementation team?
We simply did everything ourselves, with a little aid from the community.
What other advice do I have?
Make sure Pentaho solutions are still available as they were prior to the commercial take-over. Administration is not the best developed component. The ETL is brilliant. Make sure that the admin part is covered.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
