Pentaho Data Integration and Analytics Valuable Features

DP
Enterprise Data Architect at a manufacturing company with 201-500 employees

I'm a database guy, not a programmer, so Lumada's ability to create low-code pipelines without custom coding is crucial for me. I don't need to do any Java customization. I've had to write SQL scripts and, occasionally, some JavaScript within it, but those cases are few and far between. I can do everything else within the tool itself. I got into databases because I was sick and tired of getting errors when I compiled something.
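As a concrete illustration of the kind of scripting mentioned above, PDI's "Modified Java Script Value" step accepts plain JavaScript that operates on row fields. This is a minimal, hypothetical sketch; the field and function names are invented for illustration:

```javascript
// Hedged sketch of a snippet as it might appear in PDI's
// "Modified Java Script Value" step, where each input row's fields
// are exposed as variables. Names here are hypothetical.
function normalizePhone(raw) {
  // Keep digits only, e.g. "(555) 123-4567" becomes "5551234567".
  return String(raw).replace(/\D/g, "");
}

// In the step, "phone_raw" would be an incoming row field and
// "phone_clean" a new output field added to the stream.
var phone_raw = "(555) 123-4567";
var phone_clean = normalizePhone(phone_raw);
```

Logic like this stays inside a single step, so the rest of the pipeline remains visual and low-code.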

Jacopo Zaccariotto - PeerSpot reviewer
Head of Data Engineering at InfoCert

Pentaho is flexible with a drag-and-drop interface that makes it easier to use than some other ETL products. For example, the full stack we are using in AWS does not have drag-and-drop functionality. Pentaho was a good option at the start of this journey.

We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice.

Ryan Ferdon - PeerSpot reviewer
Senior Data Engineer at Burgiss

The fact that it's a low-code solution is valuable. It's good for more junior people who may not be as experienced with programming. In our case, we didn't have a huge data set. We had small and medium-sized data sets, so it worked fine.

The fact that it's open source is also helpful in that, if a junior engineer knows they are going to use it in a job, they can download it themselves, locally, for free, and use test data to learn it.

My role was to use it to write one feed that could facilitate multiple clients. Given that it was an open-source, free solution, it was pretty robust in what it could do. I could make lookup tables and databases and map different clients, and I could use the same feed for 30 clients or 50 clients. It got the job done for our use case.

In addition, you can install it wherever you need it. We had installed versions in the cloud and I also had local versions.

Buyer's Guide
Pentaho Data Integration and Analytics
April 2024
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,578 professionals have used our research since 2012.
PR
Senior Engineer at a comms service provider with 501-1,000 employees

The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to trawl through lines and lines of code and try to work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it.

The other side is that it is quite a modular program. I've worked with other ETL tools, and it is quite difficult to get component reuse by using them. With tools like SSIS, you can develop your packages for moving data from one place to another, but it is really difficult to reuse a lot of it, so you have to implement the same code again. Pentaho seems quite adaptable to have reusable components or sections of code that you can use in different transformations, and that has helped us quite a lot.

One of the things that Pentaho does is provide the ability, via virtual web services, to expose a transformation as if it were a database connection; for instance, when you have a REST API that you want read by something like Tableau, which needs a JDBC connection. Pentaho was really helpful in getting that driver enabled for us to do some proof-of-concept work on that approach.

Dale Bloom - PeerSpot reviewer
Credit Risk Analytics Manager at MarketAxess

I'm at the early stages with Lumada, and I have been using the documentation quite a bit. The support has definitely been critical right now in terms of finding out more about the architectural elements needed to roll out the Enterprise edition.

I absolutely love Hitachi. I'm one of the forefront supporters of Hitachi for my firm. It's so easy to integrate within our environments. Being able to quickly build ETL jobs, transform data, and then automate those jobs makes it really easy to integrate throughout for data analytics.

I also appreciate the fact that it's not one of the low-code/no-code solutions. You can put as much JavaScript or other code into it as you want, and that makes it a really powerful tool.

RicardoDíaz - PeerSpot reviewer
COO / CTO at a tech services company with 11-50 employees

Pentaho from Hitachi is a suite of different tools. Pentaho Data Integration is a part of the suite, and I love the drag-and-drop functionality. It is the best. 

Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that.

VK
Solution Integration Consultant II at a tech vendor with 201-500 employees

The metadata injection feature is the most valuable because we have used it extensively to build frameworks, where we have used it to dynamically generate code based on different configurations. If you want to make a change at all, you do not need to touch the actual code. You just need to make some configuration changes and the framework will dynamically generate code for that as per your configuration. 
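The config-driven approach described above can be sketched in a few lines. This is only an illustration of the idea, assuming a hypothetical config shape and field names; it is not Pentaho's actual metadata injection API:

```javascript
// Hedged sketch of the *idea* behind metadata injection: one
// template step is configured at runtime from a metadata object,
// so a schema change means editing config, not the transformation.
// The config shape and names here are hypothetical.
function buildSelectStep(tableMeta) {
  // Emit the field list a templated "Select values" step would be
  // injected with, derived entirely from the metadata.
  return {
    step: "Select values",
    table: tableMeta.table,
    fields: tableMeta.columns.map(c => ({ name: c.name, type: c.type }))
  };
}

// Adding or dropping a column only touches this config object.
const ordersMeta = {
  table: "orders",
  columns: [
    { name: "order_id", type: "Integer" },
    { name: "amount",   type: "Number" }
  ]
};

const step = buildSelectStep(ordersMeta);
```

The payoff is exactly what the review describes: the "code" (the template transformation) never changes; only configuration does.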

We have a UI where we can create our ETL pipelines as needed, which is a key advantage for us. This is very important because it reduces the time to develop for a given project. When you need to build the whole thing using code, you need to do multiple rounds of testing. Therefore, it helps us to save some effort on the QA side.

Hitachi Vantara's roadmap has a pretty good list of features that they have been releasing with every new version. For instance, in version 9, they have included metadata injection for some of the steps. The most important elements of this roadmap to our organization’s strategy are the data-driven approach that this product is taking and the fact that we have a very low-code platform. Combining these two is what gives us the flexibility to utilize this software to enhance our product.

TJ
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees

It makes it pretty simple to do some fairly complicated things. Some of our other BI developers and I have taken stabs at using, for example, SQL Server Integration Services, and we found it a little frustrating compared to Data Integration. So, its ease of use is right up there.

Its performance is a pretty close second. It is a highly performant system, and its query performance on large data sets is very good.

Anton Abrarov - PeerSpot reviewer
Project Leader at a mining and metals company with 10,001+ employees

It has a really friendly user interface, which is its main feature. The process of combining SQL code with databases and automating it is great and really convenient.

RV
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees

A valuable feature is the number of connectors that I have. I can connect to different databases, data sources, files, and SFTP. With SQL and NoSQL databases, I can connect, pull the data into my instructions, send it to my staging area, and create the format. That way, I can format all my data in just one process.

Ridwan Saeful Rohman - PeerSpot reviewer
Data Engineering Associate Manager at Zalora Group

This solution offers drag-and-drop tools, and the scripting required is quite minimal. Even if you do not come from IT or a software engineering background, it is possible to use. It is quite intuitive. You can drag and drop many functions.

The abstraction is quite good.

Also, if you're familiar with the product itself, it has transformation abstractions and job abstractions. We can create smaller units as Kettle transformations, and then the bigger ones as Kettle jobs. The product is useful both for someone familiar with Python and for someone with no scripting background at all.

For larger data, we are using Spark.

The solution enables us to create pipelines with minimal manual, or custom, coding effort. Even if you have no advanced experience in scripting, it is possible to create ETL pipelines. I have a recent graduate from a management major who had no experience with SQL. I trained him for three months, and within that time he became quite fluent, with no prior experience using ETL tools.

Whether or not it's important to handle the creation of pipelines with minimal coding depends on the team. If I change the solution to Airflow, then I will need more time to teach them to become fluent in the ETL tool. By using these kinds of abstractions in the product, I can compress the training time to just three months. With Airflow, it will take longer than six months to get new users to the same point.

We use the solution's ability to develop and deploy data pipeline templates and reuse them.

The old system was created long ago by someone before me at my organization, and we still use it. We also use the solution for some ad hoc reporting.

The ability to develop and deploy data pipeline templates once and reuse them is really important to us. There are some requests to create pipelines; I create them and then deploy them on our server. The pipeline then has to be robust, especially once it is scheduled, so that it does not fail.

We like the automation. I cannot imagine how data teams would work if everything were done on an ad hoc basis. Everything should be automated. Using my organization as an example, I can say with confidence that 95% of our data distributions are automated and only 5% are ad hoc. For that ad hoc portion, we query the data manually, process it in spreadsheets, and then distribute it to the organization. It's important to be robust and to be able to automate.

So far, we can deploy the solution easily on the cloud, on AWS. I haven't tried it on another server. We deploy it on our AWS EC2; however, we develop on our local computers. Most of the team uses Windows, and some people use MacBooks.

I personally have developed on both Windows and MacBook. I can say that Windows is easier to navigate. On the MacBook, the display becomes quite messed up if you enable dark mode.

The solution did reduce our ETL development time if you compare it to the scripting. However, this will really depend on your experience.

RK
Senior Data Analyst at a tech services company with 51-200 employees

One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results.

I'm working with large data sets. One of the clients I'm working with is a large credit card company and the database from this client is very large. Pentaho allows me to query large data sets without affecting its performance.

I use Pentaho with Jenkins to schedule the jobs. I'm using the jobs and transformations in Pentaho to create many links. 
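The Jenkins scheduling described above typically reduces to a scheduled shell step that invokes Kitchen, PDI's command-line job runner. A hedged sketch; the install path, job file, and parameter name are hypothetical:

```shell
# Jenkins "Execute shell" build step running a PDI job on a schedule.
# kitchen.sh is PDI's CLI job runner; the paths and the RUN_DATE
# parameter here are illustrative, not from the original review.
/opt/pentaho/data-integration/kitchen.sh \
  -file=/etl/jobs/campaign_results.kjb \
  -param:RUN_DATE="$(date +%F)" \
  -level=Basic
```

Kitchen's exit code propagates to Jenkins, so a failed job marks the build as failed.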

I always find ways to have minimal code and create the processes with many parameters. I am able to reuse processes that I have created before. 

Creating jobs and putting them into production is very simple, and the visibility that Pentaho gives is very good.

Aqeel UR Rehman - PeerSpot reviewer
BI Analyst at Vroozi

The best feature is that it's simple to use. There are simple data transformation steps available, such as trimming data or performing different types of replacement.

This solution allows us to create pipelines using a minimal amount of custom coding. Anyone in the company can do so, and it's just a simple step. If any coding is required then we can use JavaScript.

Renan Guedert - PeerSpot reviewer
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees

It has many resources and a wide variety of things you can do. It is also pretty open, since you can put in a Python script or JavaScript for almost anything. If you don't have a native tool in the application, you can build your own using scripts, and you can build your other steps and jobs in the application. The liberty the application gives you has been pretty good.

Lumada enables us to create pipelines with minimal manual coding effort, which is the most important thing. When creating a pipeline, you can see which steps are failing in the process, follow the process, and debug if you have problems. So, it creates a good, visual pipeline that makes it easy to understand what you are doing during the entire process.

José Orlando Maia - PeerSpot reviewer
Data Engineer at a tech services company with 201-500 employees

The features that I use the most are the Microsoft Excel input, S3 CSV input, table input, and CSV input steps. Today, the most valuable to me are the table input and then the CSV input; both are very important. We use the table input to extract data from our transactional databases, which are commonly used, and the CSV input to get data from AWS S3 and our data lake.

In Lumada, we can parallelize the steps. Query performance against databases is good for me, especially for transactional databases. Because Lumada runs on Java, we can adjust the amount of memory we want to use for transformations; it is possible to set the amount of memory available to the Java VM, which is good. Lumada performs well with transactional database extraction: not the highest performance, but good performance as we query data, and it is possible to parallelize the query. For example, if we have three or four servers to get the data from, we can retrieve the data from these databases at the same time, in parallel. This is good because we don't need to wait for one of the extractions to finish.
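The JVM memory adjustment mentioned above is usually done via the PENTAHO_DI_JAVA_OPTIONS environment variable, which PDI's launch scripts honor. A hedged example; the heap sizes are purely illustrative, not recommendations:

```shell
# Give PDI (Spoon/Kitchen/Pan) a 2 GB initial / 8 GB maximum Java
# heap before launching. Sizes here are illustrative only.
export PENTAHO_DI_JAVA_OPTIONS="-Xms2g -Xmx8g"
./spoon.sh
```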

Using Lumada, we don't need to do many manual transformations because there are native components for many of our transformations. Thus, Lumada is a low-code alternative to gathering data with SQL, Python, or other transformation tools.

Michel Philippenko - PeerSpot reviewer
Project Manager at a computer software company with 51-200 employees

The ETL feature was the most valuable to me. I like it very much. It was very good.

RE
Data Architect at a consumer goods company with 1,001-5,000 employees

I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source. With open-source on the table, I am in a position to transform the data where it's actually being moved from one environment to another.

Whether we are working with structured or unstructured data, the tool has been helpful. We are actually able to extend it to read JSON data by creating some Java components.

The solution gives us the flexibility to deploy it in any environment, including on-premises or in the cloud. That is another very important feature.

NA
Systems Analyst at a university with 5,001-10,000 employees

The ETL is definitely an awesome feature of the product. It's very easy and quick to use. Once you understand the way it works it's pretty robust.

Lumada Data Integration requires minimal coding. You can do more complex coding if you want to, because it has a scripts option that you can add as a feature, but we haven't found a need to do that yet. We just use what's available, the steps that they have, and that is sufficient for our needs at this point. It makes it easier for other developers to look at the things that we have developed and to understand them quicker, whereas if you have complex coding it's harder to hand off to other people. Being able to transition something to another developer, and having that person pick it up quicker than if there were custom scripting, is an advantage.

In addition, the solution's ability to quickly and effectively solve issues we've brought up has been great. We've been able to use all the available features.

Among them is the ability to develop and deploy data pipeline templates once and reuse them. The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs. The automation of data pipeline templates has also been helpful in scaling the onboarding of data.

KM
Data Architect at a tech services company with 1,001-5,000 employees

One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs.

ES
System Engineer at a tech services company with 11-50 employees

The graphical user interface is quite okay. That's the most important feature. In addition, the different types of stores and data formats that can be accessed and transferred are an important component.

We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic. It's more about the business logic and less about the programming logic and that's really important.

Another important feature is that you can deploy it in any environment, whether it's on-premises or cloud, because you can reuse your steps. When it comes to adding to your data processing capacity dynamically that's key because when you have new workflows you have to test them. When you have to do it on a different environment, like your production environment, it's really important.

SK
Lead, Data and BI Architect at a financial services firm with 201-500 employees

Because it comes from an open-source background, it has so many different plugins. It is extremely broad in what it can do, with a wide spectrum of things it can connect to. It has been around for a while, so it is mature and has a lot built into it. That is the biggest thing.

The visual nature of its development is a big plus. You don't need to have very strong developers to be able to work with it.

We often have to drop down to JavaScript, but that is fine. I appreciate that it has the capability built-in. When you need to, you can drop down to a scripting language. This is important to us.

DG
Director of Software Engineering at a healthcare company with 10,001+ employees

There is an end-to-end flow, where a user can say, "I am looking at this field and want to slice and dice my data based on these parameters." That flexibility is provided by Pentaho. This minimal manual coding is important to us.

VM
Technical Manager at a computer software company with 51-200 employees

Pentaho Data Integration is quite simple to learn, and there is a lot of information available online. It is not a steep learning curve. It also integrates easily with other databases and that is great. We use the provided documentation, which is a simple process for integration compared to other proprietary tools.

TG
Analytics Team Leader at HealtheLink

We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule.

it_user164838 - PeerSpot reviewer
CEO with 51-200 employees

Ease of use, stability, the graphical interface, a small amount of "voodoo," and cost.

it_user373128 - PeerSpot reviewer
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees

It is a lightweight ETL tool that's easy to get started on. It connects seamlessly to most commonly used data sources.

it_user414117 - PeerSpot reviewer
Senior Data Engineer at a tech company with 501-1,000 employees

The most valuable thing for me is that it enables a technical product manager to be able to write ETL jobs themselves, which saves developers time so that they can do more important things.

it_user382572 - PeerSpot reviewer
Pentaho Consultant at a comms service provider with 10,001+ employees

It is a very good open-source ETL tool that's capable of connecting to most databases. It has a lot of functions that make transforming the data very easy. Also, because it is an open-source product, it is very easy to build your own solution with it.

it_user376926 - PeerSpot reviewer
Data Developer at a tech services company with 10,001+ employees
  • Pentaho Kettle has a very intuitive and easy-to-use graphical user interface (GUI)
  • It is possible to understand how to develop an ETL solution even when using it for the first time
  • The Community Edition is free and very efficient
  • There are versions for Windows, Linux, and Mac
  • Large selection of options
OM
IT-Services Manager & Solution Architect at Stratis

Running the ETL itself was very fast. It makes it very easy to transform the information we have. We found that very useful.

The UI is very easy to understand and learn.

The solution offers lots of documentation.

The initial setup is easy.

It's my understanding that the product can scale.

We've found the solution to be stable. 

The product is free to use if you choose the free version.

it_user402600 - PeerSpot reviewer
Senior Consultant at a financial services firm with 10,001+ employees

It allows for rapid prototyping of a wide array of ETL workloads.

it_user396720 - PeerSpot reviewer
Graduate Teaching Assistant with 1,001-5,000 employees

The most valuable feature is that it can take inputs from all formats, such as CSV, text, Excel, JSON, and Hadoop. It can provide the output in the format we require, and we can also use many database connections. The transformations listed are also very useful and very self-explanatory.

Also, the data mining feature which comes with the Pentaho business analytics suite was very useful to our project, especially the Weka plugin. We could score the records in the data warehouse, which helped in predicting the values.

Lastly, the GUI is very easy to use, so we can perform transformations with data very quickly and create reports indicating the KPIs in the reporting tool. I think a company wouldn't need to spend more money on getting an experienced person to use this tool. All you need is a balance of experienced users and new trainees to get going. You can also start using the business analytics tool once you have integrated data. Coaching and applying this technology enterprise-wide will enable your business to make data-driven decisions.

VD
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees

One important feature, in my opinion, is Metadata Injection. It gives the scripts flexibility because they don't depend on a fixed structure or a fixed data model; instead, you can develop transformations that are independent of the structure or data model.

Let me give a couple of examples. Sometimes your tables change, adding fields or dropping some of them. When this happens, if you have a transformation that doesn't use Metadata Injection, your transformation fails or doesn't handle all the information from the table. If you use Metadata Injection instead, the new fields are included and the dropped columns are excluded from the transformation. Other times, you have a complex transformation to apply to a lot of different tables. Traditionally, without the Metadata Injection feature, you had to repeat the transformation for each table, adapting it to the concrete structure of each one. Fortunately, with Metadata Injection, the same transformation is valid for all the tables you want to treat. A little effort gives you a great benefit.

Furthermore, the solution has a free-to-use community version.

The solution is easy to set up, very intuitive, clear to understand and easy to maintain.

it_user391695 - PeerSpot reviewer
Business Intelligence Consultant at Sanmargar Team

First of all, the ease of deployment. I'm pretty sure that almost anyone could do simple transformations without having any knowledge of IT. Thanks to its graphical interface, this tool is just drag and click. Another advantage is that it fits everywhere. You can connect it to big data sources, relational databases, and all types of files. If the developers missed something, you can try finding it in the marketplace or quickly develop it yourself, because it is open source.

it_user426030 - PeerSpot reviewer
Global Consultant - Big Data, BI, Analytics, DWH & MDM at a tech consulting company with 1,001-5,000 employees

It's an ETL platform with big data enablement. It's the easiest to use, extend, and deploy, and it helps to connect to various data sources, including all widely available databases.

We also use Pentaho Analyzer, an ad hoc analytics tool built on the Mondrian OLAP server that enables the end user to slice and dice the data in various patterns.

Ricardo Díaz - PeerSpot reviewer
COO at a tech services company with 11-50 employees
Easy to use, with support for all databases (JDBC and ODBC connections) and for files such as XLS, CSV, TXT, SAS, and R.
it_user384984 - PeerSpot reviewer
Sr BI Administrator at a healthcare company with 1,001-5,000 employees

It allows for very quick development due to the intuitive interface. Compared to other ETL tools like PowerCenter, SSIS, and SAS DI Studio, it excels in rapid development cycles.

it_user172275 - PeerSpot reviewer
Consultant at a comms service provider with 11-50 employees

It's very simple compared to other products out there.

it_user254223 - PeerSpot reviewer
Project Manager - Business Intelligence at www.datademy.es
  • Easy to use
  • Development of the product
  • A lot of predefined steps
  • Good open source option
it_user384993 - PeerSpot reviewer
Datawarehouse Administrator at a tech services company with 501-1,000 employees

Its ability to blend data, and the dashboarding with CTools for creating responsive single-page apps.

it_user415695 - PeerSpot reviewer
Project Lead at a tech services company with 10,001+ employees

The best benefit of the product is that it is easy to use and to understand.

it_user426117 - PeerSpot reviewer
DWH Specialist at a healthcare company with 1,001-5,000 employees

It is extremely flexible; it allows you to use variables/parameters for just about everything.

it_user8199 - PeerSpot reviewer
BI developer - (Jaspersoft/Pentaho/Pentaho C-Tools/Kettle/Talend/Data warehouse) at a tech services company with 501-1,000 employees
  • Best in performance in both hosted and local environments
  • Best open-source warehouse solution using the Kimball method
  • Best big data discovery components and BI
  • Simple and easy to understand and work with
  • Complete, cost-effective solutions
  • Best support in forums
  • Best visualizations in the market - Protovis & D3
  • Best custom interactivity features
  • Best product for embedded BI
  • Best for mobile-responsive technology integration, i.e. Bootstrap
  • Best documentation - open APIs
it_user392367 - PeerSpot reviewer
Research Assistant at a university with 1,001-5,000 employees

I would say that the user-defined class operator is currently very valuable to me. Other than that, native connectivity to Hadoop (MapR), analytical databases, and enterprise systems is really important to me these days.

it_user375219 - PeerSpot reviewer
Consultant at a tech vendor with 501-1,000 employees
  • It has a nice GUI that anyone can learn to use in just a few days with minimal training.
  • It has great support for big data technologies; Pentaho 5.3 comes with support for HBase, Pig, Oozie, and various Hadoop distributions.
it_user369171 - PeerSpot reviewer
Brazil IT Coordinator at a transportation company with 1,001-5,000 employees

Data transformation within Pentaho is a nice feature that they have and that I value.

it_user386202 - PeerSpot reviewer
Business Intelligence Supervisor at a manufacturing company with 501-1,000 employees
  • Fast
  • Easy to learn and then teach to our team
  • It integrates with everything on the market