What is your primary use case for Pentaho Data Integration?

How do you or your organization use this solution?

Please share with us so that your peers can learn from your experiences.

Thank you!

Julia Miller

Community Director at PeerSpot

Buyer's Guide

Pentaho Data Integration and Analytics

May 2026

Get the report

Helped 896,942 peers since 2012

33 Answers

Last answered Feb 18, 2026

Adrián Moreiras

Data Analyst at Telefonica Digital

Real User

Top 5

Feb 18, 2026

I have been using Pentaho Data Integration and Analytics for approximately four years. My main use case for Pentaho Data Integration and Analytics is data transformation related to the telecommunications sector. For example, I send notifications via SMS depending on the type of mobile plan contracted with the telecom company. Different flows are based on user segmentation, and therefore different SMS messages with different content are sent. First, raw data comes in as an unpolished block of data where different values can appear in lowercase, uppercase, with an underscore, without an underscore, with a space, and so on. I perform basic data processing first. Once the data is in a uniform form, it goes through a series of filters where I start to categorize the user by the type of telecom plan contracted. Then there is another component in which I have the different SMS codes and texts depending on a key-value pair, which is the type of contract the user has. I match what comes from the raw data with how it is correctly categorized according to the filters and with the different type of text that users will receive, for example, to do the proration of plan changes since rate changes are made annually and the amount is increased. Pentaho Data Integration and Analytics has been very useful for me in user segmentation when rate changes were made because prices change in the telecommunications sector, and depending on each user's billing cycle, I have to do proration for different days. One month might have eighteen days with the old plan and thirteen days with the new plan. This has been a very important use case in my career. I have also worked with Net Promoter Score, which is very important in the telecommunications sector. I have worked with all the information from all the segments, all the regions, divided by province, by product contracted, divided by operating system, by everything. 
I have worked on Net Promoter Score with Pentaho Data Integration and Analytics, and it has been very easy to handle everything there.
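
The cleanup, plan lookup, and proration steps described in this answer can be sketched in plain Python. This is only an illustration, not the reviewer's Pentaho transformation: the plan keys, SMS texts, prices, and function names are all hypothetical, and the proration assumes a simple day-weighted blend of the old and new monthly price.

```python
import re
from typing import Optional

# Hypothetical plan keys and SMS texts; the review does not list the real ones.
SMS_BY_PLAN = {
    "prepaid": "Your prepaid rate changes on your next top-up.",
    "postpaid": "Your monthly plan price changes this billing cycle.",
}


def normalize(raw: str) -> str:
    """Lowercase the value and drop spaces, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", "", raw.strip().lower())


def sms_for(raw_plan: str) -> Optional[str]:
    """Look up the SMS text for a raw plan label, or None if unrecognized."""
    return SMS_BY_PLAN.get(normalize(raw_plan))


def prorated_charge_cents(old_cents: int, new_cents: int,
                          old_days: int, new_days: int) -> int:
    """Blend two monthly prices by the number of days billed at each."""
    total_days = old_days + new_days
    return round((old_cents * old_days + new_cents * new_days) / total_days)
```

With a hypothetical monthly price moving from 31.00 to 37.20 and the split of eighteen old-plan days and thirteen new-plan days mentioned above, `prorated_charge_cents(3100, 3720, 18, 13)` returns 3360, that is, 33.60 for the month. Amounts are kept in integer cents to avoid floating-point rounding surprises.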

Jayakrishnanmg MG

Data architect at a tech vendor with 10,001+ employees

MSP

Top 10

Nov 29, 2025

Pentaho Data Integration and Analytics is my primary tool for ETL activities, where I extract data from different kinds of sources. We have two editions available: Community Edition and Enterprise Edition. Since Community Edition is open-source with limited features, we opted for the Enterprise Edition. I was able to perform transformations after extracting data from multiple heterogeneous sources, applying operations such as merge, SCDs, data cleansing, and data transformations according to client requirements, and finally loading the processed data to cloud solutions, cloud storage, or data warehouses in either SQL or NoSQL databases like MongoDB and Cassandra.

I used Pentaho Data Integration and Analytics for a specific project with a client called Walmart, a retail chain located in the US, Mexico, and Canada, where I aimed to build a centralized data warehouse for daily sales analytics. The data came from various sources including online sales, Excel sheets, REST API, MySQL DB, CSV, and flat files, requiring necessary transformations dictated by the transformation sheet provided by the client, which detailed how to handle the input source data, the required cleansing, and the transformations needed. This led to a recurring activity where I created pipelines with Pentaho Data Integration and Analytics, designing jobs and transformations, and eventually loading the data into Azure SQL DB, NoSQL databases, or Snowflake, while scheduling the ETL pipelines based on activity frequency: daily, weekly, or monthly. After creating jobs and transformations for each activity with Pentaho Data Integration and Analytics, we scheduled these jobs, allowing the Business Intelligence team to review the loaded data in the data warehouses. We also used Pentaho Report Designer for conducting predictive analysis, such as forecasting sales over the next five years based on prior data, enabling us to conduct in-depth analysis for future trends.
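
For readers unfamiliar with the SCDs (slowly changing dimensions) mentioned in this answer, here is a minimal Python sketch of one common variant, Type 2, where a changed record closes the old row and adds a new current row. The review does not say which SCD type was used or how it was built in Pentaho, so the function name, the `valid_from` / `valid_to` / `is_current` columns, and the in-memory list of rows are all assumptions made for illustration.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Return a new dimension list with Type 2 history applied.

    dimension: list of dict rows with valid_from, valid_to, is_current.
    incoming:  list of dict rows holding the key and the tracked columns.
    """
    result = [dict(row) for row in dimension]  # copy so the input is untouched
    current = {row[key]: row for row in result if row["is_current"]}

    for rec in incoming:
        existing = current.get(rec[key])
        if existing is not None and all(existing[c] == rec[c] for c in tracked):
            continue  # nothing changed, keep the current row as it is
        if existing is not None:
            # Close the old version instead of overwriting it.
            existing["valid_to"] = today
            existing["is_current"] = False
        new_row = {key: rec[key]}
        new_row.update({c: rec[c] for c in tracked})
        new_row.update({"valid_from": today, "valid_to": None, "is_current": True})
        result.append(new_row)
        current[rec[key]] = new_row
    return result
```

The point of the design is that history is never lost: a report run for an earlier date can still find the row that was valid at that time.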

Jefferson Hernandez

Data Architecture and Engineering Specialist at coprocenva

User

Top 5

Dec 3, 2024

I use Pentaho Data Integration for data integration and ETL processes. I developed with Pentaho at Coprocenva. I also use Pentaho in several machine learning projects, such as forecasting for clients who have not paid their credit.

Aqeel UR Rehman

BI Analyst at a computer software company with 51-200 employees

Real User

Top 5

Nov 28, 2024

Currently, I am using Pentaho Data Integration for transforming data and then loading it into different platforms. Sometimes, I use it in conjunction with AWS, particularly S3 and Redshift, to execute the copy command for data processing.
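
As a rough illustration of the S3-to-Redshift step, the sketch below builds a Redshift `COPY` statement as a string in Python. The table name, bucket path, and IAM role are placeholders, the CSV options are an assumption about the file format, and the review does not say how the command is actually issued from Pentaho.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for CSV files with a header row.

    All three arguments are placeholders supplied by the caller; they must
    come from trusted configuration, since they are interpolated directly.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )
```

The resulting text can be run through whatever SQL client or job step already connects to the cluster.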

MARIA PILAR CANDA

Associate Partner at Autana Business Partners

Real User

Top 5

Sep 20, 2024

I have a team who has experience with integration. We are service providers and partners. Generally, clients buy the product directly from the company.

KrishnaBorusu

Senior Product Manager at a retailer with 10,001+ employees

Real User

Top 10

Jul 24, 2024

The use cases involve loading the data into the required tables based on the transformations. We do a couple of transformations, and based on the business requirement, we load the data into the required tables.

Ahad Ahmed

BI developer at Jubilee Life Insurance Company Ltd

Real User

Top 5

May 27, 2024

I have used the solution to gather data from multiple sources, including APIs, databases like Oracle, and web servers. There are a bunch of data providers available who can provide you with datasets to export in JSON format from clouds or APIs.

Dan Peacock

Enterprise Data Architect at a manufacturing company with 201-500 employees

Real User

Apr 21, 2022

At first, we were using it as a means of loading our Data Warehouse from our operational systems.

Now we are using it as a means of bulk distribution (bursting) of reports, and synchronization across the enterprise.

The latter is especially useful as we have several different database platforms (Oracle, SQL Server, PostgreSQL, Access <shudder>, and Firebird).

Eric Smets

System Engineer at a tech services company with 11-50 employees

Real User

Sep 4, 2022

We use it for two major purposes. Most of the time it is for ETL of data. And based on the loaded and converted data, we are generating reports out of it. A small part of that, the pivot tables and the like, are also on the web interface, which is the more interactive part. But about 80 percent of our developers' work is on the background processes for running and transforming and changing data.

Ridwan Saeful Rohman

Data Engineering Associate Manager at Zalora Group

Real User

Top 20

Jun 26, 2022

I still use this tool on a daily basis. Compared to my experience with other ETL tools, the system that I created with the solution was quite simple. It is just as simple as extracting the data from MySQL, exporting it to CSV, and then putting it on S3 for the sales button. It is as simple as extracting the data from the MySQL Center and exporting it to the ASB. We still use this solution because there are a lot of old systems that still use it. The new solution that we use is mostly Airflow. We are still in the transition phase. To be clear, Airflow is a data orchestration tool that mainly uses Python. Everything, from the ETL all the way to the scheduling and the monitoring of any issues, is in one system and entirely on Airflow.
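
The extract-and-export part of that flow can be sketched with the Python standard library. This is not the reviewer's job: `sqlite3` stands in for MySQL so the example is self-contained, the function name is made up, and the upload to S3 is left out because it depends on credentials and tooling the review does not describe.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn: sqlite3.Connection, query: str) -> str:
    """Run a query and return the result as CSV text with a header row."""
    cursor = conn.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # cursor.description holds one tuple per column; the name is element 0.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()
```

The returned text can be written to a file and handed to whichever upload step the pipeline already uses.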

reviewer1872000

Senior Data Analyst at a tech services company with 51-200 employees

Real User

May 30, 2022

I use it for ETL. We receive data from our clients and we join the most important information and do many segmentations to help with communication between our product and our clients.

reviewer995501455

Solution Integration Consultant II at a tech vendor with 201-500 employees

Consultant

May 25, 2022

My work primarily revolves around data migration and data integration for different products. I have used them in different companies, but for most of our use cases, we use it to integrate all the data that needs to flow into our product. We can also have outbound flows from our product when we need to send data to various integration points. We use this product extensively to build ETLs for those use cases. We are developing ETLs for the inbound data into the product as well as outbound to various integration points. Also, we have a number of core ETLs written on this platform to enhance our product. We have two different modes that we offer: one is on-premises and the other is on the cloud. On the cloud, we have an EC2 instance on AWS where we have installed it, and we call that instance the ETL server. We also have another server for the application where the product is installed. We use version 8.3 in the production environment, but in the dev environment, we use version 9 and onwards.

RicardoDíaz

COO / CTO at a tech services company with 11-50 employees

Real User

May 19, 2022

We are a service delivery enterprise, and we have different use cases. We deliver solutions to other enterprises, such as banks. One of the use cases is for real-time analytics of the data we work with. We take CDC data from Oracle Database, and in real-time, we generate a product offer for all the products of a client. All this is in real-time. The client could be at the ATM or maybe at an agency, and they can access the product offer. We also use Pentaho within our organization to integrate all the documents and Excel spreadsheets from our consultants and have a dashboard for different hours for different projects. In terms of version, currently, Pentaho Data Integration is on version 9, but we are using version 8.2. We have all the versions, but we work with the most stable one. In terms of deployment, we have two different types of deployments. We have on-prem and private cloud deployments.

Krisjanis Muskars

Data Architect at a tech services company with 1,001-5,000 employees

Reseller

May 11, 2022

We use it as an ETL tool. We take data from a source database and move it into a target database. We do some additional processing on our target databases as well, and then load the data into a data warehouse for reports. The end result is a data warehouse and the reports built on top of that. We are a consulting company and we implement it for clients.

Anton Abrarov

Project Leader at a mining and metals company with 10,001+ employees

Real User

May 11, 2022

The company where I was working previously was using this product. We were using it for ETL process management. It was like data flow automation. In terms of deployment, we were using an on-premise model because we had sensitive data, and there were some restrictions related to information security.

reviewer1855218

Data Architect at a consumer goods company with 1,001-5,000 employees

Real User

May 10, 2022

We use it for orchestration and as an ETL tool to move data from one environment to another, including moving data from on-premises to the cloud and moving operational data from different source systems into the data warehouse.

Rodrigo Vazquez

CDE & BI Delivery Manager at a tech services company with 501-1,000 employees

Consultant

May 2, 2022

I just use it as an ETL. It is a tool that helps me work with data so I can solve any of my production problems. I work with a lot of databases. Therefore, I use this tool to keep information organized. I work with a virtual private cloud (VPC) and VPN. If I work in the cloud, I use VPC. If I work on-premises, I work with VPNs.

Tomasz Rabong

Client Engagement Leader at Sanmargar Team

User

Apr 20, 2022

We use PDI for complicated data transformations.

We also use our tool (www.metastudiodrm.com), which is integrated with PDI, to speed up complex parametrization of ETL jobs.

Renan Guedert

Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees

Real User

Apr 12, 2022

It was our principal tool for the whole ETL and data warehousing on our projects. We created a whole step for collecting all the raw data from APIs, other databases, and flat files, like Excel files, CSV files, and JSON files, to do the whole transformation and data preparation, then model the data and put it in SQL Server and integration services. For business intelligence projects, it is sometimes pretty good, when you are extracting something from the API, to have a step to transform the JSON file from the API to an SQL table. We use it heavily as a virtual machine running on Windows. We have also installed the open-source version on the desktop.
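
The idea of turning a JSON response from an API into an SQL table can be sketched in Python as follows. This is a generic illustration rather than the Pentaho step the reviewer used: `sqlite3` stands in for SQL Server, the function and table names are hypothetical, and the payload is assumed to be a flat JSON array of objects whose keys are trusted column names.

```python
import json
import sqlite3


def load_json_records(conn: sqlite3.Connection, table: str, payload: str) -> int:
    """Create a table from a JSON array of flat objects and insert the rows.

    Keys missing from a record are stored as NULL. Returns the row count.
    """
    records = json.loads(payload)
    if not records:
        return 0
    # The union of all keys becomes the column list, in a stable order.
    columns = sorted({key for record in records for key in record})
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
    conn.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})',
        [tuple(record.get(c) for c in columns) for record in records],
    )
    return len(records)
```

Nested objects would need flattening first, which is left out here to keep the sketch short.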

José Orlando Maia

Data Engineer at a tech vendor with 1,001-5,000 employees

MSP

Apr 11, 2022

My primary use case is to provide integration with my source systems, such as ERP systems and SAP systems, and web-based systems, having them primarily integrate with my data warehouse. For this process, I use ETL to treat and gather all the information from my first system, then consolidate it in my data warehouse.

Jacopo Zaccariotto

Head of Data Engineering at InfoCert

Real User

Apr 5, 2022

We use Pentaho for small ETL integration jobs and cross-storage analytics. It's nothing too major. We have it deployed on-premise, and we are still on the free version of the product. In our case, processing takes place on the virtual machine where we installed Pentaho. We can ingest data from different on-premises and cloud locations. We still don't carry out the data processing phase inside a different environment from where the VM is running.

Ryan Ferdon

Senior Data Engineer at Burgiss

Real User

Mar 24, 2022

We used it for ETL to transform data from flat files, CSV files, and databases. We used PostgreSQL for the connections, and then we would either import it into our database if the data came in from clients, or we would export it to files if clients wanted files or if a vendor needed to import the files into their database.

Michel Philippenko

Project Manager at a computer software company with 51-200 employees

Real User

Mar 6, 2022

I was working with Pentaho for a client. I had to implement complicated data flows and extraction. I had to take data from several sources in a PostgreSQL database by reading many tables in several databases, as well as from Excel files. I created some complex jobs. I also had to implement business reports with the Pentaho Report Designer. The client I was working for had Pentaho on virtual machines.

Dale Bloom

Credit Risk Analytics Manager at MarketAxess

Real User

Jan 20, 2022

The use case is for data ETL on our various data repositories. We use it to aggregate and transform data for visualization purposes for our upper management. Currently, I am using the PDI locally on my laptop, but we are undergoing an integration to push this off. We have purchased the Enterprise edition and have licenses, and we are just working with our infrastructure to get that set up on a server. We haven't yet launched the Enterprise edition, so I've had very minimal touch with Lumada, but I did have an overview with one of the engineers as to how to use the customer portal in terms of learning documentation. So, the documentation and support are basically the two main areas that I've been using it for. I haven't piped any data or anything through it. I've logged in a couple of times to the customer portal, and I've pretty much been using it as support functionality. I have been submitting requests to understand more about how to get everything to be working for the Enterprise edition. So, I have been using the Lumada customer portal mostly for Pentaho Data Integration.

Tomasz Rabong

Client Engagement Leader at Sanmargar Team

User

Jan 20, 2022

We use it for ETL processes with clients. We integrate it with www.metastudiodrm.com due to lack of data dictionaries management in Pentaho.

reviewer1751571

Systems Analyst at a university with 5,001-10,000 employees

Real User

Dec 22, 2021

We use it as a data warehouse between our HR system and our student system, because we don't have an application that sits in between them. It's a data warehouse that we do our reporting from. We also have integrations to other, isolated apps within the university that we gather data from. We use it to bring that into our data warehouse as well.

Tracy Gettings

Analytics Team Leader at HealtheLink

Real User

Dec 22, 2021

We use it to connect to multiple databases and generate reporting. We also have ETL processes running on it. Portions of it are in AWS, but we also have desktop access.

Dan Peacock

Enterprise Data Architect at a manufacturing company with 201-500 employees

Real User

Dec 14, 2021

We mainly use Lumada to load our operational systems into our data warehouse, but we also use it for monthly reporting out of the data warehouse, so it's to and from. We use some of Lumada's other features within the business to move data around. It's become quite the Swiss army knife. We're primarily doing batch-type reports that go out. Not many people want to sift through data and pick it to join it in other things. There are a few, but again, I usually wind up doing it. The self-serve feature is not as big a seller to me because of our user base. Most of the people looking at it are salespeople. Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool aspects is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing. The process that Lumada replaced was arcane. The sentiment among our employees, particularly the warehouse personnel, was that it was punitive. They would say, "I didn't get a bonus this month because the warehouse manager didn't like me." Now we can show them the numbers and say, "You didn't get a bonus because you were slacking off compared to everybody else." It's allowed us to be very transparent in how we're doing these tasks. Previously, that was all done behind the vest. I want people to trust the numbers, and these tools allow me to do that because I can instantly show that the information is correct. That is a huge win for us. When we first rolled it out, I spent a third of my time justifying the numbers. Now, I rarely have to do that. It's all there, and they can see it, so they trust what the information is. If something is wrong, it's not a case of "Why is this being computed wrong?" It's more like: "What didn't report?" We have 200 stores that communicate to our central hub each night. 
If one of them doesn't send any data, somebody notices now. That wasn't the case in the past. They're saying, "Was there something wrong with the store?" instead of, "There's something wrong with the data." With Lumada's single end-to-end data management, we no longer need some of the other tools that we developed in-house. Before that, everything was in-house. We had a build-versus-buy mentality. It simplified many aspects that we were already doing and made that process quicker. It has made a world of difference. This is primarily anecdotal, but there were times where I'd get an IM from one of the managers saying, "I'm looking at this in the sales meeting and calling out what somebody is saying. I want to make sure that this is what I'm seeing." I made a couple of people mad. Let's say they're no longer working for us, and we'll leave it at that. If you're not making somebody mad, you're not doing BI right. You're not asking the right questions. Having a single platform for data management experience is crucial for me. It lets me know when something goes wrong from a data standpoint. I know when a load fails due to bad data and don't need to hunt for it. I've got a status board, so I can say, "Everything looks good this morning." I don't have to dig into it, and that has made my job easier. What's more, I don't waste time arguing about why the numbers on this report don't match the ones on another because it's all coming from the same place. Before, they were coming from various places, and they wouldn't match for whatever reason. Maybe there's some piece of code in one report that isn't being accounted for in the other. Now, they're all coming from the same place. So everything is on the same level.

it_user1740738

Senior Engineer at a comms service provider with 501-1,000 employees

Real User

Dec 13, 2021

We're using it for data warehousing. Typically, we collect data from numerous source systems, structure it, and then make it available to drive business intelligence, dashboard reporting, and things like that. That's the main use of it. We also do a little bit of moving of data from one system to another, but the data doesn't go into the warehouse. For instance, we sync the data from one of our line of business systems into our support help desk system so that it has extra information there. So, we do a few point-to-point transfers, but mainly, it is for centralizing data for data warehousing. We use it just as a data integration tool, and we haven't found any problems. When we have big data processing, we use Amazon Redshift. We use Pentaho to load the data into Redshift and then use that for big data processing. We use Tableau for our reporting platform. We've got quite a number of users who are experienced in it, so it is our chosen reporting platform. So, we use Pentaho for the data collection and data modeling aspect of things, such as developing facts and dimensions, but we then publicly export that data to Redshift as a database platform, and then we use Tableau as our reporting platform. I am using version 8.3, which was the latest long-term support version when I looked at it the last time. Because this is something we use in production, and it is quite core to our operations, we've been advised that we just stick with the long-term support versions of the product. It is in the cloud on AWS. It is running on an EC2 instance in AWS Cloud.

Oscar Mejia

IT-Services Manager & Solution Architect at Stratis

Real User

Jul 14, 2021

We basically receive information from our clients via Excel. We take this information and transform it in order to create some data marts. In the processes we are running right now, we receive new data every day. The solution processes the Excel files and creates a data mart for them. While we read the data and transform it as well as put it in a database, in order to explore the information, we need an analytics solution for that, and that is typically Microsoft's solution, Power BI.

it_user1510395

Technical Manager at a computer software company with 51-200 employees

Real User

Feb 22, 2021

We have an event planning system, which enables us to obtain a large report. It includes data mart or data warehouse data. This is where we take data from the IT online system and pass it to the data warehouse. Then, from the data warehouse, they generate reports. We have 6 developers who are using Pentaho Data Integration, but there are no end users. We deploy the product, and the customer uses it for reporting. We have one person who undertakes a regular maintenance activity when it is required.

ABDULGAFFAR

Assistant General Manager at DTDC Express Limited

Real User

Jan 8, 2021

We are using just the simple features of this product. We're using it as a data warehouse and then for building dimensions.

reviewer1384743

Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees

Real User

Jul 15, 2020

The most common use for the solution is gathering data from our databases or files in order to load it into a different database. Another common use is to compare data between different databases. Where integrity is lacking, the differences can be attributed to synchronization issues.
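
Comparing data between two databases, as described in this answer, often comes down to a keyed diff. The Python sketch below shows one way to do it; it is an assumption-laden illustration, not the reviewer's Pentaho job: `sqlite3` connections stand in for the real databases, the function name is hypothetical, and the query is expected to return the key in its first column.

```python
import sqlite3


def diff_tables(source: sqlite3.Connection, target: sqlite3.Connection, query: str) -> dict:
    """Compare the rows returned by the same query on two databases.

    The first column of each row is treated as the key.
    """
    src = {row[0]: row[1:] for row in source.execute(query)}
    tgt = {row[0]: row[1:] for row in target.execute(query)}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }
```

Keys that show up under `missing_in_target` or `changed` are the candidates for a synchronization problem. For large tables, holding both result sets in memory would not scale, so this is only suitable as a small-scale check.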

Pentaho Data Integration and Analytics

61 Reviews

Pentaho Data Integration and Analytics offers an intuitive platform for data workflows, enabling users to easily manage ETL processes across diverse data formats, ensuring seamless automation and development. With its drag-and-drop interface, Pentaho allows for efficient ETL workflows without extensive coding. It supports a multitude of data formats and sources such as SQL, NoSQL, Hadoop, CSV, and JSON. Advanced features like metadata injection and API integration enable seamless automation....
