
Pentaho Data Integration and Analytics Primary Use Case

Michelle Lawson
Principal Software Engineer at a tech vendor with 10,001+ employees

My main use case for Pentaho Data Integration and Analytics is data transformation between databases, and I also used it for creating email that needed to go out to customers.

A specific example of a project where I used Pentaho Data Integration and Analytics for data transformation or customer mailings is when my previous company sent emails related to late car payments to our customers. We would have a query that would hit the database, determine who was behind by a certain amount of time on their payments, and then create an email to go out to them with the correct information related to possible repossession.

In addition to that, my previous company, Scopos, acted almost as a data hub between different vendors that we used for car financing, which meant we did a lot of transformation of data from one vendor to another vendor to provide files out to the second vendor.

Adrián Moreiras
Data Analyst at Telefonica Digital

I have been using Pentaho Data Integration and Analytics for approximately four years.

My main use case for Pentaho Data Integration and Analytics is data transformation related to the telecommunications sector. For example, I send notifications via SMS depending on the type of mobile plan contracted with the telecom company. Different flows are based on user segmentation, and therefore different SMS messages with different content are sent.

First, raw data comes in as an unpolished block where the same value can appear in lowercase, uppercase, with an underscore, without an underscore, with a space, and so on, so I perform basic data processing first. Once the data is in a uniform form, it goes through a series of filters where I categorize each user by the type of telecom plan contracted. Then there is another component holding the different SMS codes and texts as key-value pairs, keyed by the type of contract the user has. I match what comes from the raw data against its correct category from the filters and against the text each user will receive, for example to handle the proration of plan changes, since rate changes are made annually and the amount is increased.
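The cleanup-and-match step described above can be sketched in a few lines; the plan names and SMS texts below are invented for illustration, not taken from the review:

```python
def normalize(value: str) -> str:
    """Collapse case, underscore, and spacing variants into one canonical form."""
    return value.strip().lower().replace("_", " ").replace("  ", " ")

# Key-value pairs: contract type -> SMS text (hypothetical examples).
SMS_BY_PLAN = {
    "prepaid basic": "Your prepaid rate changes next month.",
    "postpaid premium": "Your premium plan rate will be prorated.",
}

# Raw values arrive in inconsistent shapes; normalize, then look up the SMS text.
raw_rows = ["Prepaid_Basic", " prepaid basic ", "POSTPAID_PREMIUM"]
messages = [SMS_BY_PLAN[normalize(r)] for r in raw_rows]
```

In a PDI transformation, the same idea would be spread across a string-operations step followed by a lookup step; the Python above is only a compact model of that flow.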

Pentaho Data Integration and Analytics has been very useful for me in user segmentation when rate changes were made because prices change in the telecommunications sector, and depending on each user's billing cycle, I have to do proration for different days. One month might have eighteen days with the old plan and thirteen days with the new plan. This has been a very important use case in my career. I have also worked with Net Promoter Score, which is very important in the telecommunications sector. I have worked with all the information from all the segments, all the regions, divided by province, by product contracted, divided by operating system, by everything. I have worked on Net Promoter Score with Pentaho Data Integration and Analytics, and it has been very easy to handle everything there because it was quite easy to work with.
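The proration described here is a day-weighted split of the month between the two rates; a minimal sketch, assuming hypothetical monthly rates of 30.00 and 35.00:

```python
def prorate(old_rate: float, new_rate: float, old_days: int, new_days: int) -> float:
    """Day-weighted charge for a billing month split between two plan rates."""
    total_days = old_days + new_days
    return round(old_rate * old_days / total_days + new_rate * new_days / total_days, 2)

# A 31-day cycle split 18/13 as in the example above, with invented rates.
charge = prorate(30.00, 35.00, 18, 13)
```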

AmolDhormare
Data Integration Developer at a tech services company with 1,001-5,000 employees

My main use case for Pentaho Data Integration and Analytics focuses primarily on integration. We utilize it to handle data from various front-end sources, such as Oracle databases, SAP, Salesforce, and FIS data.

When I refer to integration, I mean connecting these sources and transforming the data before sending it elsewhere, such as loading the data into Snowflake. After loading the data into Snowflake, we perform transformations, create nodes, aggregations, and visualizations. We schedule tasks daily to transform the data received from the front-end and provide it to the data warehouse for further reporting use and integration into Salesforce.

Buyer's Guide
Pentaho Data Integration and Analytics
March 2026
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: March 2026.
885,264 professionals have used our research since 2012.
Jayakrishnanmg MG
Data architect at a tech vendor with 10,001+ employees
Pentaho Data Integration and Analytics is my primary tool for ETL activities, where I extract data from different kinds of sources. We have two editions available: Community Edition and Enterprise Edition. Since Community Edition is open-source with limited features, we opted for the Enterprise Edition. I was able to perform transformations after extracting data from multiple heterogeneous sources, applying operations such as merge, SCDs, data cleansing, and data transformations according to client requirements, and finally loading the processed data to cloud solutions, cloud storage, or data warehouses in either SQL or NoSQL databases like MongoDB and Cassandra.

I used Pentaho Data Integration and Analytics for a specific project with a client called Walmart, a retail chain located in the US, Mexico, and Canada, where I aimed to build a centralized data warehouse for daily sales analytics. The data came from various sources including online sales, Excel sheets, REST API, MySQL DB, CSV, and flat files, requiring necessary transformations dictated by the transformation sheet provided by the client, which detailed how to handle the input source data, the required cleansing, and the transformations needed. This led to a recurring activity where I created pipelines with Pentaho Data Integration and Analytics, designing jobs and transformations, and eventually loading the data into Azure SQL DB, NoSQL databases, or Snowflake, while scheduling the ETL pipelines based on activity frequency—daily, weekly, or monthly.

After creating jobs and transformations for each activity with Pentaho Data Integration and Analytics, we scheduled these jobs, allowing the Business Intelligence team to review the loaded data in the data warehouses. We also used Pentaho Report Designer for conducting predictive analysis, such as forecasting sales over the next five years based on prior data, enabling us to conduct in-depth analysis for future trends.

reviewer2787603
Data engineer at an educational organization with 1,001-5,000 employees

My main use case for Pentaho Data Integration and Analytics is creating ETLs to extract data from different sources, be it databases, APIs, or files and storing it into a certain database, be it a data warehouse or data lake.

One specific example of an ETL process I built with Pentaho Data Integration and Analytics is one that took data from a public API, normalized it with different parameters, and extracted the data so that it made sense from a business standpoint, then loaded it into our data warehouse in Snowflake.

Alberto Pedro
Founder-CEO at Ubuntu Analytica

My main use case for Pentaho Data Integration and Analytics is to build an ETL process.

JuanCarlosMartinezLara
Project Manager at Laberit

I use Pentaho Data Integration and Analytics to perform ETL processes to migrate data between different systems. For example, I need to move patient data from our system database to a secondary database. This task matters to my organization, and I complete it with Pentaho Data Integration and Analytics because it is a comprehensive solution that allows me to accomplish large tasks in less time.

Jefferson Hernandez
Data Architecture and Engineering Specialist at coprocenva

I use Pentaho Data Integration for data integration and ETL processes. I developed with Pentaho at CoproSema. I also work on machine learning projects using Pentaho, such as forecasting for clients who have not paid their credit.

Aqeel UR Rehman
BI Analyst at a computer software company with 51-200 employees

Currently, I am using Pentaho Data Integration for transforming data and then loading it into different platforms. Sometimes, I use it in conjunction with AWS, particularly S3 and Redshift, to execute the copy command for data processing.
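The S3-to-Redshift step mentioned here usually comes down to issuing a COPY command against the cluster; a minimal sketch that only assembles the statement, with a hypothetical table, bucket, and IAM role:

```python
def build_copy_statement(table: str, s3_path: str, iam_role: str) -> str:
    """Compose a Redshift COPY command for CSV files staged on S3."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )

sql = build_copy_statement(
    "analytics.orders",                         # hypothetical target table
    "s3://example-bucket/exports/orders.csv",   # hypothetical staging path
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
```

In practice the statement would be executed over a database connection (or from a PDI SQL step), and the COPY options vary with the staged file format.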

Ahad Ahmed
BI developer at Jubilee Life Insurance Company Ltd

I have used the solution to gather data from multiple sources, including APIs, databases like Oracle, and web servers. There are a bunch of data providers available who can provide you with datasets to export in JSON format from clouds or APIs. 

Ryan Ferdon
Senior Data Engineer at Burgiss

We used it for ETL to transform data from flat files, CSV files, and databases. We used PostgreSQL for the connections, and then we would either import the data into our database if it was coming in from clients, or export it to files if clients wanted files or if a vendor needed to import the files into their database.

MARIA PILAR CANDA
Associate Partner at Autana Business Partners

I have a team who has experience with integration. We are service providers and partners. Generally, clients buy the product directly from the company.

KrishnaBorusu
Senior Product Manager at a retailer with 10,001+ employees

The use cases involve loading the data into the required tables based on the transformations. We do a couple of transformations, and based on the business requirement, we load the data into the required tables.

reviewer995501455
Solution Integration Consultant II at a tech vendor with 201-500 employees

My work primarily revolves around data migration and data integration for different products. I have used it in different companies, but for most of our use cases, we use it to integrate all the data that needs to flow into our product. We can also have outbound flows from our product when we need to send data to various integration points. We use this product extensively to build ETLs for those use cases.

We are developing ETLs for the inbound data into the product as well as outbound to various integration points. Also, we have a number of core ETLs written on this platform to enhance our product.

We have two different modes that we offer: one is on-premises and the other is on the cloud. In the cloud, we have an EC2 instance on AWS where the product is installed; we call it the ETL server. We also have another server for the application where the product is installed.

We use version 8.3 in the production environment, but in the dev environment, we use version 9 and onwards.

Jacopo Zaccariotto
Head of Data Engineering at InfoCert

We use Pentaho for small ETL integration jobs and cross-storage analytics. It's nothing too major. We have it deployed on-premise, and we are still on the free version of the product.

In our case, processing takes place on the virtual machine where we installed Pentaho. We can ingest data from different on-premises and cloud locations. We still don't carry out the data processing phase inside a different environment from where the VM is running.

RicardoDíaz
COO / CTO at a tech services company with 11-50 employees

We are a service delivery enterprise, and we have different use cases. We deliver solutions to other enterprises, such as banks. One of the use cases is for real-time analytics of the data we work with. We take CDC data from Oracle Database, and in real-time, we generate a product offer for all the products of a client. All this is in real-time. The client could be at the ATM or maybe at an agency, and they can access the product offer. 

We also use Pentaho within our organization to integrate all the documents and Excel spreadsheets from our consultants and to have a dashboard of the hours spent on different projects.

In terms of version, currently, Pentaho Data Integration is on version 9, but we are using version 8.2. We have all the versions, but we work with the most stable one. 

In terms of deployment, we have two different types of deployments. We have on-prem and private cloud deployments.

José Orlando Maia
Data Engineer at a tech vendor with 1,001-5,000 employees

My primary use case is to provide integration with my source systems, such as ERP systems and SAP systems, and web-based systems, having them primarily integrate with my data warehouse. For this process, I use ETL to treat and gather all the information from my first system, then consolidate it in my data warehouse.

Dan Peacock
Enterprise Data Architect at a manufacturing company with 201-500 employees

We mainly use Lumada to load data from our operational systems into our data warehouse, but we also use it for monthly reporting out of the data warehouse, so it's to and from. We use some of Lumada's other features within the business to move data around. It's become quite the Swiss Army knife.

We're primarily doing batch-type reports that go out. Not many people want to sift through the data themselves and join it with other things. There are a few, but again, I usually wind up doing it. The self-serve feature is not as big a seller to me because of our user base. Most of the people looking at it are salespeople.

Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool aspects is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing. 

The process that Lumada replaced was arcane. The sentiment among our employees, particularly the warehouse personnel, was that it was punitive. They would say, "I didn't get a bonus this month because the warehouse manager didn't like me." Now we can show them the numbers and say, "You didn't get a bonus because you were slacking off compared to everybody else." It's allowed us to be very transparent in how we're doing these tasks. Previously, that was all done behind closed doors. I want people to trust the numbers, and these tools allow me to do that because I can instantly show that the information is correct.

That is a huge win for us. When we first rolled it out, I spent a third of my time justifying the numbers. Now, I rarely have to do that. It's all there, and they can see it, so they trust what the information is. If something is wrong, it's not a case of "Why is this being computed wrong?" It's more like: "What didn't report?"

We have 200 stores that communicate to our central hub each night. If one of them doesn't send any data, somebody notices now. That wasn't the case in the past. They're saying, "Was there something wrong with the store?" instead of, "There's something wrong with the data."

With Lumada's single end-to-end data management, we no longer need some of the other tools that we developed in-house. Before that, everything was in-house. We had a build-versus-buy mentality. It simplified many aspects that we were already doing and made that process quicker. It has made a world of difference. 

This is primarily anecdotal, but there were times where I'd get an IM from one of the managers saying, "I'm looking at this in the sales meeting and calling out what somebody is saying. I want to make sure that this is what I'm seeing." I made a couple of people mad. Let's say they're no longer working for us, and we'll leave it at that. If you're not making somebody mad, you're not doing BI right. You're not asking the right questions.

Having a single platform for data management experience is crucial for me. It lets me know when something goes wrong from a data standpoint. I know when a load fails due to bad data and don't need to hunt for it. I've got a status board, so I can say, "Everything looks good this morning." I don't have to dig into it, and that has made my job easier. 

What's more, I don't waste time arguing about why the numbers on this report don't match the ones on another because it's all coming from the same place. Before, they were coming from various places, and they wouldn't match for whatever reason. Maybe there's some piece of code in one report that isn't being accounted for in the other. Now, they're all coming from the same place. So everything is on the same level.

Ridwan Saeful Rohman
Data Engineering Associate Manager at Zalora Group

I still use this tool on a daily basis. Comparing it to my experience with other ETL tools, the system I created using this tool was quite straightforward. It involves extracting data from MySQL, exporting it to CSV, storing it on S3, and then loading it into Redshift.

The PDI Kettle Job and Kettle Transformation are bundled by a shell script, then scheduled and orchestrated by Jenkins.
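A wrapper like that typically launches Kitchen, PDI's command-line job runner; a sketch that assembles such a command in Python, with hypothetical job, log, and parameter values:

```python
def kitchen_command(job_path: str, log_path: str, params: dict) -> list[str]:
    """Assemble a Kitchen (PDI job runner) command line for a scheduled run."""
    cmd = [
        "./kitchen.sh",           # ships in the PDI installation directory
        f"-file={job_path}",      # .kjb job to execute
        f"-logfile={log_path}",   # where Kitchen writes its log
        "-level=Basic",           # logging verbosity
    ]
    # Named parameters are passed to the job as -param:NAME=value.
    cmd += [f"-param:{k}={v}" for k, v in params.items()]
    return cmd

cmd = kitchen_command(
    "/opt/etl/jobs/export_orders.kjb",  # hypothetical job file
    "/var/log/etl/export_orders.log",   # hypothetical log file
    {"RUN_DATE": "2024-01-31"},
)
```

The resulting list can be handed to `subprocess.run`, which is effectively what a cron- or Jenkins-scheduled shell wrapper does.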

We continue to use this tool primarily because many of our legacy systems still rely on it. However, our new solution is mostly based on Airflow, and we are currently in the transition phase. Airflow is a data orchestration tool that predominantly uses Python for ETL processes, scheduling, and issue monitoring—all within a unified system.


Tobias Johnson
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees

Our primary use case is to populate a data warehouse and data marts, but we also use it for all kinds of data integration scenarios and file movement. It is almost like middleware between different enterprise solutions. We take files from our legacy app system, do some work on them, and then call SAP BAPIs, for example.

It is deployed on-premises. It gives you the flexibility to deploy it in any environment, whether on-premises or in the cloud, but as a manufacturing facility, that flexibility is not that important to us. We could deploy it on the cloud by spinning up a new server in AWS or Azure, but our preference is primarily to deploy things on-premises.

We usually stay one version behind the latest one. We're a manufacturing facility. So, we're very sensitive to any bugs or issues. We don't do automatic upgrades. They're a fairly manual process.

reviewer1855218
Data Architect at a consumer goods company with 1,001-5,000 employees

We use it for orchestration and as an ETL tool to move data from one environment to another, including moving data from on-premises to the cloud and moving operational data from different source systems into the data warehouse.

Rodrigo Vazquez
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees

I just use it as an ETL tool. It helps me work with data so I can solve any of my production problems. I work with a lot of databases; therefore, I use this tool to keep information organized.

I work with a virtual private cloud (VPC) and VPN. If I work in the cloud, I use VPC. If I work on-premises, I work with VPNs.

Renan Guedert
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees

Our principal use was to build the whole ETL and data warehousing layer on our projects. We created steps for collecting all the raw data from APIs and other databases, and from flat files such as Excel, CSV, and JSON files, to do the whole transformation and data preparation, then model the data and put it into SQL Server and Integration Services.

For business intelligence projects, it is often handy, when extracting something from an API, to have a step that transforms the JSON returned by the API into an SQL table.
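That JSON-to-SQL-table step boils down to flattening uneven API records into fixed-width rows; a minimal sketch with invented field names:

```python
import json

# Hypothetical API payload: records do not all share the same fields.
payload = json.loads('[{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo", "city": "Rio"}]')

# Columns of the target SQL table, in insert order (invented for illustration).
COLUMNS = ["id", "name", "city"]

def to_rows(records: list) -> list:
    """Flatten JSON objects into fixed-width tuples; missing fields become None."""
    return [tuple(rec.get(col) for col in COLUMNS) for rec in records]

rows = to_rows(payload)  # ready for an executemany() INSERT into the database
```

PDI's "JSON Input" step plays the same role inside a transformation; the sketch just makes the flattening explicit.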

We run it heavily on a virtual machine running Windows. We have also installed the open-source version on the desktop.

PhilipRobinson
Senior Engineer at a comms service provider with 501-1,000 employees

We're using it for data warehousing. Typically, we collect data from numerous source systems, structure it, and then make it available to drive business intelligence, dashboard reporting, and things like that. That's the main use of it. 

We also do a little bit of moving of data from one system to another, but the data doesn't go into the warehouse. For instance, we sync the data from one of our line of business systems into our support help desk system so that it has extra information there. So, we do a few point-to-point transfers, but mainly, it is for centralizing data for data warehousing.

We use it just as a data integration tool, and we haven't found any problems. When we have big data processing, we use Amazon Redshift. We use Pentaho to load the data into Redshift and then use that for big data processing. We use Tableau for our reporting platform; we've got quite a number of users who are experienced in it, so it is our chosen reporting platform. So, we use Pentaho for the data collection and data modeling aspect of things, such as developing facts and dimensions, but we then export that data to Redshift as a database platform, and we use Tableau as our reporting platform on top of it.

I am using version 8.3, which was the latest long-term support version when I looked at it the last time. Because this is something we use in production, and it is quite core to our operations, we've been advised that we just stick with the long-term support versions of the product.

It is in the cloud on AWS. It is running on an EC2 instance in AWS Cloud.

Eric Smets
System Engineer at a tech services company with 11-50 employees

We use it for two major purposes. Most of the time it is for ETL of data, and based on the loaded and converted data, we generate reports. A small part of that, the pivot tables and the like, is on the web interface, which is the more interactive part. But about 80 percent of our developers' work is on the background processes for running, transforming, and changing data.

Anton Abrarov
Project Leader at a mining and metals company with 10,001+ employees

The company where I was working previously was using this product. We were using it for ETL process management; it was like data flow automation.

In terms of deployment, we were using an on-premise model because we had sensitive data, and there were some restrictions related to information security.

reviewer1872000
Senior Data Analyst at a tech services company with 51-200 employees

I use it for ETL. We receive data from our clients and we join the most important information and do many segmentations to help with communication between our product and our clients.

Aqeel UR Rehman
BI Analyst at a computer software company with 51-200 employees

I have used this ETL tool for working with data in projects across several different domains. My use cases include tasks such as transforming data that has been taken from an API like PayPal, extracting data from different sources such as Magento or other databases, and transforming all of the information.

Once the transformation is complete, we load the data into data warehouses such as Amazon Redshift.

Krisjanis Muskars
Data Architect at a tech services company with 1,001-5,000 employees

We use it as an ETL tool. We take data from a source database and move it into a target database. We do some additional processing on our target databases as well, and then load the data into a data warehouse for reports. The end result is a data warehouse and the reports built on top of that.

We are a consulting company and we implement it for clients.

Michel Philippenko
Project Manager at a computer software company with 51-200 employees

I was working with Pentaho for a client. I had to implement complicated data flows and extraction. I had to take data from several sources in a PostgreSQL database by reading many tables in several databases, as well as from Excel files. I created some complex jobs. I also had to implement business reports with the Pentaho Report Designer.

The client I was working for had Pentaho on virtual machines.

Dale Bloom
Credit Risk Analytics Manager at MarketAxess

The use case is for data ETL on our various data repositories. We use it to aggregate and transform data for visualization purposes for our upper management.

Currently, I am using PDI locally on my laptop, but we are working to move it off my machine. We have purchased the Enterprise edition and have licenses, and we are just working with our infrastructure team to get it set up on a server.

We haven't yet launched the Enterprise edition, so I've had very minimal touch with Lumada, but I did have an overview with one of the engineers as to how to use the customer portal in terms of learning documentation. So, the documentation and support are basically the two main areas that I've been using it for. I haven't piped any data or anything through it. I've logged in a couple of times to the customer portal, and I've pretty much been using it as support functionality. I have been submitting requests to understand more about how to get everything to be working for the Enterprise edition. So, I have been using the Lumada customer portal mostly for Pentaho Data Integration.

Stephen Knox
Lead, Data and BI Architect at a financial services firm with 201-500 employees

We run the payment systems for Canada. We use it as a typical ETL tool to transfer and modify data into a data warehouse. We have many different pipelines that we have built with it.

reviewer1751571
Systems Analyst at a university with 5,001-10,000 employees

We use it as a data warehouse between our HR system and our student system, because we don't have an application that sits in between them. It's a data warehouse that we do our reporting from.

We also have integrations to other, isolated apps within the university that we gather data from. We use it to bring that into our data warehouse as well.

reviewer1772286
Director of Software Engineering at a healthcare company with 10,001+ employees

We started using Pentaho for two purposes:

  1. As an ETL tool to bring data in. 
  2. As an analytics tool. 

As our solution progressed, we dropped the ETL piece of Pentaho. We didn't end up using it. What remains in our product today is the analytics tool.

We do a lot of simulations on our data with Pentaho reports. We use Pentaho's reporting capabilities to tell us how contracts need to be negotiated for optimal results by using the analytics tool within Pentaho.

Tracy Gettings
Analytics Team Leader at HealtheLink

We use it to connect to multiple databases and generate reporting. We also have ETL processes running on it.

Portions of it are in AWS, but we also have desktop access.

ABDULGAFFAR
Assistant General Manager at DTDC Express Limited

We are using just the simple features of this product.

We're using it as a data warehouse and then for building dimensions.

Oscar Mejia
IT-Services Manager & Solution Architect at Stratis

We basically receive information from our clients via Excel. We take this information and transform it in order to create some data marts.

With this information, in the processes we are running right now, we receive new data every day. The solution processes the Excel files and creates a data mart from them.

While we read the data and transform it as well as put it in a database, in order to explore the information, we need an analytics solution for that - and that is typically Microsoft's solution, Power BI.

it_user1510395
Technical Manager at a computer software company with 51-200 employees

We have an event planning system, which enables us to obtain large reports. It includes data mart and data warehouse data. We take data from the IT online system and pass it to the data warehouse; from the data warehouse, they generate reports. We have six developers who are using Pentaho Data Integration, but there are no end users. We deploy the product, and the customer uses it for reporting. We have one person who undertakes regular maintenance activity when it is required.

reviewer1384743
Specialist in Relational Databases and NoSQL at a computer software company with 5,001-10,000 employees

The most common use for the solution is gathering data from our databases or files in order to load it into a different database. Another common use is to compare data between different databases; due to a lack of integrity constraints, differences can often be attributed to synchronization issues.
