reviewer1751571 - PeerSpot reviewer
Systems Analyst at a university with 5,001-10,000 employees
Real User
Jan 3, 2022
Reuse of ETLs with metadata injection saves us development time, but the reporting side needs notable work
Pros and Cons
  • "The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."
  • "Lumada Data Integration definitely helps with decision-making for our deans and upper executives, and the fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs while making it a pretty quick process for our developers to learn and pick up ETLs from each other."
  • "The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."

What is our primary use case?

We use it as a data warehouse between our HR system and our student system, because we don't have an application that sits in between them. It's a data warehouse that we do our reporting from.

We also have integrations to other, isolated apps within the university that we gather data from. We use it to bring that into our data warehouse as well.

How has it helped my organization?

Lumada Data Integration definitely helps with decision-making for our deans and upper executives. They are the ones who use the product the most to make their decisions. The data warehouse is the only source of information that's available for them to use, and to create that data warehouse we had to use this product.

And it has absolutely reduced our ETL development time. The fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs. It also makes it a pretty quick process for our developers to learn and pick up ETLs from each other. It's definitely easy for us to transition ETLs from one developer to another. The ETL functionality satisfies 95 percent of all our needs. 

What is most valuable?

The ETL is definitely an awesome feature of the product. It's very easy and quick to use. Once you understand the way it works it's pretty robust.

Lumada Data Integration requires minimal coding. You can do more complex coding if you want to, because it has a scripts option that you can add as a feature, but we haven't found a need to do that yet. We just use what's available, the steps that they have, and that is sufficient for our needs at this point. It makes it easier for other developers to look at the things that we have developed and to understand them quicker, whereas if you have complex coding it's harder to hand off to other people. Being able to transition something to another developer, and having that person pick it up quicker than if there were custom scripting, is an advantage.

In addition, the solution's ability to quickly and effectively solve issues we've brought up has been great. We've been able to use all the available features.

Among them is the ability to develop and deploy data pipeline templates once and reuse them. The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs. The automation of data pipeline templates has also been helpful in scaling the onboarding of data.
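To make the metadata injection idea concrete, here is a minimal Python sketch. It is not Pentaho's API and not how the product implements the feature; it only illustrates the pattern of one reusable template driven by per-source metadata. The source names, field names, and the `run_template` and `load_all` helpers are all hypothetical.

```python
import csv
import io


def run_template(rows, field_map, target):
    """One generic 'template' step: select and rename fields per injected metadata."""
    mapped = [{dst: row[src] for src, dst in field_map.items()} for row in rows]
    return {"target": target, "rows": mapped}


# Hypothetical metadata: one entry per source instead of one hand-built ETL per source.
SOURCES = [
    {
        "data": "emp_id,full_name\n1,Ada\n",
        "field_map": {"emp_id": "person_id", "full_name": "name"},
        "target": "dw_person",
    },
    {
        "data": "sid,student_name\n7,Grace\n",
        "field_map": {"sid": "person_id", "student_name": "name"},
        "target": "dw_person",
    },
]


def load_all(sources):
    """Run the same template once per metadata entry."""
    results = []
    for meta in sources:
        rows = list(csv.DictReader(io.StringIO(meta["data"])))
        results.append(run_template(rows, meta["field_map"], meta["target"]))
    return results
```

Adding a third source in this sketch means adding one more metadata entry rather than copying the template, which is the kind of duplication we avoid.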

What needs improvement?

The transition to the web-based solution has taken a little longer and been more tedious than we would like, and it has pulled development effort away from the reporting side of the tool. They have a reporting tool called Pentaho Business Analytics that does all the report creation based on the data integration tool. There are a lot of features missing from that product because they've allocated a lot of their resources to fixing the data integration, to make it more web-based. We would like them to focus more on the user interface for the reporting.

The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet. We have between 500 and 800 reports in our system now. We've had to maintain an external spreadsheet with IDs to identify the location of all of those reports, instead of having that built into the system. It's been frustrating for us that they can't just build a simple search feature into the product to search for report names. It needs to be more in line with other reporting tools, like Tableau. Tableau has a lot more features and functions.
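As a rough illustration of the workaround, the external spreadsheet can be exported to CSV and searched by report name. This is only a sketch under assumptions: the column names (`id`, `name`, `path`), the sample rows, and the `search_reports` helper are hypothetical and are not part of the product.

```python
import csv
import io


def search_reports(index_csv, term):
    """Return index rows whose report name contains the term (case-insensitive)."""
    needle = term.casefold()
    reader = csv.DictReader(io.StringIO(index_csv))
    return [row for row in reader if needle in row["name"].casefold()]


# Hypothetical export of the externally maintained report index.
INDEX = """id,name,path
101,Enrollment by College,/reports/students/enrollment
102,Faculty Headcount,/reports/hr/headcount
103,Enrollment Trends,/reports/students/trends
"""

matches = search_reports(INDEX, "enrollment")
```

A lookup like this still depends on someone keeping the spreadsheet current, which is why we would prefer a search built into the product.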

Because the reporting is lacking, only the deans and above are using it. It could be used more, and we'd like it to be used more.

Also, while the solution provides us with a single, end-to-end data management experience from ingestion to insights, it doesn't give us a full history of where the data is coming from. If we change a field, we can't trace it through from the reporting to the ETL field. Unfortunately, it's a manual process for us. Hitachi has a new product to do that and it searches all the fields, documents, and files just to get your pipeline mapped, but we haven't bought that product yet.

Buyer's Guide
Pentaho Data Integration and Analytics
March 2026
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: March 2026.
886,349 professionals have used our research since 2012.

For how long have I used the solution?

I've been using Lumada Data Integration since version 4.2. We're now on version 9.1.

What do I think about the stability of the solution?

The stability has been great. Other than for upgrades, it has been pretty stable.

What do I think about the scalability of the solution?

The scalability is great too. We've been able to expand the current system and add a lot of customizations to it.

Surprisingly, I'm the only person in our organization who handles maintenance.

How are customer service and support?

The only issue that we've had is that it takes a little longer than we would like for support to resolve something, although things do eventually get incorporated. They're very quick to respond to an issue, but the fixing of the issue is not as quick.

For example, a few versions ago, when we upgraded it, we found that the upgrade caused a whole bunch of issues with the Oracle data types and the way the ETL was working with them. It wasn't transforming to the data types properly, the way we were expecting it to. In the previous version that we were using it was working fine, but the upgrade caused the issue, and it took them a while to fix that.

Which solution did I use previously and why did I switch?

We didn't have another tool. This is the only tool we have used to create the data warehouse between the two systems. When we started looking at solutions, this one was great because it was open source and Java-based, and it had a Community Edition. But we actually purchased the Enterprise Edition.

How was the initial setup?

I came in after it was purchased and after the first deployment.

What's my experience with pricing, setup cost, and licensing?

We renew our license every two years. When I spoke to the project manager, he indicated that the pricing has been going up every two years. It's going to reach a point where, eventually, we're going to have to look at alternative solutions because of the price.

When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho. When they bought it, the price shot up. They said the increase is because of all the improvements they put into the product and the support that they're providing. From our point of view, their improvements are mostly on the data integration part of it, instead of the reporting part of it, and we aren't particularly happy with that.

Which other solutions did I evaluate?

I've used Tableau and other reporting tools, but Tableau sticks out because the reporting tool is much nicer. Tableau has its drawbacks with the ETL, because you can only use Tableau datasets. You have to get data into a Tableau file dataset and then the ETL part of it is stuck in Tableau forever.

If we could use the Pentaho ETL and the Tableau reporting we'd be happy campers.

What other advice do I have?

It's a great product. The ETL part of the product is really easy to pick up and use. It has a graphical interface with the ability to be more complex via scripting and features that you can add.

When looking at Hitachi Vantara's roadmap, the ability to upgrade more easily is one element of it that is important to us. Also, they're going more towards web-based solutions, instead of having local client development tools. If it does go on the web, and it works the same way it works on the client, that would be a nice feature. Currently, because we have these local client development tools, we have to have a VM client for our developers to use, and that makes it a little more tricky. Whereas if they put it on the web, then all our developers would be able to use any desktop and access the web for development.

When it comes to the query performance of the solution on large datasets, we haven't had any issues with it. We have one table in our data warehouse that has about 120 million rows and we haven't had any performance issues.

The solution gives you the flexibility to deploy it in any environment, whether on-prem or in the cloud. With our particular implementation, we've done a lot of customizations. We have special things that we bolted onto the product, so it's not as easy to put it onto the cloud for us. All of our customizations and bolt-ons end up costing us more because they make upgrades more difficult and time-consuming. We don't use an automated upgrade process. It's manual. We have to do a full reinstall and then apply all our bolt-ons and make sure it still works. If we could automate that process it would certainly reduce our costs.

In terms of updating to version 9.2, which is the latest version, we're going to look into it next year and see what level of effort is required and determine how it impacts our current system. They release a new update about every six months, and there is a major release every year or two, so it's quite a fast schedule for updates.

Overall, I would rate our satisfaction with our decision to purchase Hitachi products as a seven out of 10. I would definitely recommend the data integration tool but I wouldn't recommend the reporting tool.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
reviewer1772286 - PeerSpot reviewer
Director of Software Engineering at a healthcare company with 10,001+ employees
Real User
Feb 3, 2022
Reports on predictions that our product is doing. It would be nice if they could have analytics perform well on large volumes.
Pros and Cons
  • "The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product."
  • "The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products."
  • "The only complaint that I have with Pentaho has been with scaling."

What is our primary use case?

We started using Pentaho for two purposes:

  1. As an ETL tool to bring data in. 
  2. As an analytics tool. 

As our solution progressed, we dropped the ETL piece of Pentaho. We didn't end up using it. What remains in our product today is the analytics tool.

We do a lot of simulations on our data with Pentaho reports. We use the analytics tool within Pentaho to report on how contracts need to be negotiated for optimal results.

How has it helped my organization?

This was an OEM solution for our product. The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product.

What is most valuable?

There is an end-to-end flow, where a user can say, "I am looking at this field and want to slice and dice my data based on these parameters." That flexibility is provided by Pentaho. This minimal manual coding is important to us.

What needs improvement?

The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products.  

For how long have I used the solution?

I have been using it for eight years.

What do I think about the stability of the solution?

We are on-prem. Once the product was installed and up and running, I haven't had issues with the product going down or not being responsive.

We have one technical lead who is responsible for making sure that we keep upgrading the solution so we are not on a version that is not supported anymore. In general, it is low maintenance.

What do I think about the scalability of the solution?

The only complaint that I have with Pentaho has been with scaling. As our data grew, we tested it with millions of records. When we started to implement it, we had clients that went from 80 million to 100 million. I think scale did present a problem with the clients. I know that Pentaho talks about being able to manage big data, which is much more data than what we have. I don't know if it was our architecture versus the product limitations, but we did have issues with scaling.

Our product doesn't deal with big data at large. There are probably 17 million records. With those 17 million records, it performs well when it has been internally cached within Pentaho. However, if you are loading the dataset or querying it for the first time, then it does take a while. Once it has been cached in Pentaho, the subsequent queries are reasonably fast.

How are customer service and support?

We haven't had a lot of functional issues. We had performance issues, especially early on, as we were trying to spin up this product. The response time from the support group has been a three on a scale of one to five.

We had trouble with the performance and had their engineers come in. We shared our troubles and problems, then those engineers had brainstorming sessions. Their ability to solve problems was really good and I would rate that as four out of five.

A lot of the problems were with the performance and scale of data that we had. It could have been that we didn't have a lot of upfront clean architecture. With the brainstorming sessions, we tried giving two sets of reports to users: 

  1. One was more summary level, which was quick, and that is what 80% of our clients use. 
  2. For 20% of our clients, we provided detailed reports that do take a while. However, you are then not impacting performance for 80% of your clients.

This was a good solution or compromise that we reached from both a business and technology perspective. 

Now, I feel like the product is doing well. It is almost like their team helped us with rearchitecting and building product expectations.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Previously, we used to have something called QlikView, which is almost obsolete now. We had a lot of trouble with QlikView. Anytime processing was done, it would take a long time for those processed results to be loaded into QlikView's memory. This meant that there was a lot of time spent once an operation was done. Before users could see results or reports, it would take a couple of hours. We didn't want that lag. 

Pentaho offered an option not to have that lag. It did not have its own in-memory database, where everything had to be loaded. That was one of the big reasons why we wanted to switch away from QlikView, and Pentaho fit that need.

How was the initial setup?

I would say the deployment/implementation process was straightforward enough for both data ingestion and analytics.

When we started with the data ingestion, we went with something called Spoon. Then we realized, while it was a Pentaho product, Spoon was open source. We had integrated with the open source version of it, but later found that it didn't work for commercialization. 

For us to integrate Pentaho and get it working, it took a couple of months because we needed to figure out authentication with Pentaho. So, learning and deployment within our environment took a couple of months. This includes the actual implementation and figuring out how to do what we wanted to do.

Because this is a licensed product, the deployment for the client was a small part of the product's deployment. So, on an individual client basis, the deployment is easy and a small piece. 

It gives us the flexibility to deploy it in any environment, which is important to us.

If we went to the cloud version of Pentaho, that would be a big maintenance relief. We wouldn't have to worry about getting the latest version, installing it, and sending it out to our clients.

What about the implementation team?

For the deployment, we had people come in from Pentaho for a week or two. They were there with us through the process.

Which other solutions did I evaluate?

We looked at Tableau, Pentaho, and an IBM solution. In the absence of Pentaho, we would have gone with either Tableau or building our own custom solution. When we were figuring out what third-party tool to use, we did an analysis and a bunch of other tools were compared. Ultimately, we went with Pentaho because it did have a wide variety of features and functionalities within its reports. Though I wasn't involved, there was a cost analysis done and Pentaho compared favorably in terms of cost.

For the product that we use Pentaho for, I think we're happy with that decision. There are a few other products in our product suite. Those products ended up using Tableau. I know that there have been discussions about considering Tableau over Pentaho in the future.

What other advice do I have?

Engage Pentaho's architects early on, so you know what data architecture works best with the product. We built our database and structures, then had performance issues. However, it was too late when we brought in the Pentaho architects, because our data structure was out in the field with multiple clients. Therefore, I think engaging them early on in the data architecture process would be wise.

I am not very familiar with Hitachi's roadmap and what is coming up for them. I know that they are good with sending out newsletters and keeping their customers in the know, but unfortunately, I am unaware of their roadmap.

I feel like this product is doing well. There haven't been complaints and things are moving along. I would rate it as seven out of 10.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Analytics Team Leader at HealtheLink
Real User
Jan 3, 2022
Enables us to manage our workload and generate a high volume of reporting
Pros and Cons
  • "We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule."
  • "If we didn't have this solution, we wouldn't be able to manage our workload or generate the volume of reporting that we currently do."
  • "Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."

What is our primary use case?

We use it to connect to multiple databases and generate reporting. We also have ETL processes running on it.

Portions of it are in AWS, but we also have desktop access.

How has it helped my organization?

The solution has allowed us to automate reporting by automating its scheduling. 

It is also important to us that the solution enables you to leverage metadata to automate data pipeline templates and reuse them. It allows us to generate reports with fewer resources.

If we didn't have this solution, we wouldn't be able to manage our workload or generate the volume of reporting that we currently do. It's very important for us that it provides a single, end-to-end data management experience from ingestion to insights. We are a high-volume department and without those features, we wouldn't be able to manage the current workload.

What is most valuable?

We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule.

What needs improvement?

Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in. There's good documentation when you go to the site but the help function within the solution hasn't been as good since Hitachi took over.

For how long have I used the solution?

I've been using Lumada Data Integration since 2016, but the company has been using it much longer.

We are currently on version 8.3, but we're going to be doing an upgrade to 9.2 next month.

What do I think about the stability of the solution?

The stability is good. We haven't had any issues related to Pentaho.

What do I think about the scalability of the solution?

Its scalability is very good. We use it with multiple, large databases. We've added to it over time and it scales.

We have about 10 users of the solution including a data quality manager, clinical analyst, healthcare informatics analysts, senior healthcare informatics analyst, and an analytics team leader. It's used very extensively by all of those job roles in their day-to-day work. When we add additional staff members, they routinely get access to and are trained on the solution.

How are customer service and support?

Their ability to quickly and effectively solve issues we have brought up is very good. They have a ticketing system and they're very responsive to any tickets we enter. That's true not only for issues but also when we have questions about functionality.

How would you rate customer service and support?

Positive

How was the initial setup?

The solution is very flexible. It's pretty easy to set up connections within the solution.

Maintenance isn't required day-to-day. Our technical staff does the upgrades. They also, on occasion, have to do things like restarting the services, but that's typically related to server issues, not Pentaho itself.

What other advice do I have?

My advice would be to take advantage of the training that's offered.

The query performance of Lumada on large data sets is good, but the query performance is really only as good as the server.

In terms of Hitachi's roadmap, we haven't seen it in a little while. We did have a concern that they're going to be going away from Pentaho and rolling it into another product and we're not quite sure what the result of that is going to be. We don't have a good understanding of what's going to change. That's the concern.

We currently only use Pentaho. We don't have other Hitachi products but we're satisfied with it. We would recommend Pentaho.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Assistant General Manager at DTDC Express Limited
Real User
Jan 8, 2021
Scales well with data and processes, but the cost should be lower and real-time processing capabilities improved
Pros and Cons
  • "The amount of data that it loads and processes is good."
  • "I would like to see improvements made for real-time data processing."
  • "The price of the regular version is not reasonable and it should be lower."

What is our primary use case?

We are using just the simple features of this product.

We're using it as a data warehouse and then for building dimensions.

What needs improvement?

The shortcoming in version 7 is that we are unable to connect to Google Cloud Storage (GCS), where I would like to write the results from Pentaho. I'm able to connect to S3 using Pentaho 8, but I'm unable to connect to GCS. With people moving from on-premises deployments to the cloud, be it S3, Azure, or Google, we need plugins that let us interact with these cloud vendors.

I would like to see improvements made for real-time data processing. It is something that I will be looking out for.

For how long have I used the solution?

We have been using Pentaho Data Integration for three years.

What do I think about the stability of the solution?

For all of the features that we have been using, it is a stable product.

What do I think about the scalability of the solution?

In terms of data loading and processes, the scalability is good.

We have a team of four people who are using it for analytics.

How are customer service and technical support?

As we are using the Community Version, we have not been in contact with technical support. Instead, we rely on forums and websites when we need to resolve a problem.

Which solution did I use previously and why did I switch?

In the past, I have worked with Talend, as well as SAP BO Data Services (BODS). However, that was with another company. This organization started with Pentaho and we are still using it.

How was the initial setup?

It is a straightforward setup process. It took between three and four hours to complete.

What's my experience with pricing, setup cost, and licensing?

We are using the Community Version, which is available free of charge.

The price of the regular version is not reasonable and it should be lower.

What other advice do I have?

My advice for anybody who is researching this product is that if they want to do batch processing, then this is a good choice. The amount of data that it loads and processes is good.

Based on the features that I have used and my experience, I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Nirmal Kosuru - PeerSpot reviewer
Nirmal Kosuru, Data Architect at a tech services company with 201-500 employees
Top 10, Real User

Yes, the integration tool should be made available as Professional or Community / Standard / Enterprise editions, and pricing should be set accordingly on an industry-by-industry or case-by-case basis. There should also be transparency in the pricing and availability of the Community Edition, as was the case earlier when Pentaho's management released it to the market.

IT-Services Manager & Solution Architect at Stratis
Real User
Jul 14, 2021
Free to use, easy to set up, and has great UI
Pros and Cons
  • "It's my understanding that the product can scale."
  • "The ETL itself ran very fast; that makes it very easy to transform the information we have, and we found that very useful."
  • "The product needs more plugins."

What is our primary use case?

We basically receive information from our clients via Excel. We take this information and transform it in order to create some data marts.

With the processes we are running right now, we receive new data every day. The solution processes the Excel files and creates a data mart for them.

While we read the data and transform it as well as put it in a database, in order to explore the information, we need an analytics solution for that - and that is typically Microsoft's solution, Power BI.

What is most valuable?

The ETL itself ran very fast. That makes it very easy to transform the information we have. We found that very useful.

The UI is very easy to understand and learn.

The solution offers lots of documentation.

The initial setup is easy.

It's my understanding that the product can scale.

We've found the solution to be stable. 

The product is free to use if you choose the free version.

What needs improvement?

The solution needs better, higher-quality documentation, similar to AWS. Right now, we find that although documentation exists, it's not easy to find the answers we seek.

I have tried some cloud services with the ETL, so perhaps that would be good to add.

The product needs more plugins. Right now, it just has a standard database connection, whereas other solutions offer straightforward connections for Oracle, MySQL, and the like. More plugins would make it a much better product.

For how long have I used the solution?

We recently finished two projects with Pentaho.

What do I think about the stability of the solution?

The product is stable. There are no bugs or glitches. It doesn't crash or freeze. It's reliable. 

What do I think about the scalability of the solution?

According to the documentation, it's quite scalable. That said, I haven't tried to expand it. We just use a single server and that's all we need right now. We don't have plans to increase usage.

We have three people who use the solution currently.

How are customer service and technical support?

We don't really use support. We tend to do everything on our own and solve any problems we have ourselves. We basically have just read the manuals and that's about it. 

How was the initial setup?

The initial setup is not complex or difficult. It's straightforward. 

The deployment process takes about two weeks. 

We had two people who handled the deployment process. They were an AWS DevOps person and a Pentaho expert.

What's my experience with pricing, setup cost, and licensing?

We do not pay any license costs. We use a free version of the product.

What other advice do I have?

I'm a consultant and an end-user.

I downloaded the latest version of the solution. I can't speak to the version number. 

I'd rate the solution at an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
it_user1510395 - PeerSpot reviewer
Technical Manager at a computer software company with 51-200 employees
Real User
Mar 11, 2021
Quite simple to learn and there is a lot of information available online
Pros and Cons
  • "Pentaho Data Integration is quite simple to learn, and there is a lot of information available online."
  • "I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing" i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking."

What is our primary use case?

We have an event planning system, which enables us to obtain a large report. It includes data mart or data warehouse data. This is where we take data from the IT online system and pass it to the data warehouse. Then, from the data warehouse, they generate reports. We have 6 developers who use Pentaho Data Integration, but there are no end users. We deploy the product, and the customer uses it for reporting. We have one person who undertakes regular maintenance when it is required.

How has it helped my organization?

As we are a software company, we are using the tools provided with the Pentaho Data Integration for our various teams.

What is most valuable?

Pentaho Data Integration is quite simple to learn, and there is a lot of information available online. It is not a steep learning curve. It also integrates easily with other databases, and that is great. We follow the provided documentation, which makes integration a simple process compared to proprietary tools.

What needs improvement?

I don't think they market it that well. We can make suggestions for improvements but they don't seem to take the feedback on board. This contrasts with Informatica who are really helpful and seem to listen more to their customer feedback. I would also really like to see improved data capture. At the moment the emphasis seems to be on data processing. I would like to see a real-time processing data integration tool. This would provide instant reporting whenever the data changes. I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing" i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking.

For how long have I used the solution?

We have been using Pentaho Data Integration for 6 years. The customer is using Mirabilis Cloud, which is a public cloud. We are currently using version A.3.

How are customer service and support?

Technical support is really good. It only takes a little time to get our answers.

Which solution did I use previously and why did I switch?

One of our customers was completely into the Microsoft core framework. We had to use SSIS because it was readily available to them as part of their system. We used it for five years.

As mentioned, one of our teams has worked with Informatica in the past. In terms of integration, Informatica isn't more powerful, but it is more accurate in some respects. The community is also quite strong.

How was the initial setup?

The setup of Pentaho Data Integration is straightforward. 

What about the implementation team?

We implemented Pentaho Data Integration in-house. The current deployment has taken three months for the current set of requirements. We have another deployment in the pipeline where we are connecting other different data sources. These projects usually take a few months to complete.

What's my experience with pricing, setup cost, and licensing?

Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs.

What other advice do I have?

For newcomers to the product, it is best to start with something simple. You can then scale it up fast, as it is not a steep learning curve. If somebody wants to set up a good inbound integration platform, they can use Pentaho Data Integration. It's really simple and easy to use. The online community really helps you with numerous issues, such as licensing and a lot of other things. I would rate Pentaho Data Integration 8 out of 10.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
reviewer1384743 - PeerSpot reviewer
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees
Real User
Jul 17, 2020
Free to use, easy to set up, and has a great metadata injection feature
Pros and Cons
  • "The solution has a free to use community version."
  • "The solution is easy to set up, very intuitive, clear to understand and easy to maintain."
  • "It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now, and I think that is perhaps why it is not very stable; it sometimes causes the system to hang. I'm not sure if this is the case for the paid tiers."

What is our primary use case?

The most common use for the solution is gathering data from our databases or files in order to load it into a different database. Another common use is to compare data between different databases. Where integrity is lacking, the differences can usually be attributed to synchronization issues.
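
To make the comparison use case concrete, here is a minimal Python sketch of comparing the same table in two databases by key. It is an illustration only and does not use Pentaho Data Integration or its API; the `customers` table, the `id` key, and the in-memory SQLite databases are all hypothetical stand-ins for whatever systems are actually being compared.

```python
import sqlite3


def load_rows(conn, table, key):
    """Return {key_value: row_as_dict} for every row in the table."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(f"SELECT * FROM {table}")
    return {row[key]: dict(row) for row in cursor}


def compare_tables(source, target, table, key):
    """Report keys missing on either side and rows whose values differ."""
    src = load_rows(source, table, key)
    tgt = load_rows(target, table, key)
    shared = src.keys() & tgt.keys()
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in shared if src[k] != tgt[k]),
    }


def demo():
    # Hypothetical data: two copies of a table that have drifted apart.
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for conn in (source, target):
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ana"), (2, "Luis"), (3, "Eva")],
    )
    target.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ana"), (2, "Louis"), (4, "Pau")],
    )
    return compare_tables(source, target, "customers", "id")
```

Rows that appear in only one database, or that differ between the two, are the candidates for a synchronization problem.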

What is most valuable?

One important feature, in my opinion, is Metadata Injection. It gives flexibility to the scripts, because you can develop transformations that are not dependent on a fixed structure or a fixed data model.

Let me give a couple of examples. Sometimes your tables change, adding fields or dropping some of them. When this happens, a transformation that does not use Metadata Injection fails or doesn't handle all of the information in the table. If you use Metadata Injection instead, the new fields are included and the dropped columns are excluded from the transformation. Other times you have a complex transformation to apply to a lot of different tables. Traditionally, without the Metadata Injection feature, you had to repeat the transformation for each table, adapting it to the concrete structure of each one. Fortunately, with Metadata Injection, the same transformation is valid for all the tables you want to treat. A little effort gives you a great benefit.
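
The idea behind both examples can be sketched in a few lines of Python. This is not how Pentaho Data Integration implements Metadata Injection and it does not use its API; it only illustrates the principle of one template that reads a table's structure at run time instead of hard-coding it. The table names, columns, and in-memory SQLite databases are hypothetical.

```python
import sqlite3


def copy_table(source, target, table):
    """Copy a table using whatever columns it has at run time."""
    # Read the structure from the database metadata instead of hard-coding it.
    columns = [row[1] for row in source.execute(f"PRAGMA table_info({table})")]
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)

    target.execute(f"DROP TABLE IF EXISTS {table}")
    target.execute(f"CREATE TABLE {table} ({col_list})")
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    target.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    return columns


def demo():
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE students (id, name)")
    source.execute("INSERT INTO students VALUES (1, 'Ana')")
    source.execute("CREATE TABLE courses (code, title, credits)")
    source.execute("INSERT INTO courses VALUES ('DB1', 'Databases', 6)")

    # The same template handles tables with different structures.
    before = [copy_table(source, target, t) for t in ("students", "courses")]

    # The table changes: the new field is picked up without editing the template.
    source.execute("ALTER TABLE students ADD COLUMN email")
    after = copy_table(source, target, "students")
    return before, after
```

Here a single `copy_table` routine serves both tables and keeps working after a column is added, which is the kind of reuse described above.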

Furthermore, the solution has a free to use community version.

The solution is easy to set up, very intuitive, clear to understand and easy to maintain.

What needs improvement?

I'm currently looking at a new competitor that has some interesting features that this solution doesn't have. I have found that this competitor has a feature that is not present in the Pentaho Data Integration approach: its system keeps track of the last executions and stores their state, which gives you the potential to resume a run from the point where it ended the last time. It's very interesting. It would be nice if Pentaho had this type of feature.

Often you are required to install plugins. If you need access to, in my case, Neo4j databases, you need a plugin to do it.
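
A minimal Python sketch of this kind of resume-from-last-state behavior follows. It is a generic illustration, not the competitor's implementation and not something Pentaho Data Integration provides; the state file name, the `next_index` field, and the simulated failure are all assumptions made for the example.

```python
import json
import os
import tempfile


def run_with_checkpoint(items, process, state_path):
    """Process items in order, resuming after the last one recorded as done."""
    start = 0
    if os.path.exists(state_path):
        with open(state_path) as fh:
            start = json.load(fh)["next_index"]

    for index in range(start, len(items)):
        # If this raises, the state file still points at the failed item.
        process(items[index])
        with open(state_path, "w") as fh:
            json.dump({"next_index": index + 1}, fh)


def demo():
    state_path = os.path.join(tempfile.mkdtemp(), "state.json")
    processed = []
    fail_once = {"armed": True}

    def process(item):
        # Simulate a failure the first time item "c" is seen.
        if item == "c" and fail_once["armed"]:
            fail_once["armed"] = False
            raise RuntimeError("simulated failure")
        processed.append(item)

    items = ["a", "b", "c", "d"]
    try:
        run_with_checkpoint(items, process, state_path)
    except RuntimeError:
        pass  # the first run stops at "c"

    # The second run reads the stored state and resumes at "c".
    run_with_checkpoint(items, process, state_path)
    return processed
```

Because the state is written only after an item succeeds, a rerun repeats the failed item but none of the items that already completed.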

For how long have I used the solution?

Between my current role and the role at my last company, I've been working with the solution for over five years.

What do I think about the stability of the solution?

It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now, and I think that is perhaps why it is not very stable; it sometimes causes the system to hang. I'm not sure if this is the case for the paid tiers.

What do I think about the scalability of the solution?

I am the only person using the solution currently. There are two other people that occasionally also assist in it. I'm helping them understand the tool and they are beginning to use it. In that sense, we're slowly scaling.

I don't know if the solution scales well on a large scale, however.

It scales very well overall, thanks to the very useful attribute for running n copies of every step, although this is balanced by the side effect of consuming a lot of memory and CPU resources.

How are customer service and support?

We haven't really contacted technical support in the past. We try to handle any issues ourselves in-house. I can't speak to the quality of the technical support, having never directly dealt with them.

Which solution did I use previously and why did I switch?

We've never really used another solution like this in our organization. This is the first.

How was the initial setup?

The solution is pretty simple to set up. It's not complex.

For us, deployment took about one month.

Maintenance is easy. The only maintenance tasks are upgrading to newer versions and backing up the repository frequently.

What about the implementation team?

I handled the implementation on my own. I didn't need any help from a reseller or consultant.

What's my experience with pricing, setup cost, and licensing?

We're using the community edition, which is free to use. I'm not sure how much their paid services cost. We haven't purchased any licensing.

What other advice do I have?

We're just users of the solution. We don't have a professional relationship with the company.

The solution is great to use and easy to share with teams via the central repository. It's very functional overall. I'd recommend the solution to other companies.

I'd rate the solution eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
it_user254223 - PeerSpot reviewer
Project Manager - Business Intelligence at www.datademy.es
Consultant
Feb 5, 2018
It has improved our data integration capabilities
Pros and Cons
  • "It has improved our data integration capabilities."
  • "Provides a good open source option."
  • "Developing ETL processes to load a data warehouse has improved our data integration capabilities."
  • "There is not a data quality or MDM solution in the Pentaho DI suite."
  • "I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse."
  • "I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support."

How has it helped my organization?

We developed ETL processes to load a data warehouse. It has improved our data integration capabilities.

What is most valuable?

  • Easy to use
  • Development of the product
  • A lot of predefined steps
  • Good open source option

What needs improvement?

There is not a data quality or MDM solution in the Pentaho DI suite.

For how long have I used the solution?

Three to five years.

What do I think about the stability of the solution?

No issues.

What do I think about the scalability of the solution?

I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse.

How are customer service and technical support?

I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support.

Which solution did I use previously and why did I switch?

I switched from our previous solution for cost reasons.

How was the initial setup?

It was not complex.

What's my experience with pricing, setup cost, and licensing?

There is a good open source option (Community Edition).

Which other solutions did I evaluate?

No.

What other advice do I have?

There is a lack of support if you work with the Community Edition.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Buyer's Guide
Download our free Pentaho Data Integration and Analytics Report and get advice and tips from experienced pros sharing their opinions.
Updated: March 2026