Ahad Ahmed - PeerSpot reviewer
BI developer at Jubilee Life Insurance Company Ltd
Real User
Top 5
May 29, 2024
Offers features for data integration and migration
Pros and Cons
  • "The product is user-friendly and intuitive"
  • "The solution offers features for data integration and migration. Pentaho Data Integration and Analytics allows the integration of multiple data sources into one. The product is user-friendly and intuitive to use for almost any business."
  • "Should provide additional control for the data warehouse"

What is our primary use case?

I have used the solution to gather data from multiple sources, including APIs, databases like Oracle, and web servers. There are a bunch of data providers available who can provide you with datasets to export in JSON format from clouds or APIs. 

What is most valuable?

The solution offers features for data integration and migration. Pentaho Data Integration and Analytics allows the integration of multiple data sources into one. The product is user-friendly and intuitive to use for almost any business. 

What needs improvement?

The solution should provide additional control for the data warehouse and reduce its size, as our organization's clients have expressed concerns regarding it. The vendor can focus on reducing capacity and compensate for it by enhancing product efficiency. 

For how long have I used the solution?

I have been using Pentaho Data Integration and Analytics for a year.  

Buyer's Guide
Pentaho Data Integration and Analytics
May 2026
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: May 2026.
899,917 professionals have used our research since 2012.

How are customer service and support?

I have never encountered any issues with Pentaho Data Integration and Analytics. 

What's my experience with pricing, setup cost, and licensing?

I believe the pricing of the solution is more affordable than the competitors. 

Which other solutions did I evaluate?

I have worked with IBM DataStage along with Pentaho Data Integration and Analytics. I found the IBM DataStage interface outdated in comparison to the Pentaho tool. IBM DataStage requires the user to drag and drop the services as well as the pipelines, similar to the process in SSIS platforms. Pentaho Data Integration and Analytics is also easier to comprehend from the first use than IBM DataStage. 

What other advice do I have?

The solution's ETL capabilities make data integration tasks easier and are used to export data from a source to a destination. At my company, I am using IBM data switches and the overall IBM tech stack for compatibility among the integrations, pipelines and user levels. 

I would absolutely recommend Pentaho Data Integration and Analytics to others. I would rate the solution a seven out of ten. 

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees
Real User
Aug 7, 2022
An affordable solution that makes it simple to do some fairly complicated things, but it could be improved in terms of consistency of different transformation steps
Pros and Cons
  • "It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there."
  • "The speed of developing solutions has been the best improvement, reducing development time by days or weeks compared to using a different tool."
  • "Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step."
  • "In terms of our decision to purchase Hitachi's product services or solutions, our satisfaction level is average or on balance."

What is our primary use case?

Our primary use case is to populate a data warehouse and data marts, but we also use it for all kinds of data integration scenarios and file movement. It is almost like middleware between different enterprise solutions. We take files from our legacy app system, do some work on them, and then call SAP BAPIs, for example.

It is deployed on-premises. It gives you the flexibility to deploy it in any environment, whether on-premises or in the cloud, but this flexibility is not that important to us. We could deploy it on the cloud by spinning up a new server in AWS or Azure, but as a manufacturing facility, it is not important to us. Our customer preference is primarily to deploy things on-premises.

We usually stay one version behind the latest one. We're a manufacturing facility. So, we're very sensitive to any bugs or issues. We don't do automatic upgrades. They're a fairly manual process.

How has it helped my organization?

We've had it for a long time. So, we've realized a lot of the improvements that anybody would realize from almost any data integration product.

The speed of developing solutions has been the best improvement. It has reduced the development time and improved the speed of getting solutions deployed. The reduction in ETL development time varies by the size and complexity of the project. We probably spend days or weeks less than we would if we were using a different tool.

It is tremendously flexible in terms of adding custom code by using a variety of different languages if you want to, but we had relatively few scenarios where we needed it. We do very little custom coding. Because of the tool we're using, it is not critical. We have developed thousands of transformations and jobs in the tool.

What is most valuable?

It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there.

Its performance is a pretty close second. It is a pretty highly performant system. Its query performance on large data sets is very good.

What needs improvement?

Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step.

For how long have I used the solution?

We have been using this solution for more than 10 years.

What do I think about the stability of the solution?

Its stability is very good.

What do I think about the scalability of the solution?

Its scalability is very good. We've been running it for a long time, and we've got dozens, if not hundreds, of jobs running a day.

We probably have 200 or 300 people using it across all areas of the business. We have people in production control, finance, and what we call materials management. We have people in manufacturing, procurement, and of course, IT. It is very widely and extensively used. We're increasing its usage all the time.

How are customer service and support?

They are very good at quickly and effectively solving the issues we have brought up. Their support is well structured. They're very responsive.

Because we're very experienced in it, when we come to them with a problem, it is usually something very obscure and not necessarily easy to solve. We've had cases where when we were troubleshooting issues, they applied just a remarkable amount of time and effort to troubleshoot them.

Support seems to have very good access to development and product management as a tier-two. So, it is pretty good. I would give their technical support an eight out of ten.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We didn't have another data integration product before Pentaho.

How was the initial setup?

I installed it. It was straightforward. It took about a day and a half to get the production environment up and running. That was probably because I was e-learning as I was going. With a services engagement, I bet you would have everything up in a day.

What about the implementation team?

We used Pentaho services for two days. Our experience was very good. We worked with Andy Grohe. I don't know if he is still there or not, but he was excellent.

What was our ROI?

We have absolutely seen an ROI, but I don't have the metrics. There are analytic cases that we just weren't able to do before. Due to the relatively low cost compared to some of the other solutions out there, it has been a no-brainer.

What's my experience with pricing, setup cost, and licensing?

We did a two or three-year deal the last time we did it. As compared to other solutions, at least so far in our experience, it has been very affordable. The licensing is by component. So, you need to make sure you only license the components that you really intend to use.

I am not sure if we have relicensed after the Hitachi acquisition, but previously, multi-year renewals resulted in a good discount. I'm not sure if this is still the case.

We've had the full suite for a lot of years, and there is just the initial cost. I am not aware of any additional costs.

What other advice do I have?

If you haven't used it before, it is worth engaging services with Pentaho for initial implementation. They'll just point out a number of small foibles related to perhaps case sensitivity. They'll just save you a lot of runs through the documentation to identify different configuration points that might be relevant to you.

I would highly recommend the Data Integration product, particularly for anyone with a Java background. Most of our BI developers at this point do not have a Java background, which isn't really that important. Particularly, if you're a Java business and you're looking for extensibility, the whole solution is built in Java, which just makes certain aspects of it a little more intuitive at first.

On the data integration side, it is really a good tool. A lot of investment dollars go into big data and new tech, and often, those are not very compelling for us. We're in an environment where we have medium data, not big data.

It provides a single end-to-end data management experience from ingestion to insights, but at this point, that's not critical to us. We mostly do the data integration work in Pentaho, and then we do the visualization in another tool. The single data management experience hasn't enabled us to discontinue the use of other data management analysis delivery tools just because we didn't really have them.

We take an existing job or transformation and use that as a test. It is certainly easy enough to copy one object to another. I am not aware of a specific templating capability, but we are not really missing anything there. It is very easy for us to clone a job or transformation just by doing a Save As, and we do that extensively.

Vantara's roadmap is a little fuzzy for me. There has been quite a bit of turnover in the customer-facing roles over the last five years. We understand that there is a roadmap to move to a pure web-based solution, but it hasn't been well communicated to us.

In terms of our decision to purchase Hitachi's product services or solutions, our satisfaction level is average or on balance.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees
Consultant
May 10, 2022
Connects to different databases, origins of data, files, and SFTP
Pros and Cons
  • "I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a staging area where I can work with all the information that I have in my production databases, then I can work with the data that I created."
  • "It is a very good tool if you need to work with data."
  • "I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector."

What is our primary use case?

I just use it as an ETL. It is a tool that helps me work with data so I can solve any of my production problems. I work with a lot of databases. Therefore, I use this tool to keep information organized. 

I work with a virtual private cloud (VPC) and VPN. If I work in the cloud, I use VPC. If I work on-premises, I work with VPNs.

How has it helped my organization?

I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a staging area where I can work with all the information that I have in my production databases, then I can work with the data that I created.

Right now, I am working in the business intelligence area. However, we use BI in all our companies, so it is not limited to one area. I create different data marts for different business units, e.g., HR, IT, sales, and marketing.

What is most valuable?

A valuable feature is the number of connectors that I have. So, I can connect to different databases, origins of data, files, and SFTP. With SQL and NoSQL databases, I can connect, put it in my instructions, send it to my staging area, and create the format. Thus, I can format all my data in just one process.

What needs improvement?

I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector.

Hitachi can make a lot of improvements in the tool, e.g., in performance or latency or putting more emphasis on cloud solutions or NoSQL databases. 

For how long have I used the solution?

I have more than 15 years of experience working with it.

What do I think about the stability of the solution?

The stability depends on the version. At the beginning, it was more focused on stability. As of now, some things have been deprecated. I really don't know why. However, I have been pretty happy with the tool. It is a very good tool. Obviously, there are better tools, but Pentaho is fast and pretty easy to use. 

What do I think about the scalability of the solution?

It is scalable. 

How are customer service and support?

Their support team will receive a ticket on any failures that you might have. We have a log file that lets us review our errors, both in Windows and Unix. So, we are able to check both operating systems.

If you don't pay any license, you are not allowed to use their support at all. While I have used it a couple of times, that was more than 10 years ago. Now, I just go to their community and any Pentaho forums. I don't use the support.

Which solution did I use previously and why did I switch?

I have used a lot of ETL data integrators, such as DataStage, Informatica, Talend, Matillion, Python, and even SQL. MicroStrategy, Qlik, and Tableau have instructional features, and I try to use a lot of tools to do instructions. 

How was the initial setup?

I have built the solution. It does not change for cloud or on-premise developments. 

You create in your development environments, then you move to test. After that, you do the volume and integrity testing, then you go to UAT. Finally, you move to production. It does depend on the customer. You can thoroughly create the entire product structure as well as all the files that you need. Once you put it in production, it should work. You should have the same structure in development, test, and production.

What was our ROI?

It is free. I don't spend money on it.

It will reduce a lot of the time that you work with data.

What's my experience with pricing, setup cost, and licensing?

I use it because it is free. I download from their page for free. I don't have to pay for a license. With other tools, I have to pay for the licenses. That is why I use Pentaho.

I used to work with the complete suite of Pentaho, not only Data Integration. I used to build some solutions from scratch. I used to work with the Community version and Enterprise versions. With the Enterprise version, it is more than building cubes. I am building a BI solution that I can explore. Every time that I use Pentaho Data Integration, I never spend any money because it comes free with the tool. If you pay for the Enterprise license, Pentaho Data Integration is included. If you don't pay for it and use the Community version, Data Integration is included for free. 

Which other solutions did I evaluate?

I used to work with a reseller of Pentaho. That is why I started working with it. Also, I did some training for Pentaho at the company that I used to work for in Argentina, where we were a Platinum reseller. 

Pentaho is easy to use. You don't need to install anything. You can just open the script and start working on it. That is why I chose it. With Informatica, you need to do a server installation, but some companies might not allow some installation in their production or normal environment.

I feel pretty comfortable using the solution. I have tried to use other tools, but I always come back to Pentaho because it is easier. 

Pentaho is open source. While Informatica is a very good tool, it is pretty expensive. That is one of the biggest cons for the data team because you don't want to pay money for tools that only help you do your work.

What other advice do I have?

I would rate this solution as eight out of 10. One of the best things about the solution is that it is free.

I used to sell Pentaho. It has a lot of pros and cons. From my side, there are more pros than cons. There isn't one tool that can do everything that you need, but this tool is one of those tools that helps you to complete your tasks and it is pretty integrable with other tools. So, you can switch Pentaho on and off from different tools and operating systems. You can use it in Unix, Linux, Windows, and Mac.

If you know how to develop different things and are very good at Java, you can create your own connectors. You can create a lot of things. 

It is a very good tool if you need to work with data. There isn't a database that you can't manage with this tool. You can work with it and manage all the data that you want to manage.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Ryan Ferdon - PeerSpot reviewer
Senior Data Engineer at Burgiss
Real User
Apr 4, 2022
Low-code makes development faster than with Python, but there were caching issues
Pros and Cons
  • "The fact that it's a low-code solution is valuable. It's good for more junior people who may not be as experienced with programming."
  • "With this solution, instead of it taking a week, it was reduced to an afternoon, or about three hours."
  • "If you're working with a larger data set, I'm not so sure it would be the best solution; the larger things got the slower it was and it was kind of buggy sometimes."

What is our primary use case?

We used it for ETL to transform data from flat files, CSV files, and databases. We used PostgreSQL for the connections, and then we would either import the data into our database if it came in from clients, or export it to files if clients wanted files or if a vendor needed to import the files into their database.

How has it helped my organization?

The biggest benefit is that it's a low-code solution. When you hire junior ETL developers or engineers, who may have a schooling background but no real experience with ETL or coding for ETL, it's a UI-based, low-code solution in which they can make something happen within weeks instead of, potentially, months.

Because it's low-code, while I could technically have done everything in Python alone, that would definitely have taken longer than using Pentaho. In addition, by being able to standardize pipelines to handle the onboarding process for new clients, development costs were significantly reduced. To put in perspective, prior to my leading the effort to standardize things, it would typically take about a week to build a feed from start to finish, and sometimes more depending on how complicated it was. With this solution, instead of it taking a week, it was reduced to an afternoon, or about three hours. That was a significant difference.

Instead of paying a developer a full week's worth of work, which could be $2,500 or more, it cut it down to three hours or about $300. That's a big difference.
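
The arithmetic behind that comparison can be written out as a short sketch. The dollar figures are the ones quoted above; the 40-hour work week is an assumption added here only to express the time saving as a percentage.

```python
# Figures quoted in the review; the 40-hour week is an assumption.
WEEK_COST = 2500.0      # dollars for a full week of developer time
NEW_COST = 300.0        # dollars for about three hours
HOURS_PER_WEEK = 40     # assumed, not stated in the review
NEW_HOURS = 3


def savings_per_feed(old_cost: float, new_cost: float) -> tuple[float, float]:
    """Return (dollars saved, fraction saved) for one feed."""
    saved = old_cost - new_cost
    return saved, saved / old_cost


saved, fraction = savings_per_feed(WEEK_COST, NEW_COST)
print(f"Saved per feed: ${saved:,.0f} ({fraction:.0%})")
print(f"Time reduction: {1 - NEW_HOURS / HOURS_PER_WEEK:.1%}")
```

On these numbers, each standardized feed saves roughly $2,200 in developer cost.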

What is most valuable?

The fact that it's a low-code solution is valuable. It's good for more junior people who may not be as experienced with programming. In our case, we didn't have a huge data set. We had small and medium-sized data sets, so it worked fine.

The fact that it's open source is also helpful in that, if a junior engineer knows they are going to use it in a job, they can download it themselves, locally, for free, and use test data to learn it.

My role was to use it to write one feed that could facilitate multiple clients. Given that it was an open-source, free solution, it was pretty robust in what it could do. I could make lookup tables and databases and map different clients, and I could use the same feed for 30 clients or 50 clients. It got the job done for our use case.
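
The idea of one feed serving many clients through lookup tables can be illustrated outside Pentaho with a small Python sketch. This is not how the feed was actually built in the tool; the client names, column names, and `run_feed` function are hypothetical and exist only to show the mapping pattern.

```python
import csv
import io

# Hypothetical per-client lookup table: source column -> standard column.
CLIENT_MAPPINGS = {
    "client_a": {"MemberID": "member_id", "FName": "first_name", "LName": "last_name"},
    "client_b": {"id": "member_id", "given": "first_name", "surname": "last_name"},
}


def run_feed(client: str, raw_csv: str) -> list[dict[str, str]]:
    """Apply one shared pipeline, driven by the client's column mapping."""
    mapping = CLIENT_MAPPINGS[client]
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Rename only the mapped columns; unmapped columns are dropped.
        rows.append({std: record[src].strip() for src, std in mapping.items()})
    return rows


sample_a = "MemberID,FName,LName\n1, Ada ,Lovelace\n"
sample_b = "id,given,surname\n2,Alan,Turing\n"
print(run_feed("client_a", sample_a))
print(run_feed("client_b", sample_b))
```

Onboarding another client then means adding one entry to the lookup table rather than building a new feed.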

In addition, you can install it wherever you need it. We had installed versions in the cloud and I also had local versions.

What needs improvement?

If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was.

It was kind of buggy sometimes. And when we ran the flow, it didn't go from a perceived start to end, node by node. Everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail. That was not because the job was wrong, but because Pentaho decided to go at everything at once, and something would process before it was supposed to. There were nodes you could add to make sure that, before this node kicks off, all these others have processed, but it was a bit tedious. 
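
The ordering problem described here, and the role of a blocking node, can be sketched in plain Python. This is a conceptual illustration only, not Pentaho's internals: two steps are started together, and a `threading.Event` stands in for the blocking node that holds the dependent step until its upstream step has finished. The step names are made up.

```python
import threading
import time

results = []
lookup_ready = threading.Event()   # plays the role of a blocking step


def build_lookup():
    time.sleep(0.05)               # simulate slow upstream work
    results.append("lookup built")
    lookup_ready.set()             # release anything waiting on this step


def load_rows():
    lookup_ready.wait()            # block until the upstream step finishes
    results.append("rows loaded")


# Both steps start together, as described in the review.
threads = [threading.Thread(target=load_rows), threading.Thread(target=build_lookup)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Without the `wait()` call, the dependent step would usually finish first, which is the kind of failure described above.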

There were also caching issues, and we had to write code to clear the cache every time we opened the program, because the cache would fill up and it wouldn't run. I don't know how hard that would be for them to fix, or if it was fixed in version 10.
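
A cache-clearing helper of the kind mentioned might look like the sketch below. The review does not say which directory held the cache or what the original code looked like, so the function takes the path as a parameter and the demonstration runs against a throwaway temporary directory rather than a real install.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache(cache_dir: Path) -> int:
    """Delete everything inside cache_dir and return the number of entries removed."""
    if not cache_dir.is_dir():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a real subdirectory and its contents
        else:
            entry.unlink()         # remove a file or a symlink
        removed += 1
    return removed


# Demonstration against a throwaway directory, not a real install.
demo = Path(tempfile.mkdtemp())
(demo / "bundle1").mkdir()
(demo / "state.bin").write_text("stale")
print(clear_cache(demo))
```

Running something like this from the script that launches the program would clear the cache on every start.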

Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks.

One other thing that would have helped with Pentaho was documentation and support on the internet: how to do things, how to set up. I think there are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.

For how long have I used the solution?

I used Hitachi Lumada Data Integration (Pentaho) for three years.

What do I think about the stability of the solution?

In terms of the stability of the solution, as I noted, I wouldn't use it for large data sets. But for small to midsize companies that are looking for a low-code solution that isn't going to break the budget, it's a great tool for them to use.

It worked and it was stable enough, once we figured out the little quirks and how to get around them. It mostly handled our production workflows without issue.

What do I think about the scalability of the solution?

I think it could scale, but only up to a point. I didn't test it on larger datasets. People I've talked to who have worked on larger datasets wouldn't recommend using it, but that is hearsay.

In my former company, there were about five people in the data engineering department who were using the solution in their roles as ETL data integration specialists.

In that company, it's their go-to solution and I think it will work for everything that they need. When I was there, I tried opening pathways to different things, but there were so many feeds already on it, and it worked for what they need, and it's low-code and open source, so I think they'll stick with it. As they gain more clients they'll increase their usage of it.

How was the initial setup?

The initial setup wasn't that complicated. You have to set the job environment variables and that was probably the most complicated part, and would be especially so if you're not familiar with it. Otherwise, it was just a matter of downloading the version needed, installing it, and learning how to use the different components. Overall, it was pretty easy and straightforward.

The first time we deployed it, not knowing what we were doing, it took a couple of days, but that was mainly troubleshooting and figuring out what we were doing wrong because we hadn't used it before. After that, it would take maybe 30 minutes or an hour.

In terms of maintenance for Pentaho, one developer per feed is what is typically assigned. It will depend on the workflow of the company and how many feeds are needed. In our case there were five people involved.

What was our ROI?

It saved us a lot of money. Given that it's open source, that I used it for three years, and that the company had been using it for several years before that, a lot of money was definitely saved by using Pentaho versus something else.

What's my experience with pricing, setup cost, and licensing?

If a company is looking for an ETL solution and wants to integrate it with their tech stack but doesn't want to spend a bunch of money, Pentaho is a good solution. SSIS cores were $10,000 apiece. Although I don't know what they cost nowadays, they're expensive. 

Pentaho is a nice option without having to pay an arm and a leg. We even had a complicated data set and Pentaho was able to handle pretty much every type of scenario, if we thought about it creatively enough. I would recommend it for a company in that position.

Which other solutions did I evaluate?

While the capabilities of Pentaho are good enough for light work, I've started using Alteryx Designer, and it is so much more robust in everything that you can do in real time. I've also used SSIS.

When you run something in Pentaho, you can click on it to see the output of each one, but it's hard to really change anything. For example, if I were to query data from a database and put it into a "select," if I wanted to reorganize within the select based on something like the first initial of someone's name, it provided that option. But when I would do it, sometimes it would throw an error and I'd have to run the feed again to see it.

The nodes, or the components, in Pentaho can probably do about 70 percent of what you can do in Alteryx. Don't get me wrong, Pentaho worked for what we needed it for, with just a few quirks. But as a data engineer, I'm always interested in and excited to work with new technologies that may offer different benefits. In this case, one of the benefits is that each node in Alteryx has many more capabilities in real time. I can look at the data that's coming into the node and the data that's going out. There was a way to do that in Pentaho, if you right-clicked and looked, but it would tell you the fields that were coming in and out and not necessarily the data. It's nice to be able to troubleshoot, on the spot, node-by-node, if you're having an issue. You can do that easily with Alteryx.

In addition to being able to look at data coming in and out of the node, you can also sort it easily and filter it within each data node in Alteryx, and that is something you can't do in Pentaho.

Another cool thing with Alteryx, although it's a very small difference, is that you don't have to save the workflow before you run it. Pentaho forces you to do that. Of course, it's always good to save.

What other advice do I have?

A good thing about Pentaho is that it's not that hard to learn, from an ETL perspective. Pentaho lays things out intuitively in the panel: your input (flat file, CSV, or database) and then the transformation nodes. 

It was a good baseline and a good open-source tool to use to learn ETL. It's good to have exposure to multiple tools because every company has different needs and, depending on their needs, it would be a different recommendation.

The lessons I learned using it: Make sure you clear the cache when you open the program. Also, if there are any critical points in your flow that are dependent upon previous nodes, make sure that you put blocking steps in. Make sure you also set up the job environment variables correctly, so that Pentaho runs.

It worked for what we did but, personally, I wouldn't use it. In the new company I'm working for, we are using large financial data sets and I'm not so sure it could handle that. I know there's an Enterprise version, but I didn't use that.

The solution can handle ingestion through to export, but you still have to have a batch or Python script to run it with an automation process. I don't know if the Lumada version has something different, but with what I was using, you were simply building the pipeline, but the pipeline outside of the program had to be scheduled and run, and we had other tools to check that the output was as expected.
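
A wrapper script of the kind described could be as small as the sketch below, which assembles a call to Pentaho's Kitchen command-line job runner. The install path, job file, and `CLIENT` parameter are hypothetical, and the `-file`, `-level`, and `-param` options are written as I understand Kitchen's command line, so they should be checked against the documentation for the installed version. Scheduling is left to an external tool such as cron.

```python
import subprocess


def build_kitchen_command(kitchen: str, job_file: str, params: dict[str, str]) -> list[str]:
    """Assemble the argument list for running one job file with Kitchen."""
    cmd = [kitchen, f"-file={job_file}", "-level=Basic"]
    cmd += [f"-param:{name}={value}" for name, value in params.items()]
    return cmd


def run_job(kitchen: str, job_file: str, params: dict[str, str]) -> bool:
    """Run the job and report success, treating exit code 0 as success."""
    completed = subprocess.run(build_kitchen_command(kitchen, job_file, params))
    return completed.returncode == 0


# Hypothetical paths and parameter, for illustration only.
cmd = build_kitchen_command(
    "/opt/pentaho/data-integration/kitchen.sh",
    "/etl/jobs/client_feed.kjb",
    {"CLIENT": "client_a"},
)
print(" ".join(cmd))
```

Checking that the output is as expected would still be a separate step, as noted above.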

We used version 7 for a while and we were reluctant to upgrade to version 9 because we had an 834 configuration, meaning a government standardized feed that our developer spent two years building. There was an issue whenever we tried to run those feeds on version 9, so we were reluctant to upgrade because things were working on 7. We ended up finding out that it didn't take much work for us to fix the problem that we were having with version 9 and, eventually, we moved to it. With every version upgrade of anything, there are going to be pros and cons.

Depending on what someone needs it for, if it's a small project and they don't want to pay for an enterprise solution, I would recommend it and give it a nine out of 10. The finicky things were a little frustrating, but the fact that it's free, can be deployed easily, and that it can fulfill a lot of things on a small scale, are plusses. If it were for a larger company that needed an enterprise solution, I wouldn't recommend it. In that case, it would be one out of 10.

For a smaller company or one with a smaller budget, a company that doesn't have highly complex ETL needs, Pentaho is definitely a great option. If a company has the budget and has really specific needs and large data sets, I would suggest looking elsewhere.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
it_user1740738 - PeerSpot reviewer
Senior Engineer at a comms service provider with 501-1,000 employees
Real User
Jan 3, 2022
Saves time and makes it easy for our mixed-skilled team to support the product, but more guidance and better error messages are required in the UI
Pros and Cons
  • "The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to trawl through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
  • "We have not come across anything that we have not been able to do with Pentaho, and it has proved to be a very flexible way of getting data from anywhere."
  • "Although it is a low-code solution with a graphical interface, often the error messages that you get are of the type that a developer would be happy with. You get a big stack of red text and Java errors displayed on the screen, and less technical people can get intimidated by that. It can be a bit intimidating to get a wall of red error messages displayed. Other graphical tools that are focused at the power user level provide a much more user-friendly experience in dealing with your exceptions and guiding the user into where they've made the mistake."
  • "Its UI is probably a bit too confusing for that level of user, so it doesn't allow us to get the tool as widely distributed across the organization to non-technical users as much as we would like."

What is our primary use case?

We're using it for data warehousing. Typically, we collect data from numerous source systems, structure it, and then make it available to drive business intelligence, dashboard reporting, and things like that. That's the main use of it. 

We also do a little bit of moving of data from one system to another, but the data doesn't go into the warehouse. For instance, we sync the data from one of our line of business systems into our support help desk system so that it has extra information there. So, we do a few point-to-point transfers, but mainly, it is for centralizing data for data warehousing.

We use it just as a data integration tool, and we haven't found any problems. When we have big data processing, we use Amazon Redshift. We use Pentaho to load the data into Redshift and then use that for big data processing. We use Tableau as our reporting platform. We've got quite a number of users who are experienced in it, so it is our chosen reporting platform. So, we use Pentaho for the data collection and data modeling aspect of things, such as developing facts and dimensions, but we then export that data to Redshift as a database platform, and then we use Tableau as our reporting platform.

I am using version 8.3, which was the latest long-term support version when I looked at it the last time. Because this is something we use in production, and it is quite core to our operations, we've been advised that we just stick with the long-term support versions of the product.

It is in the cloud on AWS. It is running on an EC2 instance in AWS Cloud.

How has it helped my organization?

It enables us to create low-code pipelines without custom coding efforts. A lot of transformations are quite straightforward because there are a lot of built-in connectors, which is really good. It has got connectors to Salesforce, which makes it very easy for us to wire up a connection to Salesforce and scrape all of that data into another table. Those flows have got absolutely no code in them. It has a Python integrator, and if you want to go into a coding environment, you've got your choice of writing in Java or Python.

The creation of low-code pipelines is quite important. We have around 200 external data sets that we query and pull the data from on a daily basis. The low-code environment makes it easier for our support function to maintain it because they can open up a transformation and very easily see what that transformation is doing, rather than having to trawl through reams and reams of code. ETLs written purely in code become very difficult to trace very quickly. You spend a lot of time trying to unpick it. They never get commented on as well as you'd expect, whereas, with a low-code environment, you have your transformation there, and it almost documents itself. So, it is much easier for somebody who didn't write the original transformation to pick that up later on.

We reuse various components. For instance, we might develop a transformation that does a lookup based on the domain name to match to a consumer record, and then we can repeat that bit of code in multiple transformations. 

We have a metadata-driven framework. Most of what we do is metadata-driven, which is quite important because that allows us to describe all of our data flows. For example, table one moves to table two, table two moves to table three, etc. Because we've got metadata that explains all of those steps, it helps people investigate where the data comes from and allows us to publish reports that show, "You've got this end metric here, and this is where the data that drives that metric came from." The variable substitution that Pentaho provides to enable metadata-driven frameworks is definitely a key feature.
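To make the idea concrete, here is a minimal Python sketch of a metadata-driven flow, written outside Pentaho. It assumes a hypothetical metadata list in which each row names one source and one target table, uses `string.Template` as a stand-in for `${VARIABLE}`-style substitution, and walks the same metadata backwards to answer "where did this data come from?". The table names and SQL template are illustrative only.

```python
from string import Template

# Hypothetical metadata: each row describes one hop in the data flow.
FLOWS = [
    {"source": "table_one", "target": "table_two"},
    {"source": "table_two", "target": "table_three"},
]

# One generic step; the ${...} placeholders are filled in from metadata.
STEP_TEMPLATE = Template("INSERT INTO ${TARGET} SELECT * FROM ${SOURCE}")


def render_steps(flows):
    """Produce one concrete statement per metadata row."""
    return [
        STEP_TEMPLATE.substitute(SOURCE=f["source"], TARGET=f["target"])
        for f in flows
    ]


def trace_lineage(flows, table):
    """Walk back from a table to its original source (one source per target assumed)."""
    parents = {f["target"]: f["source"] for f in flows}
    path = [table]
    while path[-1] in parents and parents[path[-1]] not in path:
        path.append(parents[path[-1]])
    return path


if __name__ == "__main__":
    for statement in render_steps(FLOWS):
        print(statement)
    print(" <- ".join(trace_lineage(FLOWS, "table_three")))
```

Because the steps and the lineage report are both derived from the same metadata, adding a new hop means adding one row rather than writing a new transformation.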

The ability to automate data pipeline templates affects our productivity and costs. We run a lot of processes, and if it wasn't reliable, it would take a lot more effort. We would need a lot bigger team to support the 200 integrations that we run every day. Because it is a low-code environment, we don't have to have support instances escalated to the third line support to be investigated, which affects the cost. Very often our support analysts or more junior members are able to look into what an issue is and fix it themselves without having to escalate it to a more senior developer.

The automation of data pipeline templates affects our ability to scale the onboarding of data because after we've done a few different approaches and we get new requirements, they fit into a standard approach. It gives us the ability to scale with code and reuse, which also ties in with the metadata aspect of things. A lot of our intermediate stages of processing data are purely configured in metadata, so in order to implement transformation, no custom coding is required. It is really just writing a few lines of metadata to drive the process, and that gives us quite a big efficiency.

It has certainly reduced our ETL development time. I've worked at other places that had a similar-sized team to manage a system with far fewer integrations. We've certainly managed to scale Pentaho not just for the number of things we do but also for the type of things we do.

We do the obvious direct database connections, but there is a whole raft of different types of integrations that we've developed over time. We have REST APIs, and we download data from Excel files that are hosted in SharePoint. We collect data from S3 buckets in Amazon, and we collect data from Google Analytics and other Google services. We've not come across anything that we've not been able to do with Pentaho. It has proved to be a very flexible way of getting data from anywhere.

Our time savings are probably quite significant. By using some of the components that we've already got written, our developers are able to, for instance, put in a transformation from a staging area to its model data area. They are probably able to put something in place in an hour or a couple of hours. If they were starting from a blank piece of paper, that would be several days worth of work.

What is most valuable?

The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to trawl through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it. 

The other side is that it is quite a modular program. I've worked with other ETL tools, and it is quite difficult to get component reuse by using them. With tools like SSIS, you can develop your packages for moving data from one place to another, but it is really difficult to reuse a lot of it, so you have to implement the same code again. Pentaho seems quite adaptable to have reusable components or sections of code that you can use in different transformations, and that has helped us quite a lot.

One of the things that Pentaho does is that it has the virtual web services ability to expose a transformation as if it was a database connection; for instance, when you have a REST API that you want to be read by something like Tableau that needs a JDBC connection. Pentaho was really helpful in getting that driver enabled for us to do some proof of concept work on that approach.

What needs improvement?

Although it is a low-code solution with a graphical interface, often the error messages that you get are of the type that a developer would be happy with. You get a big stack of red text and Java errors displayed on the screen, and less technical people can get intimidated by that. It can be a bit intimidating to get a wall of red error messages displayed. Other graphical tools that are focused at the power user level provide a much more user-friendly experience in dealing with your exceptions and guiding the user into where they've made the mistake.

Some of the components have a great many options. Guidance embedded in the interface about when to use certain options would be good, so that people know what setting something would do and when they should use it. It is quite light on that aspect.

For how long have I used the solution?

I have been using this solution since the beginning of 2016. It has been about seven years.

What do I think about the stability of the solution?

We haven't had any problems in particular that I can think of. It is quite a workhorse. It just sits there running reliably. It has got a lot to do every day. We have occasional issues of memory if some transformations haven't been written in the best way possible, and we obviously get our own bugs that we introduce into transformations, but generally, we don't have any problems with the product.

What do I think about the scalability of the solution?

It meets our purposes. It does have horizontal scaling capability, but it is not something that we needed to use. We have lots of small-sized and medium-sized data sets. We don't have to deal with super large data sets. Where we do have some requirements for that, it works quite well. We can push some of that processing down onto our cloud provider. We've dealt with some of such issues by using S3, Athena, and Redshift. You can almost offload some of the big data processing to those platforms.

How are customer service and support?

I've contacted them a few times. In terms of Lumada's ability to quickly and effectively solve issues that we brought up, we get a very good response rate. They provide very prompt responses and are quite engaging. You don't have to wait long, and you can get into a dialogue with the support team with back and forth emails in just an hour or so. You don't have to wait a week for each response cycle, which is something I've seen with some of the other support functions. 

I would rate them an eight out of 10. We've got quite a complicated framework, so it is not possible for us to send the whole thing over for them to look into it, but they certainly give help in terms of tweaks to server settings and some memory configurations to try and get things going. We run a codebase that is quite big and quite complicated, so sometimes, it might be difficult to do something that you can send over to show what the errors are. They wouldn't log in and look at your actual environment. It has to be based on the log files. So, it is a bit abstract. If you have something that's occurring just on a very specific transformation that you've got, it might be difficult for them to drill into to see why it is causing a problem on our system.

Which solution did I use previously and why did I switch?

I have a little bit of experience with AWS Glue. Its advantage is that it is tied natively into the AWS PySpark processing. Its disadvantage is that it writes some really difficult-to-maintain lines of code for all of its transformations, which might work fine if you have just a dozen or so transformations, but if you have a lot of transformations going on, it can be quite difficult to maintain.

We've also got quite a lot of experience working with SSIS. I much prefer Pentaho to SSIS. SSIS ties you rigidly to your data flow structure that exists at design time, whereas Pentaho is very flexible. If, for instance, you wanted to move 15 columns to another table, in SSIS, you'd have to configure that with your 15 columns. If a 16th column appears, it would break that flow. With Pentaho, without amending your ETL, you can just amend your end data set to accept the 16th column, and it would just allow it to flow through. This and the fact that the transformation isn't tied down at the design time make it much more flexible than SSIS.
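The contrast can be sketched in a few lines of Python. This is only an illustration of the behavior the reviewer describes, not how either tool is implemented; the column names are hypothetical, and two columns stand in for the fifteen.

```python
# Columns fixed at design time (two here instead of fifteen, for brevity).
DESIGN_TIME_COLUMNS = ["id", "name"]


def rigid_copy(row):
    """Fails as soon as the source carries a column the flow was not designed for."""
    unexpected = set(row) - set(DESIGN_TIME_COLUMNS)
    if unexpected:
        raise ValueError(f"Unexpected columns: {sorted(unexpected)}")
    return {col: row[col] for col in DESIGN_TIME_COLUMNS}


def flexible_copy(row, target_columns):
    """Passes through whatever the target table currently accepts."""
    return {col: row[col] for col in target_columns if col in row}


if __name__ == "__main__":
    new_row = {"id": 1, "name": "Ana", "email": "ana@example.com"}
    # The flow itself is unchanged; only the target definition was amended.
    print(flexible_copy(new_row, ["id", "name", "email"]))
```

In the flexible version, accepting the extra column is a change to the target definition alone, which is the point the reviewer makes about not amending the ETL.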

In terms of component reuse, other ETL tools are not nearly as good at being able to just pick up a transformation or a sub-transformation and drop it into your pipelines. You do tend to keep rewriting things again and again to get the same functionality.

What about the implementation team?

I was here during the initial setup, but I wasn't involved in it. We used an external company. They do our upgrades, etc. The reason for that is that we tend to stick with just the long-term support versions of the product. Apart from service packs, we don't do upgrades very often. We never get a deep experience of that, so it is more efficient for us to bring in this external company that we work with to do that.

What was our ROI?

It is always difficult to quantify a return on investment for data warehousing and business intelligence projects. It is a cost center rather than a profit center, but if you take the starting point as this is something that needs to be done, you could pick up other tools to do it. In the long run, you wouldn't necessarily find that they are much cheaper. If you went for more of a coded approach, it might be cheaper in terms of licensing, but then you might have higher costs of maintaining that.

What's my experience with pricing, setup cost, and licensing?

It does seem a bit expensive compared to the serverless product offerings. Tools such as SQL Server Integration Services are "almost" free with a database engine. It is comparable to products like Alteryx, which is also very expensive.

It would be great if we could use our enterprise license and distribute that to analysts and people around the business to use in place of Tableau Prep, etc, but its UI is probably a bit too confusing for that level of user. So, it doesn't allow us to get the tool as widely distributed across the organization to non-technical users as much as we would like.

What other advice do I have?

I would advise taking advantage of using metadata to drive your transformations. You should take advantage of the very nice and easy way in which variable substitution works in a lot of components. If you use a metadata-driven framework in Pentaho, it will allow you to self-document your process flows. At some point, it always becomes a critical aspect of a project. Often, it doesn't crop up until a year or so later, but somebody always comes asking for proof or documentation of exactly what is happening in terms of how something is getting to here and how something is driving a metric. So, if you start off from the beginning by using a metadata framework that self documents that, you'll be 90% of the way in answering those questions when you need to.

We are satisfied with our decision to purchase Hitachi's products, services, or solutions. In the low-code space, they're probably reasonably priced. With the serverless architectures out there, there is some competition, and you can do things differently using serverless architecture, which would have an overall lower cost of running. However, the fact that we have so many transformations that we run, and those transformations can be maintained by a team of people who aren't Python developers or Java developers, and our apprentices can use this tool quite easily, is an advantage of it.

I'm not too familiar with the overall roadmap for Hitachi Vantara. We're just using the Pentaho data integration products. We don't use the metadata injection aspects of Pentaho mainly because we didn't have a need for them, but we know they're there. 

I would rate it a seven out of 10. Its UI is a bit techy and more confusing than some of the other graphical ETL tools, and that's where improvements could be made.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
System Engineer at a tech services company with 11-50 employees
Real User
Oct 12, 2022
Enterprise Edition pricing and reduced Community Edition functionality are making us look elsewhere
Pros and Cons
  • "We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic."
  • "Using the solution we were able to reduce our ETL deployment time by between 10 and 20 percent, and when it comes to personnel costs, we have gained 10 percent."
  • "The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's what our impression is."
  • "Overall, our Hitachi solution was quite good, but over the last couple of years, we have been trying to move away from the product due to a number of things."

What is our primary use case?

We use it for two major purposes. Most of the time it is for ETL of data. And based on the loaded and converted data, we are generating reports out of it. A small part of that, the pivot tables and the like, are also on the web interface, which is the more interactive part. But about 80 percent of our developers' work is on the background processes for running and transforming and changing data.

How has it helped my organization?

Before, a lot of manual work had to be done, work that isn't done anymore. We have also given additional reports to the end-users and, based upon them, they have to take some action. Based on the feedback of the users, some of the data cleaning tasks that were done manually have been automated. It has also given us a fast response to new data that is introduced into the organization.

Using the solution we were able to reduce our ETL deployment time by between 10 and 20 percent. And when it comes to personnel costs, we have gained 10 percent.

What is most valuable?

The graphical user interface is quite okay. That's the most important feature. In addition, the different types of stores and data formats that can be accessed and transferred are an important component.

We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic. It's more about the business logic and less about the programming logic and that's really important.

Another important feature is that you can deploy it in any environment, whether it's on-premises or cloud, because you can reuse your steps. When it comes to adding to your data processing capacity dynamically that's key because when you have new workflows you have to test them. When you have to do it on a different environment, like your production environment, it's really important.

What needs improvement?

I would like to see better support from one version to the next, and all the more so if there are third-party elements that you are using. That's one of the differences between the Community Edition and the Enterprise Edition. 

In addition to better integration with third-party tools, what we have seen is that some of the tools just break from one version to the next and aren't supported anymore in the Community Edition. What is behind that is not really clear to us, but the result is that we can't migrate, or we have to migrate to other parts. That's the most inconvenient part of the tool.

We need to test to see if all our third-party plugins are still available in a new version. That's one of the reasons we decided we would move from the tool to the completely open-source version for the ETL part. That's one of the results of the migration hassle we have had every time.

The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's what our impression is.

The Enterprise Edition is okay, and there is a clear path for it. You will not use a lot of external plugins with it because, with every new version, a lot of the most popular plugins are transferred to the Enterprise Edition. But the Community Edition is almost not supported anymore. You shouldn't start in the Community Edition because, really early on, you will have to move to the Enterprise Edition. Before, you could live with and use the Community Edition for a longer time.

For how long have I used the solution?

I have been working with Hitachi Lumada Data Integration for seven or eight years.

What do I think about the stability of the solution?

The stability is okay. During the transition from the pre-Hitachi product to Hitachi, it was two years of hell, but now it's better.

What do I think about the scalability of the solution?

At the scale we are using it, the solution is sufficient. The scalability is good, but we don't have that big of a data set. We have a couple of billion data records involved in the integration. 

We have it in one location across different departments with an outside disaster recovery location. It's on a cluster of VMs and running on Linux. The backend data store is PostgreSQL.

Maybe our design wasn't quite optimal for reloading the billions of records every night, but that's probably not due to the product but to the migration. The migration should have been done in a bit of a different way.

How are customer service and support?

I had contact with their commercial side and with the technical side for the setup and demos, but not after we implemented it. That is due to the fact that the documentation and the external consultant gave us a lot of information about it.

Which solution did I use previously and why did I switch?

We came from the Microsoft environment to Hitachi, but that was 10 years back. We switched due to the licensing costs and because there wasn't really good support for the PostgreSQL database.

Now, I think the Microsoft environment isn't that bad, and there is also better support for open-source databases.

How was the initial setup?

I was involved in the initial migration from Microsoft to Hitachi. It was rather straightforward, not too complex. Granted, it was a new toolset, but that is the same with every new toolset. The learning curve wasn't too steep.

The maintenance effort is not significant. From time to time we have an error that just pops up without our having any idea where it comes from. And then, the next day, it's gone. We get that error something like three times a year. Nobody cares about it or is looking into the details of it. 

The migrations from one version to the next that we did were all rather simple. During that process, users don't have it available for a day, but they can live with that. The migration was done over a weekend and by the following Monday, everything was up and running again.

What about the implementation team?

We had some external help from someone who knows the product and had already had some experience with implementing the tool.

What was our ROI?

In terms of ROI, over the years it was a good step to make the move to Hitachi. Now, I don't think it would be. Now, it would be a different story.

What's my experience with pricing, setup cost, and licensing?

We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it.

Which other solutions did I evaluate?

When we made the choice, it was between Microsoft, Hitachi, and Cognos. The deciding factor in going with Hitachi was its better support for open-source databases and data stores. Also, the functionality of the Community version was what was needed by most of our customers.

What other advice do I have?

Our experience with the query performance of Lumada on large data sets is that Lumada is not what determines performance. Most of the time, the performance comes from the database or the data store underneath Lumada. Depending on how big your data set is, you have to change or optimize your data store and then you can work with large data sets.

The fine-tuning of the database that is done outside of Lumada is okay because a tool can't provide every insight into every type of data store or dataset. If you are looking into optimization, you have to use your data store optimization tools. Hitachi isn't designed for that, and we were not expecting to have that.

I'm not really that impressed with Hitachi's ability to quickly and effectively solve issues we have brought up, but it's not that bad either. It's halfway, not that good and not that bad.

Overall, our Hitachi solution was quite good, but over the last couple of years, we have been trying to move away from the product due to a number of things. One of them is the price. It's really expensive. And the other is that more and more of what used to be part of the Community Edition functionality is moving to the Enterprise Edition. The latter is okay and its functions are okay, but then we are back to the price. Some of our customers don't have the deeper pockets that Hitachi is aiming for.

Before, it was more likely that I would recommend Hitachi Vantara to a colleague. But now, if you are starting in an environment, you should move to other solutions. If you have the money for the Enterprise Edition, then I would say my likelihood of recommending it, on a scale of one to 10, would be a seven. Otherwise, it would be a one out of 10.

If you are going with Hitachi, go for the Enterprise version or stay away from Hitachi.

It's also really important to think in great detail about your loading process at the start. Make sure that is designed correctly. That's not directly related to the tool itself, but it's more about using the tool and how the loads are transferred.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer1872000 - PeerSpot reviewer
Senior Data Analyst at a tech services company with 51-200 employees
Real User
Jun 8, 2022
We're able to query large data sets without affecting performance
Pros and Cons
  • "One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
  • "Before we used Pentaho, our processes were in Microsoft Excel and the updates from databases had to be done manually, but now all our routines are done automatically and we have more time to do other jobs, saving us four or five hours daily."
  • "Parallel execution could be better in Pentaho. It's very simple but I don't think it works well."

What is our primary use case?

I use it for ETL. We receive data from our clients and we join the most important information and do many segmentations to help with communication between our product and our clients.

How has it helped my organization?

Before we used Pentaho, our processes were in Microsoft Excel and the updates from databases had to be done manually. Now all our routines are done automatically and we have more time to do other jobs. It saves us four or five hours daily.

In terms of ETL development time, it depends on the complexity of the job, but if the job is simple it saves two or three hours.

What is most valuable?

One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results.

I'm working with large data sets. One of the clients I'm working with is a large credit card company and the database from this client is very large. Pentaho allows me to query large data sets without affecting its performance.

I use Pentaho with Jenkins to schedule the jobs. I'm using the jobs and transformations in Pentaho to create many links. 

I always find ways to have minimal code and create the processes with many parameters. I am able to reuse processes that I have created before. 

Creating jobs and putting them into production, as well as the visibility that Pentaho gives, are both very simple.

What needs improvement?

Parallel execution could be better in Pentaho. It's very simple but I don't think it works well.

For how long have I used the solution?

I've been working with Pentaho for four or five years.

What do I think about the stability of the solution?

The stability is good. 

What do I think about the scalability of the solution?

It's scalable.

How are customer service and support?

I find help on the forums.

Which solution did I use previously and why did I switch?

I used SQL Server Integration Services, but I have much more experience with Pentaho. I have also worked with Apache NiFi, but it is more focused on single data processes, whereas I'm always working with batch processes and large data sets.

How was the initial setup?

The first deployment was very complex because we didn't have experience with the solution, but the next deployment was simpler.

We create jobs weekly in Pentaho. The development time takes, on average, one week and the deployment takes just one day or so.

We just put it on Git, pull it onto a server, and schedule the execution.

We use it on-premises while the infrastructure is Amazon and Azure.

What other advice do I have?

I always recommend Pentaho for working with automated processes and to do API integrations.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Renan Guedert - PeerSpot reviewer
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees
Real User
Apr 20, 2022
Creates a good, visual pipeline that is easy to understand, but doesn't handle big data well
Pros and Cons
  • "Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."
  • "A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."

What is our primary use case?

It was our principal tool for the whole ETL and data warehousing on our projects. We created a whole step for collecting all the raw data from APIs, other databases, and flat files, like Excel files, CSV files, and JSON files, to do the whole transformation and data preparation, then model the data and put it in SQL Server and integration services.

For business intelligence projects, it is sometimes pretty good, when you are extracting something from the API, to have a step to transform the JSON file from the API to an SQL table.
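As a rough illustration of what such a step does, here is a minimal sketch outside of Pentaho itself. It assumes the API returns a JSON array of flat objects, and it uses an in-memory SQLite database as a stand-in for the SQL Server target; the table name, field names, and `load_json_records` helper are hypothetical.

```python
import json
import sqlite3


def load_json_records(conn, table, payload):
    """Load a JSON array of flat objects into a SQL table and return the row count."""
    records = json.loads(payload)
    if not records:
        return 0
    # Use the union of all keys as columns so records with missing fields still load.
    columns = sorted({key for rec in records for key in rec})
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # The table and column names are interpolated, so they must come from a trusted source.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
    rows = [tuple(rec.get(c) for c in columns) for rec in records]
    conn.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


# Hypothetical API response: the second record lacks the "clicks" field.
payload = '[{"campaign": "spring", "clicks": 120}, {"campaign": "summer"}]'
conn = sqlite3.connect(":memory:")
loaded = load_json_records(conn, "campaign_results", payload)
print(loaded, conn.execute("SELECT * FROM campaign_results").fetchall())
```

Missing fields are stored as NULL, which keeps the load from failing when the API omits optional keys.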

We use it heavily as a virtual machine running on Windows. We have also installed the open-source version on the desktop.

How has it helped my organization?

Lumada provides us with a single, end-to-end data management experience from ingestion to insights. This single data management experience is pretty good because then you don't have every analyst doing their own stuff. When you have one unique tool to do that, you can keep improving as well as have good practices and a solid process to do the projects.

What is most valuable?

It is very resourceful, with a wide variety of things that you can do. It is also pretty open, since you can put in a Python script or JavaScript for everything. If you don't have the native tool in the application, you can build your own using scripts. You can build your other steps and jobs in the application. The freedom the application gives has been pretty good.

Lumada enables us to create pipelines with minimal manual coding efforts, which is the most important thing. When creating a pipeline, you can see which steps are failing in the process. You can keep up the process and debug, if you have problems. So, it creates a good, visual pipeline that makes it easy to understand what you are doing during the entire process.

What needs improvement?

There is no straightforward explanation of the bugs and errors that happen in the software. I must search heavily on the Internet, some YouTube videos, and other forums to know what is happening. The official site of Hitachi and Lumada doesn't have the best explanation of bugs, errors, and the functions. I must search for other sources to understand what is happening. Usually, it is some guy in India or Russia who knows the answer.

A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git.

After you create a data pipeline, if you could make a JSON file or something in another language, we could simplify the steps for creating what we are doing. Or, a simple flat file of text could be even better than that, but generated by their own platform so people can look and see what is happening. You shouldn't need to download the whole project into your own Pentaho. I would like to just look at the code and see if there is something wrong.
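For pipelines that are saved to disk as .ktr files, which are XML, a short script can produce the kind of plain-text outline described above. This is only a sketch: the element names (`step`, `name`, `type`, `order/hop`) reflect the usual .ktr layout and should be checked against your own files, and `summarize_transformation` and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_transformation(xml_text):
    """Return a plain-text outline of the steps and hops in a transformation."""
    root = ET.fromstring(xml_text)
    lines = []
    # One line per step: its name and its step type.
    for step in root.findall("step"):
        lines.append(f"step: {step.findtext('name')} [{step.findtext('type')}]")
    # One line per hop: which step feeds which.
    for hop in root.findall("order/hop"):
        lines.append(f"hop: {hop.findtext('from')} -> {hop.findtext('to')}")
    return "\n".join(lines)


# Hypothetical, heavily trimmed transformation used only for illustration.
sample = """
<transformation>
  <order>
    <hop><from>Read JSON</from><to>Write table</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Read JSON</name><type>JsonInput</type></step>
  <step><name>Write table</name><type>TableOutput</type></step>
</transformation>
"""
print(summarize_transformation(sample))
```

Committing an outline like this next to the pipeline file would give reviewers a readable text diff without opening the project in the designer.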

When I use it for open-source applications, it doesn't handle big data too well. Therefore, we have to use other kinds of technologies to manage that.

I would like it to be more accessible on Macs. Previously, I always used Linux, but some companies that I worked for before used MacBooks. It would be good if I could use Pentaho on those too, since I need to use other tools or create a virtual machine to use Pentaho. So, it would be pretty good if the solution had a friendly version for Macs or Linux-based systems, like Ubuntu.

For how long have I used the solution?

I have been using it for six years, but more heavily over the last two years.

How are customer service and support?

I don't bring issues to Hitachi since Lumada is open source in some kind of way. 

Once, when I had a problem with connections because of the software, I saw the issue in the forums on the Internet because there was some type of bug happening.

Which solution did I use previously and why did I switch?

At my first company, we used just Lumada. At my second company, we used a lot of QlikView, SQL, Python, and Lumada. At my third company, we used Python and SQL much more. I used Lumada just once at that company. At my new company, I don't use it at all. I just use Azure Data Factory and SQL.

With Pentaho, we finally have data pipelines. We didn't have solid data pipelines before. After the data pipelines became very solid, the team who created them became very popular in the company.

How was the initial setup?

To set things up, we used a virtual machine. It was a version where we can download it and unlock a machine too. You can do Ctrl-C and Ctrl-V with Pentaho because all you need to have is the newest version of Java. So, it was pretty smooth to do the setup. It took an hour maximum to deploy.

What was our ROI?

Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side.

The solution reduced our ETL development time by a lot because a whole project used to take about a month to get done previously. After having Lumada, it took just a week. For a big company in Brazil, it saves a team at least $10,000 a month.

Which other solutions did I evaluate?

I just use the ETL tool. For data visualization, we are using Power BI. For data storage, we use SQL Server, Azure, or Google BigQuery.

We are just using the open-source application for ETL. We have never looked into other tools of Hitachi because they are paid.

I know other companies that are using Alteryx, which has a friendlier user interface, but it has fewer tools and is more difficult to utilize. My wife uses Alteryx, and having used Lumada, I find Alteryx is not as good, because Lumada has more solutions and is open source. That said, Alteryx has more security and better support.

What other advice do I have?

For someone who wants simple solutions and isn't a programmer or knowledgeable about technology, open-source tools are perfect. In one week, you can try to understand this solution and do your first project. In my opinion, it is the best tool for people starting out.

Lumada is a great tool. I would rate it as a straight seven out of 10. It gets the work done. The open-source version doesn't work well with big data sources, but there is a lot of flexibility and freedom to do everything you want and need. If the open-source version worked better with big data, then I would give it a straight eight since there is always room for improvement. Sometimes when debugging, some errors can be pretty difficult. It is a principal tool, when you are starting out in business intelligence and data engineering, for understanding everything that is going on.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user