Michel Philippenko - PeerSpot reviewer
Project Manager at a computer software company with 51-200 employees
Real User
Forums are helpful, and creating ETL jobs is simpler than in other solutions
Pros and Cons
    • "I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You have to search all these different places using a mouse, clicking everywhere... each report is coded in a binary file... You cannot search with a text search tool..."

    What is our primary use case?

    I was working with Pentaho for a client. I had to implement complicated data flows and extraction. I had to take data from several sources in a PostgreSQL database by reading many tables in several databases, as well as from Excel files. I created some complex jobs. I also had to implement business reports with the Pentaho Report Designer.

    The client I was working for had Pentaho on virtual machines.

    What is most valuable?

    The ETL feature was the most valuable to me. I like it very much. It was very good.

    What needs improvement?

    I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You had to search all these different places using a mouse, clicking everywhere. The interface did not make it easy to find things and manage all of that. I don't know if other tools are better for end-users when it comes to the graphical interface, but this was a bit complicated. In the end, we were able to do everything with Pentaho.

    And when you want to improve the appearance of your report, Pentaho Report Designer has complicated menus. It is not very user-friendly. The result is beautiful, but it takes time.

    Also, each report is coded in a binary file, so you cannot read it. Maybe that's what the community or the developers want, but it is inconvenient because when you want to search for information, you need to open the graphical interface and click everywhere. You cannot search with a text search tool because the reports are coded in binary. When you have a lot of reports and you want to find where a precise part of one of your reports is, you cannot do it easily.

    The way you specify parameters in Pentaho Report Designer is a little bit complex. There are two interfaces. Job creators use PDI, which provides the ETL interface, and it's okay; creating the extract/transform/load jobs is simpler than in other solutions. But there is another interface for the end-users of Pentaho, and you have to understand how the two relate to each other, so it's a little bit complex. You have to go into XML files, which is not so simple.

    Also, using the solution overall is a little bit difficult. You need to be an engineer, somebody with a technical background. It's not absolutely easy; it's a technical tool. I didn't immediately understand it and had to search for information and think it through.

    For how long have I used the solution?

    I used Hitachi Lumada Data Integration, Pentaho, for approximately two years.


    What do I think about the stability of the solution?

    The stability was perfect.

    What do I think about the scalability of the solution?

    I didn't scale the solution. I had to migrate from an old Pentaho to a new Pentaho. I had quite a big set of data, but I didn't add new data. I worked with the same volume of data all the time so I didn't test the scaling.

    In the company I consulted for, there were about 15 people who input the data and worked with the technical part of Pentaho. There were a lot of end-users, who were the people interested in the reports; on the order of several thousand end-users. 

    How are customer service and support?

    The technical support was okay. I used the open-source version of Pentaho and relied on the forum. I found what I needed. And the one or two times I didn't, I asked a question in the forum and received an answer within an hour or two. I appreciated that a lot. It's very good that somebody from Pentaho Enterprise responds so rapidly.

    How was the initial setup?

    The initial setup was complex, but I'm an engineer and it's my job to deal with complex systems. It's not the most complex that I have dealt with, but it was still somewhat complex. The procedure was explained on the Pentaho website in the documentation. You had to understand which module does what. It was quite complex.

    It took quite a long time because I had to troubleshoot, to understand what was wrong, and I had to do it several times before it worked.

    What's my experience with pricing, setup cost, and licensing?

    I didn't purchase Pentaho. There is a business version but I used only the open source. I was fully satisfied and very happy with it. It's a very good open-source solution. The communication channels, the updates, the patches, et cetera are all good.

    What other advice do I have?

    I would fully recommend Pentaho. I have already recommended it to some colleagues. It's a good product with good performance.

    Overall, I was very happy with it. It was complicated, but that is part of my job. I was happy with the result and the stability. The Data Integration product is simpler than the Report Designer. I would rate the Data Integration at 10 out of 10 and the Report Designer at nine, because of the graphical interface.

    Disclosure: My company has a business relationship with this vendor other than being a customer: System integrator
    PeerSpot user
    Data Architect at a consumer goods company with 1,001-5,000 employees
    Real User
    Top 20
    I can extend and customize existing pipeline templates for changing requirements, saving time
    Pros and Cons
    • "I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source."
    • "I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."

    What is our primary use case?

    We use it for orchestration and as an ETL tool to move data from one environment to another, including moving data from on-premises to the cloud and moving operational data from different source systems into the data warehouse.

    How has it helped my organization?

    People are now able to get access to the data when they need it. That is what is most important. All the reports go out on time.

    The solution enables us to use one tool that gives a single, end-to-end data management experience from ingestion to insights. From the reporting point of view, we are able to make our customers happy. Are they able to get their reports in time? Are they able to get access to the data that they need on time? Yes. They're happy, we're happy, that's it.

    With the automation of everything, if I start breaking it into numbers, we don't have to hire three or four people to do one simple task. We've been able to develop some generic IT processes so that we don't have to reinvent the wheel. I just have to extend the existing pipeline and customize it to whatever requirements I have at that point in time. Otherwise, whenever we would get a project, we would actually have to reinvent the wheel from scratch. Now, the generic pipeline templates that we can reuse save us so much time and money.
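    To make the reuse pattern described above concrete, here is a rough, hypothetical sketch in plain Python (not Pentaho's actual job format): a generic pipeline template that a new project extends and customizes instead of rebuilding from scratch. All class and field names here are invented for illustration.

    ```python
    # Illustrative sketch (not Pentaho code): a generic extract -> transform -> load
    # template that project-specific feeds extend, reusing the common skeleton.

    class PipelineTemplate:
        """Generic ETL skeleton; subclasses override only what differs."""

        def extract(self):
            raise NotImplementedError

        def transform(self, rows):
            # Default: pass rows through unchanged.
            return rows

        def load(self, rows):
            # Default: collect into a list standing in for the target table.
            self.target = list(rows)
            return self.target

        def run(self):
            return self.load(self.transform(self.extract()))


    class SalesFeed(PipelineTemplate):
        """A project-specific feed: only the parts that differ are written."""

        def extract(self):
            return [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 5}]

        def transform(self, rows):
            return [r for r in rows if r["qty"] > 2]  # keep bulk orders only


    print(SalesFeed().run())  # [{'sku': 'B2', 'qty': 5}]
    ```

    The design choice mirrors the reviewer's point: each new project only writes its own extract and transform details, while orchestration stays in the shared template.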

    It has also reduced our ETL development time by 40 percent, and that translates into cost savings.

    Before we used Pentaho, we used to do some of this stuff manually, and some of the ETL jobs would run for hours, but most of the ETL jobs, like the monthly reports, now run within 45 minutes, which is pretty awesome. Everything that we used to do manually is now orchestrated.

    And now, with everything in the cloud, any concerns about hardware are taken care of for us. That helps with maintenance costs.

    What is most valuable?

    I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source. With open-source on the table, I am in a position to transform the data where it's actually being moved from one environment to another.
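    As an illustration of the kind of in-flight transformation described above, here is a minimal Python sketch. The field names are invented, and this is not PDI's actual scripting API; it just shows a row-level transform one might run from a scripting step while data moves between environments.

    ```python
    # Hypothetical row transform: normalize one record while it is in flight.

    def transform_row(row):
        """Trim and upper-case the code field; coerce the amount to a number."""
        return {
            "code": row["code"].strip().upper(),
            "amount": round(float(row["amount"]), 2),
        }

    rows = [{"code": " ab12 ", "amount": "10.456"}]
    print([transform_row(r) for r in rows])  # [{'code': 'AB12', 'amount': 10.46}]
    ```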

    Whether we are working with structured or unstructured data, the tool has been helpful. We are actually able to extend it to read JSON data by creating some Java components.

    The solution gives us the flexibility to deploy it in any environment, including on-premises or in the cloud. That is another very important feature.

    What needs improvement?

    I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse. 

    By using some of the Python scripts that we have, we are able to extract all this text data into JSON. Then, from JSON, we are able to create external tables in the cloud whereby, at any one time, somebody has access to this data on the S3 drive.
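    A minimal sketch of that extraction step, assuming a simple "key: value" text layout (the real inputs vary): it flattens semi-structured text records into JSON, ready to be landed for external tables.

    ```python
    # Hedged sketch: parse blank-line-separated "key: value" records into JSON.
    # The input format is an assumption for illustration only.

    import json

    def text_to_json(text):
        """Turn text records into a JSON array of objects."""
        records = []
        for block in text.strip().split("\n\n"):
            record = {}
            for line in block.splitlines():
                key, _, value = line.partition(":")
                record[key.strip()] = value.strip()
            records.append(record)
        return json.dumps(records)

    sample = "name: widget\nprice: 4\n\nname: gadget\nprice: 9"
    print(text_to_json(sample))
    ```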

    For how long have I used the solution?

    I've been using Hitachi Lumada Data Integration since 2014.

    What do I think about the stability of the solution?

    It's been stable.

    What do I think about the scalability of the solution?

    We are able to scale our environment. For example, if workloads grew, I could scale the tool to run on three instances, and the workloads would be distributed equally across them.

    How are customer service and support?

    Their tech support is awesome. They always answer and attend to any incidents that we raise.

    How would you rate customer service and support?

    Positive

    Which solution did I use previously and why did I switch?

    Everything was done manually in Excel. The main reason we went with Pentaho is that it's open-source.

    How was the initial setup?

    The deployment was like any other deployment. All the steps are written down in a document and you just have to follow those steps. It was simple for us.

    What other advice do I have?

    The performance of Pentaho, like any other ETL tool, starts from the database side, once you write good, optimized scripts. The optimization of Pentaho depends on the hardware it's sitting on. Once you have enough RAM on your VM, you are in a position to run any workloads.
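    The point above, that performance starts on the database side once you write good, optimized scripts, can be sketched like this, using SQLite as a stand-in for the real warehouse: pushing the aggregation into SQL moves far less data than looping over raw rows in the client.

    ```python
    # Sketch: let the database aggregate instead of pulling every row out.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("east", 10.0), ("east", 5.0), ("west", 7.0)])

    # Slow pattern: fetch all rows, aggregate in application code.
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount

    # Optimized pattern: one aggregated query, far less data transferred.
    totals_sql = dict(conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"))

    print(totals == totals_sql)  # True
    ```

    On a table of three rows the difference is invisible, but on warehouse-scale tables the aggregated query is what keeps the ETL fast, regardless of the tool driving it.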

    Overall, it is an awesome tool, and we are satisfied with our decision to go with Hitachi's product. It's comparable to any other ETL tool, such as SQL Server Integration Services, Informatica, or DataStage. On a scale of one to 10, where 10 is best, I would give it a nine in terms of recommending it to a colleague.

    Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
    PeerSpot user
    Systems Analyst at a university with 5,001-10,000 employees
    Real User
    Reuse of ETLs with metadata injection saves us development time, but the reporting side needs notable work
    Pros and Cons
    • "The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."
    • "The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."

    What is our primary use case?

    We use it as a data warehouse between our HR system and our student system, because we don't have an application that sits in between them. It's a data warehouse that we do our reporting from.

    We also have integrations to other, isolated apps within the university that we gather data from. We use it to bring that into our data warehouse as well.

    How has it helped my organization?

    Lumada Data Integration definitely helps with decision-making for our deans and upper executives. They are the ones who use the product the most to make their decisions. The data warehouse is the only source of information that's available for them to use, and to create that data warehouse we had to use this product.

    And it has absolutely reduced our ETL development time. The fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs. It also makes it a pretty quick process for our developers to learn and pick up ETLs from each other. It's definitely easy for us to transition ETLs from one developer to another. The ETL functionality satisfies 95 percent of all our needs. 

    What is most valuable?

    The ETL is definitely an awesome feature of the product. It's very easy and quick to use. Once you understand the way it works it's pretty robust.

    Lumada Data Integration requires minimal coding. You can do more complex coding if you want to, because it has a scripts option that you can add as a feature, but we haven't found a need to do that yet. We just use what's available, the steps that they have, and that is sufficient for our needs at this point. It makes it easier for other developers to look at the things that we have developed and to understand them quicker, whereas if you have complex coding it's harder to hand off to other people. Being able to transition something to another developer, and having that person pick it up quicker than if there were custom scripting, is an advantage.

    In addition, the solution's ability to quickly and effectively solve issues we've brought up has been great. We've been able to use all the available features.

    Among them is the ability to develop and deploy data pipeline templates once and reuse them. The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs. The automation of data pipeline templates has also been helpful in scaling the onboarding of data.
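    A rough analogue of what metadata injection buys, in plain Python (this is not PDI's actual mechanism): one template transformation whose field mappings are injected at run time, so a single definition serves many feeds. All metadata and field names here are invented.

    ```python
    # Hypothetical analogue of metadata injection: the same template is reused
    # for different feeds by injecting a source -> target field mapping.

    def run_template(rows, metadata):
        """Apply an injected field mapping to every row."""
        mapping = metadata["field_map"]
        return [{target: row[source] for source, target in mapping.items()}
                for row in rows]

    hr_meta = {"field_map": {"emp_no": "employee_id", "dept": "department"}}
    student_meta = {"field_map": {"sid": "student_id"}}

    hr_rows = [{"emp_no": 7, "dept": "IT", "extra": "ignored"}]
    print(run_template(hr_rows, hr_meta))          # [{'employee_id': 7, 'department': 'IT'}]
    print(run_template([{"sid": 42}], student_meta))  # [{'student_id': 42}]
    ```

    Without this pattern, each feed would need its own copy of the transformation, which is exactly the duplicated-ETL maintenance burden the reviewer describes avoiding.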

    What needs improvement?

    The transition to the web-based solution has taken longer and been more tedious than we would like, and it has taken development effort away from the reporting side of the tool. They have a reporting tool called Pentaho Business Analytics that does all the report creation based on the data integration tool. A lot of features are missing from that product because they've allocated a lot of their resources to fixing the data integration, to make it more web-based. We would like them to focus more on the user interface for the reporting.

    The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet. We have between 500 and 800 reports in our system now. We've had to maintain an external spreadsheet with IDs to identify the location of all of those reports, instead of having that built into the system. It's been frustrating for us that they can't just build a simple search feature into the product to search for report names. It needs to be more in line with other reporting tools, like Tableau. Tableau has a lot more features and functions.

    Because the reporting is lacking, only the deans and above are using it. It could be used more, and we'd like it to be used more.

    Also, while the solution provides us with a single, end-to-end data management experience from ingestion to insights, it doesn't give us a full history of where the data comes from. If we change a field, we can't trace it from the reporting back through to the ETL field. Unfortunately, it's a manual process for us. Hitachi has a new product that does this by searching all the fields, documents, and files to map your pipeline, but we haven't bought that product yet.

    For how long have I used the solution?

    I've been using Lumada Data Integration since version 4.2. We're now on version 9.1.

    What do I think about the stability of the solution?

    The stability has been great. Other than for upgrades, it has been pretty stable.

    What do I think about the scalability of the solution?

    The scalability is great too. We've been able to expand the current system and add a lot of customizations to it.

    For maintenance, surprisingly, I'm the only person in our organization who handles it.

    How are customer service and support?

    The only issue that we've had is that it takes a little longer than we would like for support to resolve something, although things do eventually get incorporated. They're very quick to respond to an issue, but the fixing of the issue is not as quick.

    For example, a few versions ago, when we upgraded it, we found that the upgrade caused a whole bunch of issues with the Oracle data types and the way the ETL was working with them. It wasn't transforming to the data types properly, the way we were expecting it to. In the previous version that we were using it was working fine, but the upgrade caused the issue, and it took them a while to fix that.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    We didn't have another tool. This is the only tool we have used to create the data warehouse between the two systems. When we started looking at solutions, this one was great because it was open source and Java-based, and it had a Community Edition. But we actually purchased the Enterprise Edition.

    How was the initial setup?

    I came in after it was purchased and after the first deployment.

    What's my experience with pricing, setup cost, and licensing?

    We renew our license every two years. When I spoke to the project manager, he indicated that the pricing has been going up every two years. It's going to reach a point where, eventually, we're going to have to look at alternative solutions because of the price.

    When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho. When they bought it, the price shot up. They said the increase is because of all the improvements they put into the product and the support that they're providing. From our point of view, their improvements are mostly on the data integration part of it, instead of the reporting part of it, and we aren't particularly happy with that.

    Which other solutions did I evaluate?

    I've used Tableau and other reporting tools, but Tableau sticks out because the reporting tool is much nicer. Tableau has its drawbacks with the ETL, because you can only use Tableau datasets. You have to get data into a Tableau file dataset and then the ETL part of it is stuck in Tableau forever.

    If we could use the Pentaho ETL and the Tableau reporting we'd be happy campers.

    What other advice do I have?

    It's a great product. The ETL part of the product is really easy to pick up and use. It has a graphical interface with the ability to be more complex via scripting and features that you can add.

    When looking at Hitachi Vantara's roadmap, the ability to upgrade more easily is one element of it that is important to us. Also, they're going more towards web-based solutions, instead of having local client development tools. If it does go on the web, and it works the same way it works on the client, that would be a nice feature. Currently, because we have these local client development tools, we have to have a VM client for our developers to use, and that makes it a little more tricky. Whereas if they put it on the web, then all our developers would be able to use any desktop and access the web for development.

    When it comes to the query performance of the solution on large datasets, we haven't had any issues with it. We have one table in our data warehouse that has about 120 million rows and we haven't had any performance issues.

    The solution gives you the flexibility to deploy it in any environment, whether on-prem or in the cloud. With our particular implementation, we've done a lot of customizations. We have special things that we bolted onto the product, so it's not as easy to put it onto the cloud for us. All of our customizations and bolt-ons end up costing us more because they make upgrades more difficult and time-consuming. We don't use an automated upgrade process. It's manual. We have to do a full reinstall and then apply all our bolt-ons and make sure it still works. If we could automate that process it would certainly reduce our costs.

    In terms of updating to version 9.2, which is the latest version, we're going to look into it next year and see what level of effort is required and determine how it impacts our current system. They release a new update about every six months, and there is a major release every year or two, so it's quite a fast schedule for updates.

    Overall, I would rate our satisfaction with our decision to purchase Hitachi products as a seven out of 10. I would definitely recommend the data integration tool but I wouldn't recommend the reporting tool.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
    PeerSpot user
    Data Architect at a tech services company with 1,001-5,000 employees
    Reseller
    Top 20
    Helped us to fully digitalize a national census process, eliminating door-to-door interviews
    Pros and Cons
    • "One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
    • "I would like to see support for some additional cloud sources. It doesn't support Azure, for example. I was trying to do a PoC with Azure the other day but it seems they don't support it."

    What is our primary use case?

    We use it as an ETL tool. We take data from a source database and move it into a target database. We do some additional processing on our target databases as well, and then load the data into a data warehouse for reports. The end result is a data warehouse and the reports built on top of that.

    We are a consulting company and we implement it for clients.

    How has it helped my organization?

    As a result of one of the projects we did in the Middle East, we achieved the main goal of fully digitalizing their population census. Previous censuses were done with door-to-door surveys, but for the last census, using Pentaho Data Integration, we managed to run it all in a fully digital way, with nothing on paper forms. No one had to go door-to-door to survey people.

    What is most valuable?

    One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs.

    What needs improvement?

    I would like to see support for some additional cloud sources, such as Azure and Snowflake. I was trying to do a PoC with Azure the other day, but it seems they don't support it.

    For how long have I used the solution?

    I have been using Hitachi Lumada Data Integration for four years.

    What do I think about the stability of the solution?

    There have been some bugs and some weird things every now and then, but it is mostly fairly stable.

    What do I think about the scalability of the solution?

    If you work with relatively small data sets, it's all okay. But if you are going to use really huge data sets, then you might get into a bit of trouble, at least from what I have seen.

    How are customer service and support?

    The support from Hitachi is not the greatest, the fixing of bugs can take a really long time.

    How would you rate customer service and support?

    Neutral

    How was the initial setup?

    The initial setup is very straightforward compared to many other ETL tools. It takes about half a day.

    We have about five users, altogether. There are two to three developers, one to two customer people who run the ETLs and one to two admins who take care of the environment itself. It doesn't require much maintenance. Occasionally someone has to restart the server or take a look at logs.

    What was our ROI?

    Because you can basically get Pentaho Data Integration for free, I would give the cost versus performance a pretty good rating.

    Taking the census project I mentioned earlier as an example (Pentaho Data Integration was not the only contributor; it was just one part of the whole solution), the statistical authority managed to save huge amounts of money by making the census electronic rather than running it the traditional way.

    What's my experience with pricing, setup cost, and licensing?

    You don't need the Enterprise Edition, you can go with the Community Edition. That way you can use it for free and, for free, it's a pretty good tool to use. 

    If you pay for licenses, the only thing that you're getting, in addition, is customer support, which is pretty much nonexistent in any case. I would recommend going with the Community Edition.

    Which other solutions did I evaluate?

    I have had experience with other solutions, but for the last project we did not evaluate other options. Because we had previous experience with Pentaho Data Integration, it was pretty much a no-brainer to use it.

    What other advice do I have?

    Hitachi Vantara's roadmap is promising. They came up with Lumada and it seems that they do have some ideas on how to make their product a bit more attractive than it currently is.

    I'm fairly satisfied with using Pentaho Data Integration. It's more or less okay. When it comes to all the other parts, like Pentaho reports and Pentaho dashboards, things could be better there.

    The biggest lesson I've learned from using this solution is that a cheap, open-source tool can sometimes be even more efficient than some of the high-priced enterprise ETL tools. Overall, the solution is okay, considering the low cost. It has all of the main things that you would expect it to have, from a technical perspective.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    System Engineer at a tech services company with 11-50 employees
    Real User
    Top 20
    Leaderboard
    Enterprise Edition pricing and reduced Community Edition functionality are making us look elsewhere
    Pros and Cons
    • "We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic."
    • "The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's what our impression is."

    What is our primary use case?

    We use it for two major purposes. Most of the time it is for ETL of data. And based on the loaded and converted data, we are generating reports out of it. A small part of that, the pivot tables and the like, are also on the web interface, which is the more interactive part. But about 80 percent of our developers' work is on the background processes for running and transforming and changing data.

    How has it helped my organization?

    Before, a lot of manual work had to be done, work that isn't done anymore. We have also given additional reports to the end-users and, based upon them, they have to take some action. Based on the feedback of the users, some of the data cleaning tasks that were done manually have been automated. It has also given us a fast response to new data that is introduced into the organization.

    Using the solution we were able to reduce our ETL deployment time by between 10 and 20 percent. And when it comes to personnel costs, we have gained 10 percent.

    What is most valuable?

    The graphical user interface is quite okay. That's the most important feature. In addition, the different types of stores and data formats that can be accessed and transferred are an important component.

    We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic. It's more about the business logic and less about the programming logic and that's really important.

    Another important feature is that you can deploy it in any environment, whether on-premises or cloud, because you can reuse your steps. That is key when you add to your data processing capacity dynamically, because new workflows have to be tested, and being able to run them in a different environment, such as your production environment, is really important.

    What needs improvement?

    I would like to see better support from one version to the next, and all the more so if there are third-party elements that you are using. That's one of the differences between the Community Edition and the Enterprise Edition. 

    In addition to better integration with third-party tools, what we have seen is that some of the tools just break from one version to the next and aren't supported anymore in the Community Edition. What is behind that is not really clear to us, but the result is that we can't migrate, or we have to migrate to other parts. That's the most inconvenient part of the tool.

    We need to test to see if all our third-party plugins are still available in a new version. That's one of the reasons we decided we would move from the tool to the completely open-source version for the ETL part. That's one of the results of the migration hassle we have had every time.

    The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's what our impression is.

    The Enterprise Edition is okay, and there is a clear path for it. You will not use a lot of external plugins with it because, with every new version, a lot of the most popular plugins are transferred to the Enterprise Edition. But the Community Edition is almost not supported anymore. You shouldn't start in the Community Edition because, really early on, you will have to move to the Enterprise Edition. Before, you could live with and use the Community Edition for a longer time.

    For how long have I used the solution?

    I have been working with Hitachi Lumada Data Integration for seven or eight years.

    What do I think about the stability of the solution?

    The stability is okay. During the transition from the pre-Hitachi days to Hitachi ownership, it was two years of hell, but now it's better.

    What do I think about the scalability of the solution?

    At the scale we are using it, the solution is sufficient. The scalability is good, but we don't have that big of a data set. We have a couple of billion data records involved in the integration. 

    We have it in one location across different departments with an outside disaster recovery location. It's on a cluster of VMs and running on Linux. The backend data store is PostgreSQL.

    Maybe our design wasn't quite optimal for reloading the billions of records every night, but that's probably not due to the product but to the migration. The migration should have been done in a bit of a different way.

    How are customer service and support?

    I had contact with their commercial side and with the technical side for the setup and demos, but not after we implemented it. That is due to the fact that the documentation and the external consultant gave us a lot of information about it.

    Which solution did I use previously and why did I switch?

    We came from the Microsoft environment to Hitachi, but that was 10 years back. We switched due to the licensing costs and because there wasn't really good support for the PostgreSQL database.

    Now, I think the Microsoft environment isn't that bad, and there is also better support for open-source databases.

    How was the initial setup?

    I was involved in the initial migration from Microsoft to Hitachi. It was rather straightforward, not too complex. Granted, it was a new toolset, but that is the same with every new toolset. The learning curve wasn't too steep.

    The maintenance effort is not significant. From time to time we have an error that just pops up without our having any idea where it comes from. And then, the next day, it's gone. We get that error something like three times a year. Nobody cares about it or is looking into the details of it. 

    The migrations from one version to the next that we did were all rather simple. During that process, users don't have it available for a day, but they can live with that. The migration was done over a weekend and by the following Monday, everything was up and running again.

    What about the implementation team?

    We had some external help from someone who knows the product and had already had some experience with implementing the tool.

    What was our ROI?

    In terms of ROI, over the years it was a good step to make the move to Hitachi. Now, I don't think it would be. Now, it would be a different story.

    What's my experience with pricing, setup cost, and licensing?

    We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it.

    Which other solutions did I evaluate?

    When we made the choice, it was between Microsoft, Hitachi, and Cognos. The deciding factor in going with Hitachi was its better support for open-source databases and data stores. Also, the functionality of the Community version was what was needed by most of our customers.

    What other advice do I have?

    Our experience with the query performance of Lumada on large data sets is that Lumada is not what determines performance. Most of the time, the performance comes from the database or the data store underneath Lumada. Depending on how big your data set is, you have to change or optimize your data store and then you can work with large data sets.

    The fine-tuning of the database that is done outside of Lumada is okay because a tool can't provide every insight into every type of data store or dataset. If you are looking into optimization, you have to use your data store optimization tools. Hitachi isn't designed for that, and we were not expecting to have that.
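The point about doing optimization in the data store rather than in the ETL tool can be illustrated generically. This sketch uses SQLite as a stand-in for the PostgreSQL data store described above; the table and index names are hypothetical, and the point is only that the same query goes from a full table scan to an index lookup once the store itself is tuned:

```python
import sqlite3

# SQLite stands in for the data store underneath the ETL tool.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    ((i, f"cat{i % 50}") for i in range(100_000)),
)

def plan(sql: str) -> str:
    """Return the query plan, which shows whether an index is used."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT COUNT(*) FROM records WHERE category = 'cat7'"
before = plan(query)                                  # full table scan
conn.execute("CREATE INDEX idx_cat ON records(category)")
after = plan(query)                                   # index lookup
```

The ETL tool's pipelines are unchanged by this; the speed-up comes entirely from the data store's own optimization tools, which is the reviewer's point.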

    I'm not really that impressed with Hitachi's ability to quickly and effectively solve issues we have brought up, but it's not that bad either. It's halfway, not that good and not that bad.

    Overall, our Hitachi solution was quite good, but over the last couple of years, we have been trying to move away from the product due to a number of things. One of them is the price. It's really expensive. And the other is that more and more of what used to be part of the Community Edition functionality is moving to the Enterprise Edition. The latter is okay and its functions are okay, but then we are back to the price. Some of our customers don't have the deeper pockets that Hitachi is aiming for.

    Before, it was more likely that I would recommend Hitachi Vantara to a colleague. But now, if you are starting in an environment, you should move to other solutions. If you have the money for the Enterprise Edition, then I would say my likelihood of recommending it, on a scale of one to 10, would be a seven. Otherwise, it would be a one out of 10.

    If you are going with Hitachi, go for the Enterprise version or stay away from Hitachi.

    It's also really important to think in great detail about your loading process at the start. Make sure that is designed correctly. That's not directly related to the tool itself, but it's more about using the tool and how the loads are transferred.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    Lead, Data and BI Architect at a financial services firm with 201-500 employees
    Real User
    We can use the same tool on all our environments. The patching is buggy.
    Pros and Cons
    • "Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us."
    • "The testing and quality could really improve. Every time that there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and do some effort, it is with the quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi."

    What is our primary use case?

    We run the payment systems for Canada. We use it as a typical ETL tool to transfer and modify data into a data warehouse. We have many different pipelines that we have built with it.

    How has it helped my organization?

    I love the fact that we haven't come up with a problem yet that we haven't been able to address with this tool. I really appreciate its maturity and the breadth of its capabilities.

    If we did not have this tool, we would probably have to use a whole different variety of tools, then our environment would be a lot more complicated.

    We develop metadata pipelines and use them.

    Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us. 

    What is most valuable?

    Because it comes from an open-source background, it has so many different plugins. It is just extremely broad in what it can do. I appreciate that it has a very broad, wide spectrum of things that it can connect to and do. It has been around for a while, so it is mature and has a lot of things built into it. That is the biggest thing. 

    The visual nature of its development is a big plus. You don't need to have very strong developers to be able to work with it.

    We often have to drop down to JavaScript, but that is fine. I appreciate that it has the capability built-in. When you need to, you can drop down to a scripting language. This is important to us.

    What needs improvement?

    The documentation is very basic.

    The testing and quality could really improve. Every time that there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and do some effort, it is with the quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi.

    For how long have I used the solution?

    Overall, I have been using it for about 10 years. At my current organization, I have been using it for about seven years. It was used a little bit at my previous organization as well.

    What do I think about the stability of the solution?

    The stability is not great, especially when you start patching it a lot because things get broken. That is not a great look. When you start patching, you are expecting things to get fixed, not new things to get broken.

    With modern programming, you build a lot of automated testing around your solution, and it is specifically for that. I changed this piece of code. Well, what else got broken? Obviously they don't have a lot of unit tests built into their code. They need to start doing that because it looks horrible when they change one thing, then two other things get broken. Then, they released that as a commercial product, which is horrible. Last time, somehow they broke the ability to connect with databases. That is something incredibly basic. How could you release this product without even testing for that?
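The kind of basic release gate the reviewer is arguing for can be sketched generically in Python. `open_connection` here is a hypothetical stand-in for the product's database-connection step (using SQLite for illustration), not Pentaho's actual API:

```python
import sqlite3
import unittest

def open_connection(dsn=":memory:"):
    """Hypothetical stand-in for the product's database-connection step."""
    return sqlite3.connect(dsn)

class BasicRegressionSuite(unittest.TestCase):
    """Smoke tests meant to run before every release, so basic
    capabilities like connecting to a database can never ship broken."""

    def test_can_connect(self):
        conn = open_connection()
        self.assertIsNotNone(conn)
        conn.close()

    def test_can_run_trivial_query(self):
        conn = open_connection()
        self.assertEqual(conn.execute("SELECT 1").fetchone(), (1,))
        conn.close()

# A release gate simply runs the suite and refuses to ship on failure.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(BasicRegressionSuite)
)
```

Even a suite this small would have caught the broken database connectivity the reviewer describes before the release went out.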

    What do I think about the scalability of the solution?

    We don't have a huge amount of data, so I can't really answer how we could scale up to very large solutions.

    How are customer service and support?

    Lumada’s ability to quickly and effectively solve issues we have brought up is not great. We have a service for the solution with Hitachi. I don't get the sense that Pentaho, and Hitachi still calls it Pentaho, is a huge center of focus for them. 

    You kind of get help, but the people from whom you get help aren't necessarily super strong. It often goes around in circles forever. I eventually have to find my own solution. 

    I haven't found that the Hitachi support site has a depth of understanding for the solution. They can answer simple questions, but when it gets more in-depth, they have a lot of trouble answering questions. I don't think the support people have the depth of expertise to really deal with difficult questions.

    I would rate them as five out of 10. They are responsive and polite. I don't feel ignored or anything like that, just the depth of knowledge isn't there.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    It has always been here. There was no solution like it before I got to the company.

    How was the initial setup?

    The initial setup was complex because we had to integrate with SAML. Even though they had some direction on that, it was really a do-it-yourself kind of thing. That was pretty complicated, so if they want to keep this product fresh, I think they have to work on making it integrate more with modern technology, like single sign-on and stuff like that. Every organization has that now and Pentaho doesn't have a good story for that. However, it is the platform that they don't give a lot of love to.

    It took us a long time to figure it out, something like two weeks.

    What was our ROI?

    This has reduced our ETL development time. If it wasn't for this solution, we would be doing custom coding. The reason why we are using the solution is because of its simplicity of development.

    What's my experience with pricing, setup cost, and licensing?

    These types of solutions are expensive. So, we really appreciate what we get for our money. That said, we don't think of the solution as a top-of-the-line solution or anything like that.

    Which other solutions did I evaluate?

    Apache has a project going on called Apache Hop. Because Pentaho was open-sourced, people have taken it and forked it. They are really modernizing the solution. As far as I know, Hitachi is not involved yet. I would highly advise them to get involved in that open-source project. It will be the next generation of Pentaho. If they get left behind, they're not going to have anything. It would be a very bad move to just ignore it. Hitachi should not ignore Apache Hop.

    What other advice do I have?

    I really like the data integration tool. However, it is part of a whole platform of tools, and it is obvious the other tools just don't get a lot of love. We are in it for Pentaho Data Integration (PDI) because that is what we want as our ETL tool. We use their reporting platform and stuff like that, but it is obvious that they just don't get a lot of love or concern.

    I haven't looked at the roadmap that much. We are also a Google customer using BigQuery, etc. Hitachi is really just a very niche part of what we do. Therefore, we are not generally looking very seriously at what Hitachi is doing with their products nor a big investor in what Hitachi is doing.

    I would recommend this specific Hitachi product to a friend or colleague, depending on their use case and need. If they have a very similar need, I would recommend it. I wouldn't be saying, "Oh, this is the best thing next to sliced bread," but say, "Hey, if this is what you need, this works well for us."

    On a scale of one to 10 for recommending the product, I would rate it as seven out of 10. Overall, I would also rate it as seven out of 10.

    We really appreciated the breadth of its capabilities. It is not the top-of-the-line solution, but you really get a lot for what you pay for.

    Which deployment model are you using for this solution?

    Hybrid Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Google
    Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
    PeerSpot user
    Director of Software Engineering at a healthcare company with 10,001+ employees
    Real User
    Reports on predictions that our product is doing. It would be nice if they could have analytics perform well on large volumes.
    Pros and Cons
    • "The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product."
    • "The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products."

    What is our primary use case?

    We started using Pentaho for two purposes:

    1. As an ETL tool to bring data in. 
    2. As an analytics tool. 

    As our solution progressed, we dropped the ETL piece of Pentaho. We didn't end up using it. What remains in our product today is the analytics tool.

    We do a lot of simulations on our data with Pentaho reports. We use Pentaho's reporting capabilities to tell us how contracts need to be negotiated for optimal results by using the analytics tool within Pentaho.

    How has it helped my organization?

    This was an OEM solution for our product. The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product.

    What is most valuable?

    There is an end-to-end flow, where a user can say, "I am looking at this field and want to slice and dice my data based on these parameters." That flexibility is provided by Pentaho. This minimal manual coding is important to us.

    What needs improvement?

    The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products.  

    For how long have I used the solution?

    I have been using it for eight years.

    What do I think about the stability of the solution?

    We are on-prem. Once the product was installed and up and running, I haven't had issues with the product going down or not being responsive.

    We have one technical lead who is responsible for making sure that we keep upgrading the solution so we are not on a version that is not supported anymore. In general, it is low maintenance.

    What do I think about the scalability of the solution?

    The only complaint that I have with Pentaho has been with scaling. As our data grew, we tested it with millions of records. When we started to implement it, we had clients that went from 80 million to 100 million records. I think scale did present a problem with those clients. I know that Pentaho talks about being able to manage big data, which is much more data than what we have. I don't know if it was our architecture versus the product's limitations, but we did have issues with scaling.

    Our product doesn't deal with big data at large. There are probably 17 million records. With those 17 million records, it performs well once the data has been internally cached within Pentaho. However, if you are loading the dataset or querying it for the first time, it does take a while. Once it has been cached in Pentaho, the subsequent queries are reasonably fast.
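The cold-versus-warm query behavior described above can be illustrated with a generic memoized query wrapper. This is only a sketch of the caching pattern, not Pentaho's actual cache implementation; `run_query` and its simulated delay are invented for illustration:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def run_query(sql: str) -> str:
    """Stand-in for a query against the data store; the first call is slow."""
    time.sleep(0.1)  # simulate the expensive first-time load
    return f"results for {sql!r}"

start = time.perf_counter()
run_query("SELECT * FROM contracts")   # cold: hits the data store
cold = time.perf_counter() - start

start = time.perf_counter()
run_query("SELECT * FROM contracts")   # warm: served from the cache
warm = time.perf_counter() - start
```

The second call returns in microseconds because the result never leaves memory, which matches the experience of fast subsequent queries once Pentaho has cached the dataset.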

    How are customer service and support?

    We haven't had a lot of functional issues. We had performance issues, especially early on, as we were trying to spin up this product. The response time from the support group has been a three on a scale of one to five.

    We had trouble with the performance and had their engineers come in. We shared our troubles and problems, then those engineers had brainstorming sessions. Their ability to solve problems was really good and I would rate that as four out of five.

    A lot of the problems were with the performance and scale of data that we had. It could have been that we didn't have a lot of upfront clean architecture. With the brainstorming sessions, we tried giving two sets of reports to users: 

    1. One was more summary level, which was quick, and that is what 80% of our clients use. 
    2. For 20% of our clients, we provided detailed reports that do take a while. However, you are then not impacting performance for 80% of your clients. 

    This was a good solution or compromise that we reached from both a business and technology perspective. 

    Now, I feel like the product is doing well. It is almost as if their team helped us with rearchitecting and resetting product expectations.

    How would you rate customer service and support?

    Positive

    Which solution did I use previously and why did I switch?

    Previously, we used to have something called QlikView, which is almost obsolete now. We had a lot of trouble with QlikView. Anytime processing was done, it would take a long time for those processed results to be loaded into QlikView's memory. This meant that there was a lot of time spent once an operation was done. Before users could see results or reports, it would take a couple of hours. We didn't want that lag. 

    Pentaho offered an option not to have that lag. It did not have its own in-memory database, where everything had to be loaded. That was one of the big reasons why we wanted to switch away from QlikView, and Pentaho fit that need.

    How was the initial setup?

    I would say the deployment/implementation process was straightforward enough for both data ingestion and analytics.

    When we started with the data ingestion, we went with something called Spoon. Then we realized, while it was a Pentaho product, Spoon was open source. We had integrated with the open source version of it, but later found that it didn't work for commercialization. 

    For us to integrate Pentaho and get it working, it took a couple of months because we needed to figure out authentication with Pentaho. So, learning and deployment within our environment took a couple of months. This includes the actual implementation and figuring out how to do what we wanted to do.

    Because this is a licensed product, the deployment for the client was a small part of the product's deployment. So, on an individual client basis, the deployment is easy and a small piece. 

    It gives us the flexibility to deploy it in any environment, which is important to us.

    If we went to the cloud version of Pentaho, that would be a big maintenance relief. We wouldn't have to worry about getting the latest version, installing it, and sending it out to our clients.

    What about the implementation team?

    For the deployment, we had people come in from Pentaho for a week or two. They were there with us through the process.

    Which other solutions did I evaluate?

    We looked at Tableau, Pentaho, and an IBM solution. In the absence of Pentaho, we would have gone with either Tableau or building our own custom solution. When we were figuring out which third-party tool to use, we did an analysis in which a bunch of other tools were compared. Ultimately, we went with Pentaho because it did have a wide variety of features and functionalities within its reports. Though I wasn't involved, there was a cost analysis done, and Pentaho fared favorably in terms of cost.

    For the product that we use Pentaho for, I think we're happy with their decision. There are a few other products in our product suite. Those products ended up using Tableau. I know that there have been discussions about considering Tableau over Pentaho in the future. 

    What other advice do I have?

    Engage Pentaho's architects early on, so you know what data architecture works best with the product. We built our database and structures, then had performance issues. However, it was too late when we brought in the Pentaho architects, because our data structure was out in the field with multiple clients. Therefore, I think engaging them early on in the data architecture process would be wise.

    I am not very familiar with Hitachi's roadmap and what is coming up for them. I know that they are good with sending out newsletters and keeping their customers in the know, but unfortunately, I am unaware of their roadmap.

    I feel like this product is doing well. There haven't been complaints and things are moving along. I would rate it as seven out of 10.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
    PeerSpot user
    Technical Manager at a computer software company with 51-200 employees
    Real User
    Quite simple to learn and there is a lot of information available online
    Pros and Cons
    • "Pentaho Data Integration is quite simple to learn, and there is a lot of information available online."
    • "I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing" i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking."

    What is our primary use case?

    We have an event planning system, which enables us to obtain large reports. It includes a data mart or data warehouse. We take data from the online system and pass it to the data warehouse. Then, from the data warehouse, they generate reports. We have six developers who are using Pentaho Data Integration, but there are no end-users. We deploy the product, and the customer uses it for reporting. We have one person who performs regular maintenance when it is required.

    How has it helped my organization?

    As we are a software company, we are using the tools provided with Pentaho Data Integration across our various teams.

    What is most valuable?

    Pentaho Data Integration is quite simple to learn, and there is a lot of information available online. It is not a steep learning curve. It also integrates easily with other databases and that is great. We use the provided documentation, which is a simple process for integration compared to other proprietary tools.

    What needs improvement?

    I don't think they market it that well. We can make suggestions for improvements, but they don't seem to take the feedback on board. This contrasts with Informatica, who are really helpful and seem to listen more to their customer feedback.

    I would also really like to see improved data capture. At the moment, the emphasis seems to be on data processing. I would like to see a real-time data integration tool that provides instant reporting whenever the data changes.

    I'm still at a very early stage with Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing," i.e., when there is a huge amount of data to process. That is one area where Pentaho is still lacking.

    For how long have I used the solution?

    We have been using Pentaho Data Integration for 6 years. The customer is using Mirabilis Cloud, which is a public cloud. We are currently using version A.3.

    How are customer service and support?

    Technical support is really good. Getting answers takes only a little bit of time.

    Which solution did I use previously and why did I switch?

    One of our customers was completely invested in the Microsoft framework. We had to use SSIS because it was readily available to them and part of their system. We used it for five years. 

    As mentioned, one of our teams has worked with Informatica in the past. In terms of integration, Informatica isn't more powerful, but more accurate in some aspects. The community is also quite strong.

    How was the initial setup?

    The setup of Pentaho Data Integration is straightforward. 

    What about the implementation team?

    We implemented Pentaho Data Integration in-house. The current deployment has taken three months for the current set of requirements. We have another deployment in the pipeline where we are connecting other different data sources. These projects usually take a few months to complete.

    What's my experience with pricing, setup cost, and licensing?

    Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs.

    What other advice do I have?

    For newcomers to the product, it is best to start with something simple. You can then scale up fast, as it is not a steep learning curve. If somebody wants to set up a good inbound integration platform, they can use Pentaho Data Integration. It's really simple and easy to use. The online community helps you with numerous issues, such as licensing and a lot of other things. I would rate Pentaho Data Integration eight out of 10.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Other
    Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
    PeerSpot user
    Buyer's Guide
    Download our free Pentaho Data Integration and Analytics Report and get advice and tips from experienced pros sharing their opinions.
    Updated: April 2024