Analytics Team Leader at HealtheLink
Real User
Enables us to manage our workload and generate a high volume of reporting
Pros and Cons
  • "We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule."
  • "Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."

What is our primary use case?

We use it to connect to multiple databases and generate reporting. We also have ETL processes running on it.

Portions of it are in AWS, but we also have desktop access.

How has it helped my organization?

The solution has allowed us to automate reporting by automating its scheduling. 

It is also important to us that the solution enables you to leverage metadata to automate data pipeline templates and reuse them. It allows us to generate reports with fewer resources.

If we didn't have this solution, we wouldn't be able to manage our workload or generate the volume of reporting that we currently do. It's very important for us that it provides a single, end-to-end data management experience from ingestion to insights. We are a high-volume department and without those features, we wouldn't be able to manage the current workload.

What is most valuable?

We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule.

What needs improvement?

Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in. There's good documentation when you go to the site but the help function within the solution hasn't been as good since Hitachi took over.

Buyer's Guide
Pentaho Data Integration and Analytics
April 2024
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
770,292 professionals have used our research since 2012.

For how long have I used the solution?

I've been using Lumada Data Integration since 2016, but the company has been using it much longer.

We are currently on version 8.3, but we're going to be doing an upgrade to 9.2 next month.

What do I think about the stability of the solution?

The stability is good. We haven't had any issues related to Pentaho.

What do I think about the scalability of the solution?

Its scalability is very good. We use it with multiple, large databases. We've added to it over time and it scales.

We have about 10 users of the solution including a data quality manager, clinical analyst, healthcare informatics analysts, senior healthcare informatics analyst, and an analytics team leader. It's used very extensively by all of those job roles in their day-to-day work. When we add additional staff members, they routinely get access to and are trained on the solution.

How are customer service and support?

Their ability to quickly and effectively solve issues we have brought up is very good. They have a ticketing system and they're very responsive to any tickets we enter. And that's true not only for issues but if we have questions about functionality.

How would you rate customer service and support?

Positive

How was the initial setup?

The solution is very flexible. It's pretty easy to set up connections within the solution.

Maintenance isn't required day-to-day. Our technical staff does the upgrades. They also, on occasion, have to do things like restarting the services, but that's typically related to server issues, not Pentaho itself.

What other advice do I have?

My advice would be to take advantage of the training that's offered.

The query performance of Lumada on large data sets is good, but the query performance is really only as good as the server.

In terms of Hitachi's roadmap, we haven't seen it in a little while. We did have a concern that they're going to be going away from Pentaho and rolling it into another product and we're not quite sure what the result of that is going to be. We don't have a good understanding of what's going to change. That's the concern.

We currently only use Pentaho. We don't have other Hitachi products but we're satisfied with it. We would recommend Pentaho.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
PeerSpot user
CEO with 51-200 employees
Vendor
Easy to use and has a nice GUI. The JSON input needs to perform better.

What is most valuable?

Ease of use, stability, graphical interface, small amount of "voodoo" and cost.

What needs improvement?

There are some steps that should perform better, like the JSON input, but because of the tool's flexibility, we at inflow override it by using scripting steps. Of course, it's ideal to use the steps that come with the software, but being able to write your own step is powerful. Also, it would be nice to have the drivers for the data sources shipped with Pentaho Kettle instead of having to look for the right ones on the Internet.
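As a rough illustration of what a hand-written replacement for a JSON input step does, here is a minimal Python sketch. It is not the reviewer's script and not Kettle code; the `json_to_rows` helper, the field names, and the sample payload are all hypothetical.

```python
import json


def json_to_rows(text, fields):
    """Turn a JSON array of objects into a list of fixed-width row tuples.

    Missing fields become None so every row has the same shape.
    """
    records = json.loads(text)
    return [tuple(record.get(field) for field in fields) for record in records]


# Hypothetical payload standing in for a data source's JSON output.
payload = '[{"id": 1, "name": "alpha"}, {"id": 2}]'
rows = json_to_rows(payload, ["id", "name"])
```

Each row can then be handed to the next step in a pipeline in the same way a built-in input step would emit it.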

What was my experience with deployment of the solution?

In every project there are issues with the deployment, but we were able to overcome them.

What do I think about the stability of the solution?

I think that Pentaho Kettle is stable software. If it weren't, we wouldn't have used it (because we don’t like angry customers).

What do I think about the scalability of the solution?

Actually, Pentaho Kettle comes equipped with the option to scale out, out of the box.
And no, we didn't encounter specific scalability problems.

How are customer service and technical support?

Customer Service:

We mainly use material that was written over the years (Pentaho Kettle materials), the forum, Matt Casters' blog, videos, etc. We also try to solve our issues inside the company for our customers before contacting customer service. We even developed a full-scale data integration Pentaho Kettle online course and built a website for it.

When we use customer service, it's very good. There is a large community for the tool, and people gladly help one another.

Technical Support:

Very good support.

Which solution did I use previously and why did I switch?

Before Pentaho Kettle, we used stored procedures, hand-written code, and also Informatica. Informatica is a very good tool, but it is not open source, so it is far more costly than Pentaho Kettle. From my perspective, I don't see the difference: we can do almost everything with Pentaho Kettle, and if we need a little extra, we are tech guys and we solve it.

Of course, from the customer's perspective, the cheaper the better, so a customer with a smaller budget gets more when using the open source Pentaho Kettle. That holds even with the Pentaho Kettle enterprise edition.

What's my experience with pricing, setup cost, and licensing?

Speaking from the vendor perspective, the data integration part (from data source to the warehouse/target) usually takes at least 60% of the whole initial business intelligence project. It depends on the data sources and their complexity, for example: big data, NoSQL, XML, web services, "weird" files and more.

After the data integration project is "live", it will work fine until someone breaks something (network connectivity, servers, a DBA who changes the data source, or any other change to the variables that the data integration was built upon), but this is true for all data integration software.

The day-to-day costs are very low if there are no new requirements. Luckily for us (as a vendor) once the customer starts and the users get their fancy reports and dashboards there's no turning back, and the requirements are piling up. But these are new requirements, not maintenance.

What other advice do I have?

Instead of trying to decide on a specific data integration tool, pick the right vendor partner, not a biased one. They will be able to recommend the set of tools you need according to your requirements and budget.

Business intelligence projects are made up of at least three components:

  1. Data integration tool
  2. Data warehouse tool
  3. Visualization tool

Several of the software vendors have them all, but not the best solution for each component. From my experience it's better to combine solutions. (Unless it is a small project.)

For example: data integration from Pentaho Kettle; if it's big data, we need an in-memory/columnar database for the data warehouse, but if it's not, we can use traditional databases (SQL Server, Oracle, even MySQL for smaller projects); and a BI visualization tool like Yellowfin/Tableau/Sisense/etc.

In the middle you have tens of software vendors that can be suitable for the customer's needs.

Of course, if the vendor partner is biased, then suddenly Tableau/Sisense/Qlikview/etc. become the best data integration tools, or "you don’t need a data integration tool at all", although they don't have the right components. (They are very good tools for visualization but not for "playing" extract and transform with complex data.) We work with several vendors, such as Sisense and Yellowfin, which are great tools for the specific solution they were made for.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user373128 - PeerSpot reviewer
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees
Vendor
It doesn't have the capability to produce crosstab reports with formatting capabilities. It connects seamlessly to most commonly used data sources.

Valuable Features

It is a lightweight ETL tool that's easy to get started on. It connects seamlessly to most commonly used data sources.

Improvements to My Organization

The organization went with Pentaho ETL and Reporting solutions as cost effective products, as compared to competitors. The ETL part certainly met those objectives, along with serving the purpose.

Room for Improvement

Since there have already been newer versions, maybe some of these issues are already fixed now. The most troublesome missing feature was the capability to produce crosstab reports with formatting capabilities in the BI Reporting product. The one annoyance that troubled us a lot was the fact that every step in a transformation that needed data created its own data connection. With some data sources, like Greenplum, this was a problem, because they have a limit on the number of available connections.
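To illustrate the connection-limit concern, here is a minimal Python sketch, not Pentaho code, of a bounded pool that several steps share instead of each opening its own connection. It uses sqlite3 as a stand-in for the real database; the `ConnectionPool` class, the step names, and the limit of 2 are hypothetical.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical bounded pool: at most `size` connections ever exist."""

    def __init__(self, size):
        self.size = size
        self._idle = queue.Queue()
        for _ in range(size):
            # An in-memory sqlite3 database stands in for the real data source.
            self._idle.put(sqlite3.connect(":memory:"))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for the next step


def run_steps(pool, step_names):
    """Run each step in turn, borrowing a pooled connection per step."""
    results = {}
    for name in step_names:
        with pool.connection() as conn:
            results[name] = conn.execute("SELECT 1").fetchone()[0]
    return results


pool = ConnectionPool(size=2)
results = run_steps(pool, ["lookup", "filter", "load", "audit", "report"])
```

Five steps run here against only two connections, so the total stays under whatever cap the database enforces, at the cost of steps waiting when the pool is exhausted.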

Use of Solution

I used it for three years, from 2012 to 2015, and only stopped as I left the organization.

Deployment Issues

One issue we encountered constantly with PDI deployments was that the environment parameters for jobs had to be updated manually through the designer module, 'Spoon'. Although the product has a feature for keeping Environment Variables outside Spoon, that didn't work for us, as we had one Development server used for Dev, QA and UAT.
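The general idea of keeping environment settings outside the designer can be sketched as follows. This is a generic Python illustration, not PDI's mechanism; the environment names match the review, but the property keys, host names, and the `load_environment` helper are hypothetical, and in practice the settings would live in separate files rather than in strings.

```python
def parse_properties(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Hypothetical per-environment settings sharing one server.
ENVIRONMENTS = {
    "dev": "DB_HOST=dev-db.example.internal\nDB_NAME=sales_dev\n",
    "qa": "# QA settings\nDB_HOST=qa-db.example.internal\nDB_NAME=sales_qa\n",
    "uat": "DB_HOST=uat-db.example.internal\nDB_NAME=sales_uat\n",
}


def load_environment(name):
    """Return the variable set for one environment, failing on unknown names."""
    if name not in ENVIRONMENTS:
        raise KeyError(f"unknown environment: {name}")
    return parse_properties(ENVIRONMENTS[name])


qa = load_environment("qa")
```

Selecting the variable set by name at launch time means the same job definition can run as Dev, QA, or UAT without being edited by hand.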

Stability Issues

There were no issues with the stability.

Scalability Issues

We had no issues scaling it across the company as needed.

Customer Service and Technical Support

It's about average. Most of the help we got was through Google searches and Wiki pages. One time we had an issue with a feature - our version of PDI could not handle microseconds. The product owner came up with a solution, but instead of applying the patch, wanted to sell it to us for a fee.

Initial Setup

I am only aware of the client side setup which was simple enough. It was pretty much a one step installation process.

Implementation Team

It was done by an in-house team. A couple of issues we realized later concerned memory configuration for the environment. This needs to be evaluated and fine-tuned; otherwise you can run into job failures with large amounts of data. We ran into this issue with 'Commit' points and 'Sort' steps.
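The effect of a commit point on memory use can be illustrated with a small Python sketch. It is not PDI code: it uses sqlite3 as a stand-in target, and the `load_in_batches` helper, the table name, and the commit size of 1000 are assumptions chosen for the example.

```python
import sqlite3
from itertools import islice


def load_in_batches(conn, rows, commit_size):
    """Insert rows, committing every `commit_size` rows; return the commit count.

    Only one batch is held in memory at a time, so memory use is bounded
    by the commit size rather than by the size of the whole data set.
    """
    iterator = iter(rows)
    commits = 0
    while True:
        batch = list(islice(iterator, commit_size))
        if not batch:
            break
        conn.executemany("INSERT INTO target (id, value) VALUES (?, ?)", batch)
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, value TEXT)")
rows = ((i, f"row-{i}") for i in range(2500))  # generator: rows are produced lazily
commits = load_in_batches(conn, rows, commit_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

A smaller commit size lowers peak memory but adds commit overhead, which is the kind of trade-off the tuning described above has to settle.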

Other Solutions Considered

There was an evaluation performed, however I was not involved in it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user414117 - PeerSpot reviewer
Senior Data Engineer at a tech company with 501-1,000 employees
Vendor
It enables a technical product manager to be able to write ETL jobs themselves.

What is most valuable?

The most valuable thing for me is that it enables a technical product manager to be able to write ETL jobs themselves, which saves developers time so that they can do more important things.

How has it helped my organization?

Now developers focus on improving it as a tool (since it's open source) and teach Project Managers about it. The Project Managers are responsible for their own ETL jobs because they know what they want, so it's best for them to manage their own jobs.

What needs improvement?

Its performance can be improved so it will work better with Big Data. Also, sometimes it can be very buggy which keeps away some potential users.

For how long have I used the solution?

I've used it for two years.

What was my experience with deployment of the solution?

We have had no issues with the deployment.

What do I think about the stability of the solution?

The performance for Big Data needs to be improved.

What do I think about the scalability of the solution?

We have had no issues scaling it for our needs.

How are customer service and technical support?

There is a community that can provide limited technical help. I'll give a 6 to the community since it's not very active.

Which solution did I use previously and why did I switch?

It was already in place when I joined the company.

How was the initial setup?

It's very easy to install.

What about the implementation team?

We did it in-house. It's worth it to have someone in your company who knows Pentaho really well.

What was our ROI?

ROI is pretty good since it is kind of a major thing in our company.

What's my experience with pricing, setup cost, and licensing?

The only cost is the time it takes for the developer to get to know it.

What other advice do I have?

If your ETL jobs are small and straightforward, then this solution is definitely worth it.

Disclosure: My company has a business relationship with this vendor other than being a customer: The company is also contributing back to the open source project.
it_user382572 - PeerSpot reviewer
Pentaho Consultant at a comms service provider with 10,001+ employees
Vendor
It is an open source product, so it is very easy to build your own solution against it.

What is most valuable?

It is a very good open source ETL tool that's capable of connecting to most databases. It has a lot of functions that make transforming the data very easy. Also, because it is an open source product, it is very easy to build your own solution with it.

How has it helped my organization?

It is also possible to build a new solution quite quickly, so the customer sees results quite fast.

What needs improvement?

In the community version the scheduling tool is not good, and we had to build it ourselves.

For how long have I used the solution?

I have worked with different versions of Pentaho since 2009.

What was my experience with deployment of the solution?

There are a couple of bugs in the newer versions. We were forced to wait until those bugs were fixed before we could upgrade.

What do I think about the stability of the solution?

There were no issues with its stability.

What do I think about the scalability of the solution?

There have been no issues scaling it.

How are customer service and technical support?

Because we use the community edition, there is no support from the vendor. When I worked with the Enterprise edition last year the technical support was quick and to the point. I was more than happy with their knowledge.

Which solution did I use previously and why did I switch?

In the past I also worked with SAP BI. The main reason we switched to Pentaho was the cost of SAP. Because of the flexibility of Pentaho, I prefer to work with it.

How was the initial setup?

When I started using Pentaho in 2009, the initial setup was quite complex, mainly because of a lack of good documentation at that time. Since then, it has dramatically improved. Also, the community on the web is quite active and there are some good blogs.

What about the implementation team?

I was hired to do the implementation. I think it is necessary to have a good understanding of the product to implement it well, so I would recommend hiring people with the appropriate knowledge when it is not available in-house.

What other advice do I have?

If you don’t have knowledge of the product, I would recommend taking some courses to speed up the learning curve. A cheap way to start with Pentaho is to use the Community Edition. You can do almost everything with it, and purchasing the Enterprise Edition is not necessary.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Developer at a tech services company with 10,001+ employees
Consultant
It is possible to understand how to develop an ETL solution even when using it for the first time.

What is most valuable?

  • Pentaho Kettle has a very intuitive and easy to use graphical user interface (GUI)
  • It is possible to understand how to develop an ETL solution even when using it for the first time
  • The Community Edition is free and very efficient
  • They have versions for Windows, Linux and Mac
  • Large selection of options.

How has it helped my organization?

We have developed some complex ETL processes for some clients and they are very satisfied with the results.

What needs improvement?

They could improve the logging generator. Sometimes the error description is so generic that it is not possible to detect the problem.

For how long have I used the solution?

We've used it for three years.

What was my experience with deployment of the solution?

There were no issues with the deployment.

What do I think about the stability of the solution?

There were no issues with the stability.

What do I think about the scalability of the solution?

There have been no issues scaling it.

How are customer service and technical support?

I use the Community Edition without support or customer service. I recommend the Pentaho Community Forums for technical issues.

Which solution did I use previously and why did I switch?

I have used Informatica PowerCenter, which is an excellent solution. However, it's not as easy to use as Pentaho Kettle.

How was the initial setup?

The initial setup is straightforward. All you need to do is to download it, unzip the file into a folder and execute the Spoon.bat (for Windows) or Spoon.sh (for Linux) to start the graphical user interface (GUI).

What about the implementation team?

In-house. The implementation is very simple. Data developers will not have difficulty implementing ETL solutions.

What's my experience with pricing, setup cost, and licensing?

The community edition is free. If you need a full BI solution, I would recommend the enterprise edition.

What other advice do I have?

Pentaho Kettle is an excellent solution for implementing ETL processes.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
IT-Services Manager & Solution Architect at Stratis
Real User
Free to use, easy to set up, and has great UI
Pros and Cons
  • "It's my understanding that the product can scale."
  • "The product needs more plugins."

What is our primary use case?

We basically receive information from our clients via Excel. We take this information and transform it in order to create data marts.

We receive new data every day for the processes we are running right now. The solution processes the Excel files and creates a data mart from them.

We read the data, transform it, and put it in a database. To explore the information, we need an analytics solution, and that is typically Microsoft's solution, Power BI.

What is most valuable?

The ETL itself ran very fast. It makes it very easy to transform the information we have. We found that very useful.

The UI is very easy to understand and learn.

The solution offers lots of documentation.

The initial setup is easy.

It's my understanding that the product can scale.

We've found the solution to be stable. 

The product is free to use if you choose the free version.

What needs improvement?

The solution needs better, higher-quality documentation, similar to AWS. Right now, we find that although documentation exists, it's not easy to find the answers we seek.

I have tried some cloud services with the ETL, so perhaps that would be good to add.

The product needs more plugins. Right now, it just has a standard database connection, whereas other solutions have straightforward connections for Oracle, MySQL, and the like. More plugins would make it a much better product.

For how long have I used the solution?

We recently finished two projects with Pentaho.

What do I think about the stability of the solution?

The product is stable. There are no bugs or glitches. It doesn't crash or freeze. It's reliable. 

What do I think about the scalability of the solution?

According to the documentation, it's quite scalable. That said, I haven't tried to expand it. We just use a single server and that's all we need right now. We don't have plans to increase usage.

We have three people who use the solution currently.

How are customer service and technical support?

We don't really use support. We tend to do everything on our own and solve any problems we have ourselves. We basically have just read the manuals and that's about it. 

How was the initial setup?

The initial setup is not complex or difficult. It's straightforward. 

The deployment process takes about two weeks. 

We had two people who handled the deployment process. They were an AWS DevOps person and a Pentaho expert.

What's my experience with pricing, setup cost, and licensing?

We do not pay any license costs. We use a free version of the product.

What other advice do I have?

I'm a consultant and an end-user.

I downloaded the latest version of the solution. I can't speak to the version number. 

I'd rate the solution at an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user402600 - PeerSpot reviewer
Senior Consultant at a financial services firm with 10,001+ employees
Real User
Needs improvement on the Hadoop and JMS plugins.

Valuable Features:

It allows for rapid prototyping of a wide array of ETL workloads.

Room for Improvement:

Support for common Hadoop utilities can be expanded, such as bulk load with composite row keys for HBase, and include drivers for Impala out-of-the-box. A richer interface to Hive could also be beneficial as we currently have to go through a raw connection and execute SQL scripts, for which some syntax is not respected.

As of version 6, there are also some new issues introduced that pose a bit of an annoyance:


1) On Kettle's ramp-up - log4j errors

2) IBM Websphere MQ Producer - variable substitution for the URL does not work - you have to hardcode.

3) shared.xml for DB connections - variable substitution for connection properties does not work - have to hardcode things like Kerberos principal for a Hive/Impala connection.
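For readers unfamiliar with what variable substitution is expected to do in items 2 and 3, here is a minimal Python sketch of `${NAME}` expansion. It only illustrates the expected behaviour and is not Kettle's implementation; the `substitute` helper, the variable names, and the URL are hypothetical.

```python
import re

# Matches ${NAME} placeholders and captures NAME.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def substitute(text, variables):
    """Replace every ${NAME} in `text` with its value from `variables`.

    Raises KeyError for an undefined variable instead of leaving the
    placeholder in place, so misconfiguration fails early.
    """
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return _VARIABLE.sub(replace, text)


# Hypothetical variables for a message-queue URL.
variables = {"MQ_HOST": "mq.example.internal", "MQ_PORT": "1414"}
url = substitute("tcp://${MQ_HOST}:${MQ_PORT}", variables)
```

When substitution does not work in a given field, the only alternative is the hardcoded form of the same string, which is the workaround described in the list above.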

Deployment Issues:

We had no issues deploying it.

Scalability Issues:

The robustness of this solution in a production cluster (>30 nodes) remains to be seen.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Download our free Pentaho Data Integration and Analytics Report and get advice and tips from experienced pros sharing their opinions.
Updated: April 2024
Product Categories
Data Integration