PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Pentaho Reporting – Part 3 of 6

This is the third of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this third part, we'll be discussing the tools and facilities with which all of the reports are designed, generated, and served. A full BI suite should have several reporting facilities that are usable by users with different levels of technical and database knowledge.

Why is this important? Because in the real world, owners of data (people who consume the reports to make various business decisions) range from accountants, customer account managers, and supply-chain managers to C-level executives and manufacturing managers. Notice that proficiency in writing SQL queries is not a prerequisite for any of those positions.

In the Pentaho BI Suite, we have these reporting components:

  1. Pentaho Report Designer – A stand-alone program that is on par with Jasper or iReport and, to a lesser extent, the Crystal Reports designer.
  2. Pentaho Model Data Source – A way to encapsulate data sources, including the most flexible of all, a SQL query. Once this is set up by the data personnel, data owners can use it to generate ad-hoc reports – and dashboards too, which we'll discuss in Part 5 of this review series.
  3. Saiku Reporting Tool – A convenient way to create ad-hoc reports based on the Pentaho Data Sources (see number 2 above).

Let's discuss each of these components individually. The screenshots below are sanitized to remove references to our actual clients. A fictitious company called “DonutWorld” is used to illustrate and relate the concepts.

The Pentaho Report Designer (PRD) is a standalone Java program that feels like the Eclipse Java development IDE because they share the UI library. If you are already familiar with JasperReports, iReport, or Crystal Reports, the concepts are similar (bands, groups, details, sub-reports). You start with a master report in which you can combine different data sources (SQL and MDX queries in this case) into a layout that is managed via a set of properties.

Learning experience: As with any report designer, which is complex software because of the sheer number of tweakable properties governing each element of a report, one has to be prepared to invest time in learning the PRD. While the tools are laid out logically, it will take some time for a new team member to absorb the main concepts. The sub-report facility is one of the most powerful features of this program, and it is the key to creating reports that drill into more than one axis (or dimension) of data.

Usage experience: The placement of elements within the page is not 100% precise, and there were times when I had to work around quirks and inconsistencies in setting default values for properties, especially the ones containing formulas. Be prepared to have a dedicated person (either a permanent employee or a consultant) who can be reached for report designs *and* subsequent modifications. Aesthetic considerations are also important in order to create visually engaging reports (who wants to read a boring and bland report?).

Figure 1. The typical look of PRD when designing a report.

The Data Source facility is accessible from within the Pentaho BI Server UI (the PUC, see Part 2 of this review series for more information). Once you have logged in, look for a section on the screen that allows you to create or manage existing data sources.

This feature allows data personnel to set up “models” that can be constructed from various data sources and that represent a flat view of the data, from which non-technical data owners can create ad-hoc reports or dashboards. Obviously, this feature does not remove the need to know how to use the available tools for creating those reports and dashboards. It simply detaches the creation of an ad-hoc report from the crafting of SQL/MDX queries and the intricacies of OLAP data structures.

Learning experience: Data personnel who are familiar with the Data Warehouse (DW) can easily create models out of SQL queries against existing tables within the DW, or by using MDX queries against existing OLAP cubes. Data owners who are familiar with the data itself can then start to use the Saiku Ad-hoc Reporting tool to create reports, or the CDE (Community Dashboard Editor) to create dashboards. In reality, expect a couple of weeks for the personnel to get accustomed to this feature, assuming a knowledgeable BI teacher or consultant is available during this time.

Usage experience: By separating technical database skills from the ability to generate ad-hoc reports, Pentaho has provided a way for organizations to move their business decision-making process further away from the technical minutiae that tend to bog it down with details that are not relevant to the business goals. I rate this feature highly as one of the more innovative contributions of the Pentaho BI Suite to the area of Business Process Management.


Figure 2. Creating a model out of a SQL query

NOTE: The most important part of using this facility has more to do with business process than with familiarity with the data itself. Without a good process in place, the reports can easily get out of sync with the underlying data model. This is where the construction and maturity of the Data Warehouse is tested. For example, in a sufficiently mature DW process, the data personnel are notified of any data model changes, which triggers an update of the Model Data Source, which in turn may or may not affect the ad-hoc reports.

If the DW is designed correctly, there should be quite a few fact tables that can readily be translated into a Model Data Source. This is the first step. Now let's look at how to use this model.
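
To make the idea of a flat-view model more concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table and column names (`fact_sales`, `dim_store`, and so on) are hypothetical DonutWorld examples, not taken from any real schema, and the query only illustrates the kind of SQL a data person might use as the basis of a model.

```python
import sqlite3

# Hypothetical DonutWorld star schema: one fact table and one dimension table.
SCHEMA = """
CREATE TABLE dim_store (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    region     TEXT
);
CREATE TABLE fact_sales (
    sale_id   INTEGER PRIMARY KEY,
    store_id  INTEGER REFERENCES dim_store(store_id),
    sale_date TEXT,
    product   TEXT,
    quantity  INTEGER,
    amount    REAL
);
"""

# The flat view: every row carries readable labels instead of surrogate keys,
# so a report author never has to write the join.
FLAT_MODEL_QUERY = """
SELECT s.region     AS region,
       s.store_name AS store_name,
       f.sale_date  AS sale_date,
       f.product    AS product,
       f.quantity   AS quantity,
       f.amount     AS amount
FROM fact_sales AS f
JOIN dim_store  AS s ON s.store_id = f.store_id
ORDER BY f.sale_date, f.sale_id
"""


def build_demo_db():
    """Create the in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_store VALUES (?, ?, ?)",
        [(1, "Downtown", "West"), (2, "Airport", "East")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "2013-01-05", "Glazed", 12, 18.0),
            (2, 2, "2013-01-05", "Jelly", 6, 10.5),
            (3, 1, "2013-01-06", "Jelly", 4, 7.0),
        ],
    )
    return conn


def flat_rows(conn):
    """Run the model query and return each row as a dict keyed by column name."""
    cur = conn.execute(FLAT_MODEL_QUERY)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

Calling `flat_rows(build_demo_db())` returns one dictionary per sale, with fields such as `region` and `store_name` already resolved, which is the shape of data a non-technical report author would then pick columns from.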

Saiku is the name of two tools available from the PUC. The first one is the Saiku Analytics tool, which allows us to drill into an OLAP cube and perform analysis using aggregated measures (we'll review this in Part 4). The second one is the Saiku Ad-hoc Reporting tool, which is the one we are going to look into here. Using modern UI libraries such as jQuery, the developers of Saiku give us a convenient drag-and-drop UI that is easy to learn and use.

Once a model is published, it will be available to choose from the drop-down list on the top left of the Saiku Ad-hoc Reporting tool. See the screenshot below:
Figure 3. A Saiku report in progress

Next, you can choose from the list of available fields in the model and assign them to either the Columns list or the Groups list. From the same list of available fields, you can also specify some fields as filters. The most obvious example would be the transaction date and time range, which determines the period the report covers.

As you place the fields into the proper report elements, the tool starts to populate the preview area with what the report would look like. You can also specify an aggregation for each of the groupings, which is very handy.
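
Conceptually, the combination of a filter, a grouping, and an aggregation described above boils down to something like the following Python sketch. It is only an illustration of the logic, not how Saiku is implemented; the sample rows and the `adhoc_report` function are made up for this example.

```python
from collections import defaultdict
from datetime import date

# Made-up rows standing in for a published DonutWorld model.
ROWS = [
    {"region": "West", "product": "Glazed", "sale_date": date(2013, 1, 5), "amount": 18.0},
    {"region": "East", "product": "Jelly", "sale_date": date(2013, 1, 5), "amount": 10.5},
    {"region": "West", "product": "Jelly", "sale_date": date(2013, 1, 6), "amount": 7.0},
    {"region": "West", "product": "Glazed", "sale_date": date(2013, 2, 1), "amount": 9.0},
]


def adhoc_report(rows, group_by, measure, start, end):
    """Filter rows by date range, group by one field, and sum one measure."""
    totals = defaultdict(float)
    for row in rows:
        if start <= row["sale_date"] <= end:  # the date filter
            totals[row[group_by]] += row[measure]  # the group + SUM aggregation
    return dict(sorted(totals.items()))
```

For instance, `adhoc_report(ROWS, "region", "amount", date(2013, 1, 1), date(2013, 1, 31))` sums January sales per region and leaves the February row out.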

There is limited control over the templates that govern the appearance of the report, which obviously won't be enough for serious use. The best remedy, however, is to export the report to a .prpt file, which you can open in the PRD and tweak to your heart's content.

After you are happy with the report, you can save it for later editing. Another thoughtful design decision by the Pentaho team.

Overall, the Saiku Ad-hoc Reporting tool is a handy facility for crafting quick reports that answer specific questions based on the available model data sources. If your data personnel diligently update and maintain the models, this tool can be invaluable in supporting your business decisions.

None of the above would mean a whole lot without a practical and useful way for the reports to be delivered to their requesters. Here, the comprehensive nature of the Pentaho BI Suite helps by providing facilities like xaction and input UI controls for report parameters.

For example, a report designed in PRD can be published on the PUC. At some point it is opened on the PUC by a user who supplies the necessary parameters; the xaction script then fires an ETL, which renders the .prpt file into a .pdf and either emails it to the requester or drops it in a shared folder.

Reports can also be “burst” via an ETL script that utilizes the Pentaho Reporting Output step available from within Spoon (the ETL editor). I have used this method to distribute periodically generated reports to different recipients, each containing only the data that matches the recipient's access permission level. This saves a lot of time and increases the efficiency of distributing up-to-date information inside a company.
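
The core idea of bursting, one dataset split into per-recipient packets according to what each recipient is allowed to see, can be sketched in a few lines of Python. This is an illustration of the concept only and does not reproduce the Pentaho Reporting Output step; the recipient addresses, the `ACCESS` mapping, and the `burst` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping of recipients to the regions they are allowed to see.
ACCESS = {
    "west.manager@donutworld.example": {"West"},
    "ceo@donutworld.example": {"West", "East"},
}

# Made-up report rows.
SALES = [
    {"region": "West", "product": "Glazed", "amount": 18.0},
    {"region": "East", "product": "Jelly", "amount": 10.5},
    {"region": "West", "product": "Jelly", "amount": 7.0},
]


def burst(rows, access):
    """Return {recipient: [rows that recipient is allowed to see]}.

    Recipients with no visible rows are left out, so no empty report is sent.
    """
    packets = defaultdict(list)
    for recipient, regions in access.items():
        for row in rows:
            if row["region"] in regions:
                packets[recipient].append(row)
    return dict(packets)
```

In a real setup each packet would then be rendered into its own report file and delivered; here the function stops at deciding who gets which rows.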

The reporting tools in the Pentaho BI Suite are designed to allow different users within the company to generate reports that are either pre-designed or ad-hoc. The reports are made available on the Pentaho User Console (PUC), where users log in and initiate the report generation. Reports can also be scheduled to be generated via ETL scripts.

The PRD will be instantly recognizable by anyone who has experience using tools like Crystal Reports and its derivatives. You can also specify MDX queries against any OLAP cube schema published in the Pentaho BI Server as a data source.

The Model Data Source facility allows data owners who are not data personnel to create ad-hoc reports quickly and save them for future use and modification.

The Saiku Ad-hoc Reporting tool is the UI with which available models can be used to generate reports on the fly. These reports can also be saved for later use.

Next, in Part 4, we will discuss the Pentaho Mondrian (MDX query engine) and the OLAP Cube Schema tools.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
it_user6855 - PeerSpot reviewer
CEO with 51-200 employees
Vendor
Very capable suite of BI, reporting, and data mining tools with sophisticated functionality

Verdict:
This is a very capable suite of BI, reporting, and data mining tools with sophisticated functionality, and will address the needs of many organisations.

Pentaho BI Suite Community Edition (CE) includes ETL, OLAP, metadata, data mining, reporting and dashboards. This is a very broad capability and forms the basis for the commercial offering provided by Pentaho. A variety of open source solutions are brought together to deliver the functionality including Weka for data mining, Kettle for data integration, Mondrian for OLAP and several others to address reporting, BI, dashboards, OLAP analytics and big data.

The Pentaho BI platform provides the environment for building BI solutions and includes authentication, a rules engine and web services. It includes a solution engine that facilitates the integration of reporting, analysis, dashboards and data mining. Pentaho BI server supports web based report management, application integration and workflow.

The Pentaho Report Designer, Reporting Engine and Reporting SDK support the creation of relational and analytical reports with many output formats and data sources.

If you want a version with support, training and consulting, as well as a few more bells and whistles, then Pentaho provides such services and products.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
it_user4401 - PeerSpot reviewer
Developer at a transportation company with 1,001-5,000 employees
Vendor

I also built warehouses on Windows and on Ubuntu, but from my point of view the warehouse built on Windows worked better. I will always advise developers to use Pentaho on Windows, but I respect your opinion.

PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Pentaho BI Server – Part 2 of 6

Introduction

This is the second of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this second part, we'll be discussing the Pentaho BI Server, from which all of the reports, dashboards, and analytic tools are served to the users. A BI suite usually has a central place where users log in using their assigned credentials. In this case, the server is a standalone web server (an Apache Tomcat instance) that is augmented by various tools that provide the functionality – most of these tools are written by Webdetails (webdetails.pt). We'll visit these tools in subsequent parts of the review; for now, let's focus on the server itself.

In the case of Pentaho BI Server, it has two components:

  • The Pentaho User Console (a.k.a. PUC) – this is what we usually associate with the main BI Server in the Pentaho world, where users spend the majority of their time generating reports (both real-time and scheduled), using the analytic tools, building and publishing dashboards, etc. This is also where administrators can manage who can access which reports, either by User or by Role – obviously, Role-based ACL is cleaner and easier to maintain.

  • The Administration Console (a.k.a. PAC) – this is where admin users go to create new Users and Roles and to schedule jobs. It is another standalone web server that can be started and stopped when needed; it is totally independent of the main PUC server.

Is it Corporate-Ready?

BI servers are considered ready for corporate “demands” based on the number of users they can support, and the facilities to manage them. The Pentaho BI Suite Enterprise Edition is without a doubt ready for corporate use because it comes with the support that will make sure that is the case.

The Community Edition is more interesting. It is definitely corporate-ready, but the personnel who set it up need to be intimately familiar with the ins and outs of the server itself. Having installed three of these, I am confident that the BI Server, thanks to its built-in ACL management, is ready for prime time in the corporate world.

Although the Pentaho BI server includes a scheduler, another “corporate” feature, I find myself using cron (or Windows Task Scheduler) for the most part. The built-in scheduler is based on the Quartz library for Java. It is a good facility with a decent UI for scheduling reports or ETL from within the PUC.

Is it Easy to Use?

The PAC is very easy to use. The UI is simple enough, thanks to the small number of menus and options. In a sense, it's a simple facility to manage users, roles, and scheduling – not ACL, just users and roles.

The PUC is more involved, but because it adopts the familiar file-folder look and feel on the left panel, it is quite easy to get into and start using. Administrators will love the way they can set who can Execute, Edit, and Schedule each report, saved analytic view, and dashboard – by the way, Pentaho calls these Solutions.
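
As a rough illustration of why role-based ACL is easier to maintain than per-user grants, here is a small Python sketch of the permission check. It is not Pentaho's implementation; the role names, user names, solution name, and the `allowed` function are hypothetical, and only the three permission names (Execute, Edit, Schedule) come from the PUC.

```python
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    EXECUTE = auto()
    EDIT = auto()
    SCHEDULE = auto()


# Hypothetical grants: permissions are attached to roles, never to users.
ROLE_ACL = {
    "sales_report.prpt": {
        "analyst": Permission.EXECUTE,
        "report_admin": Permission.EXECUTE | Permission.EDIT | Permission.SCHEDULE,
    },
}

# Hypothetical role membership.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"analyst", "report_admin"},
}


def allowed(user, solution, wanted):
    """True if the union of the user's role grants covers every wanted permission."""
    granted = Permission.NONE
    for role in USER_ROLES.get(user, set()):
        granted |= ROLE_ACL.get(solution, {}).get(role, Permission.NONE)
    return (granted & wanted) == wanted
```

With this layout, giving a new hire access means adding them to a role once, rather than touching the grants on every solution.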

Setting up the BI server is better left to consultants who are used to doing it. If in-house personnel will be doing this instead, it is worth the time to participate in the training webinars that Pentaho holds periodically. The steps to set up a BI server are far from simple, but that is the case for all BI servers, regardless of the brand.

The collapsible left panel serves as the directory of the solutions, with the top part showing the folders and the bottom part showing the individual solutions. The bigger panel on the right is where you actually see the content of the solutions. In some cases, that's also where you'd create a Dashboard using the CDE tool (we'll revisit this in a later part of the review).

Is it Easy to Create Solutions?

Remember that the concept of a “solution” here refers to the different types of reports, dashboards, and analytic views. The Pentaho BI server employs a “glue” scripting facility called xactions. These are XML documents that contain a sequence of actions that can do various things, like:

  1. Asking users for input parameters

  2. Issuing a SQL query based on user input

  3. Triggering an ETL that produces reports

Once you are familiar with this facility, it is not that hard to start producing solutions, but it pays to install the included examples and study them to find out how to do certain things with xaction and/or to copy snippets into your own scripts.
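
For readers who have not seen an action sequence before, the three kinds of actions listed above can be pictured as a chain of small steps sharing one context. The Python sketch below is an analogy only: it is not xaction syntax, and the function names, the `sales` table, and the `render_report` job name are all made up for illustration.

```python
def ask_parameters(ctx):
    # Step 1: in the PUC this would be a prompt form; here the values are pre-supplied.
    missing = [name for name in ("region", "month") if name not in ctx["inputs"]]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    ctx["params"] = dict(ctx["inputs"])


def build_query(ctx):
    # Step 2: keep user input out of the SQL text by using placeholders.
    ctx["sql"] = (
        "SELECT product, SUM(amount) FROM sales "
        "WHERE region = ? AND month = ? GROUP BY product"
    )
    ctx["sql_args"] = (ctx["params"]["region"], ctx["params"]["month"])


def trigger_etl(ctx):
    # Step 3: stand-in for launching an ETL job; it only records what would run.
    ctx["etl_request"] = {"job": "render_report", "args": ctx["sql_args"]}


def run_sequence(inputs, actions=(ask_parameters, build_query, trigger_etl)):
    """Run each action in order, passing a shared context from one to the next."""
    ctx = {"inputs": inputs}
    for action in actions:
        action(ctx)
    return ctx
```

The point of the analogy is the shape: each action reads what earlier actions left in the context and adds its own output, which is why studying and copying snippets from the bundled examples works well.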

On the PUC, we can build these solutions:

  1. Dashboards using CDE

  2. Ad-hoc reports and data model using the built in Model generator (very handy for accessing those BI tables that are populated by ETL runs)

  3. Analytic Views using tools like Saiku or its equivalent for the Professional and Enterprise edition. NOTE: This requires a pre-published schema which is built using another tool called the schema-workbench (we will see this in the latter parts of this review series)

Is it Customizable?

Since this is the user-facing tool, one of the requirements is the ability to customize the appearance via themes. At the very least, a BI server needs to allow companies to replace the logo with their own.

The good news is, you can do all that with Pentaho BI Server. If you opt for the Professional and Enterprise editions, you can rely on the support that you already paid for. For those using the Community Edition, customizing the appearance requires knowledge of how a typical Java web server is structured. Again, any good BI consultant should be able to tackle this without too much difficulty.

Here is an example of a customized PUC login page:

In case you are wondering, yes, you can customize the PUC interface also, and it even comes with a theme structure in which you can assign your graphic artists to redefine the CSS elements.

Summary

The Pentaho BI server is the central place where users interact with the Pentaho BI Suite. It brings together solutions (what Pentaho calls content) produced by the other tools in the suite, and exposes them to the users while being protected by a robust ACL.

On the balance between ease-of-use and the ability to customize, the Pentaho BI Server scores well provided that the personnel in charge is familiar with the Java Enterprise environment. To illustrate this, in one project, I managed to tweak the security framework to make the PUC part of a single-sign-on Liferay portal, along with other applications such as Opentaps and Alfresco.

Next in part-three, we will discuss the wide array of Pentaho Reporting tools.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: PDI – Part 1 of 6

Introduction

The Pentaho BI Suite is one of the more comprehensive BI suites that is also available as an Open Source project (the Community Edition). Interestingly, the absence of license fees is far from being the only reason to choose this particular tool to build your Data Warehouses (OLAP systems).

This is the first of a six-part review of the BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this first part, we'll be discussing Pentaho Data Integration (from here on referred to as PDI), which is the ETL tool that comes with the suite. An ETL tool is the means by which you take in data from various sources – typically from transactional systems – and then transform its format and flow it into another data model that is OLAP-friendly. It therefore acts as the gateway to the other parts of the BI suite.

In the case of PDI, it has two components:

  • Spoon (the GUI), where you string together a set of Steps within a Transformation and optionally string multiple Transformations within a single Job. This is where you would spend the bulk of your time developing ETL scripts.

  • The accompanying set of command-line scripts that we can configure to be launched from a scheduler like cron or Windows Task Scheduler. Notably: pan, the single-Transformation runner; kitchen, the Job runner; and carte, the slave-server runner. These tools give us the flexibility to create our own multi-tiered notification system, should we need to.
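
As a sketch of how a scheduler-driven launch might be wired up, the Python helper below builds a kitchen command line for a given Job file. The `-file=`, `-level=`, and `-param:NAME=value` options follow the Linux-style syntax of the kitchen script as I understand it, so check them against the documentation for your PDI version; the job path, the `ENV` parameter, and the helper names are hypothetical. The function only builds the command and does not run anything.

```python
import shlex


def kitchen_command(job_file, params=None, level="Basic", kitchen="./kitchen.sh"):
    """Build the argument list for running one PDI Job via kitchen."""
    cmd = [kitchen, f"-file={job_file}", f"-level={level}"]
    # Sort the parameters so the generated command is stable between runs.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


def as_shell_line(cmd):
    """Quote each argument so the line can be pasted into a crontab or shell script."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The resulting list can be handed to `subprocess.run`, while `as_shell_line` gives a single line suitable for a cron entry or a wrapper script that also handles logging and notification.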

Is it Feature-Complete?

ETL tools are interesting because anyone who has implemented a BI system has a standard list of major features they expect to be available. This standard list does not change from one tool brand to another. Let's see how PDI fares:

  1. Serialized vs Parallel ETL processing: PDI handles parallel (async.) steps using Transformations, which can be strung together in a Job when we need a serialized sequence.

  2. Parameter-handling: PDI has a property file that allows us to parameterize things that are specific to different platforms (dev/test/prod) such as database name, credentials, external servers. It also features parameters that can be created during the ETL run out of the data in the stream, then passed on from one Transformation to another within a Job.

  3. Script management: Just like any other IT documents (or, as some call them, artifacts), ETL scripts need to be managed, version-controlled, and documented. PDI scores high on this front, not because of some specific feature but because of design decisions that favor simplicity: the scripts are plain XML documents. That makes them very easy to manage, version-control, and, if necessary, batch-edit. NOTE: For those who want enterprise-level script management and version control built into the tool, Pentaho made it available as part of their Enterprise offerings. But for the rest of us who already have a document management process – because we also develop software using other tools – it is not as crucial.

  4. Clustering: PDI supports round-robin-style load-balancing across a set of slave servers. For those using Hadoop clusters, Pentaho recently added support for running Jobs on those.
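
Because the scripts are plain XML (item 3 above), ordinary tooling can inspect or batch-edit them. The Python sketch below lists the steps in a transformation file using only the standard library. The sample document is a heavily simplified stand-in that I am assuming resembles the layout of a real .ktr file (a `transformation` root with `step` elements carrying `name` and `type`); verify against your own files before relying on it.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a transformation file.
SAMPLE_KTR = """<transformation>
  <info><name>load_sales</name></info>
  <step><name>Read orders</name><type>TableInput</type></step>
  <step><name>Write facts</name><type>TableOutput</type></step>
</transformation>"""


def list_steps(ktr_xml):
    """Return (step name, step type) pairs in document order."""
    root = ET.fromstring(ktr_xml)
    return [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
```

The same approach works for quick audits across a whole repository of scripts, for example finding every transformation that uses a particular step type before an upgrade.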

Is it Easy to Use?

With the drag-and-drop graphical UI approach, the ease of use is a given. It is quite easy to string together steps to accomplish the ETL process. The trick is knowing which steps to use, and when to use them.

The documentation on how to use each step could stand improvement, although fortunately it has slowly started to catch up over the years – and should you have the budget, you can always pay for the support that comes with the Enterprise Edition. But overall, it is a matter of using the steps enough to become familiar with their use cases.

This is why competent BI consultants are worth their weight in gold: they have been in the trenches and have accumulated ways to deal with the quirks that are bound to be encountered in a software system this complex (not just Pentaho; this applies to any BI suite product out there).



NOTE: I feel obligated to point out one (very) annoying fact: I cannot hit the Enter key to edit the selected step. Think about how many times we would use this functionality in any ETL tool.

Aside from that, in the few years that I've used various versions of the GUI, I've never encountered severe data loss due to stability problems.

Another measure of ease-of-use that I evaluate a tool by is how easy it is to debug the ETL scripts. With PDI, the logical structure of the scripts can be easily followed; therefore it's quite debug-friendly.

Is it Extensible?

It may be a strange question at first, but let us think about it. One of the purposes of using an ETL tool is to deal with a variety of data sources. No matter how comprehensive the included data format readers and writers are, sooner or later you will have to talk to a proprietary system that is not widely known. We had to do this once for one of our clients. We ended up writing a custom PDI step that communicates with the XML-RPC backend of an ERP system.

The good news is that with PDI, anyone with some Java SDK development experience can readily implement the published interfaces and thus create their own custom Transformation steps. In this regard, I am quite impressed with the modular design, which allows users to extend the functionality and, consequently, the usefulness of the tool.

The scripting ability built into the Steps is also one of the ways to handle proprietary – or extremely complex – data. PDI allows us to write JavaScript (and Java, should you want faster performance) programs to manipulate the data both at the row level and pre- and post-run, which comes in very handy for initializing variables or sending notifications that contain statistical info about all of the rows.

Summary

PDI is one of the jewels in the Pentaho BI Suite. Aside from some minor inconveniences within the GUI tool, the simplicity, extensibility, and stability of the whole package make PDI a good tool for building a network of ETLs marshaling data from one end of the systems to another. In some cases, it even serves well as a development tool for the batch-processing side of an OLTP system.

Next in part-two, we will discuss the Pentaho BI Server.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.