PeerSpot user
Senior Consultant at a consumer goods company with 1,001-5,000 employees
Consultant
The Data Integration graphical drag and drop design is easy for new users to follow and can increase productivity.

What is most valuable?

The Pentaho Business Analytics platform is an outstanding product overall that offers significant cost savings for companies of all sizes. The platform is built on top of several underlying open source projects driven by the community’s contributions. There are several features that I find invaluable, and improvements are made with each release.

The Pentaho User Console provides a portal that makes it easy for users to explore information interactively. Dashboard reporting, scheduling jobs, and managing data connections are some of the tasks the console makes easy. More advanced users can extend Pentaho Analyzer with custom visualizations or create reporting solutions with Ctools. The Marketplace empowers the community to develop new and innovative plugins and simplifies plugin installation for console users. The plugin framework lets contributors extend the core services offered by the BI Server.

Pentaho Data Integration (Spoon) is another valuable tool for development. Spoon delivers powerful extraction, transformation, and load capabilities using a metadata-driven approach. The Data Integration graphical drag-and-drop design is easy for new users to follow and can increase productivity. More advanced users can extend Pentaho Data Integration by creating transformations and jobs dynamically.

How has it helped my organization?

My company was able to reduce software costs and hire additional staff with the savings that Pentaho provided. We are moving towards a Hadoop environment after the migration of our current ETL processes, and Pentaho’s easy-to-use development tools and big data analytics capabilities were a factor in choosing it as a solution.

What needs improvement?

For those who run the open source Community Edition, it can at times be difficult to find updated references for support. Even for companies that use the Enterprise Edition, finding useful resources when a problem occurs can be difficult. Pentaho-driven best practices should be made available to both Community and Enterprise users to motivate and empower more users to use the solutions effectively.

How are customer service and support?

Pentaho has stellar support services with extremely knowledgeable Pentaho and Hitachi consultants all over the world. Those support services and the documentation are available to clients that have purchased the Enterprise Edition and have access to the support portal.


How was the initial setup?

Pentaho is easy to deploy, use, and maintain. It’s a low-cost, fully supported business intelligence solution. I have used Pentaho in small and large organizations with great success.

What's my experience with pricing, setup cost, and licensing?

Enterprise licenses can be purchased for the full-service Pentaho Enterprise solution, which offers support through the portal and, for additional cost, access to Pentaho/Hitachi consultants.

What other advice do I have?

Pentaho offers a Community Edition, which is an open source solution and can be downloaded for free. The Community Edition truly gives most companies everything they need, provided the solution is matched to your business needs. As a cost-cutting option, Enterprise license fees can be paid to the vendor to fund on-demand support.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Administrative Assistant at a university with 10,001+ employees
Real User
Makes it easy to develop data flows and has a wide range of database connections
Pros and Cons
  • "Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
  • "Pentaho Business Analytics' user interface is outdated."

What is our primary use case?

I primarily use Pentaho Business Analytics to create ETL processes, monitoring processes, and hierarchies.

What is most valuable?

Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud.

What needs improvement?

Pentaho Business Analytics' user interface is outdated. It's also limited in out-of-the-box features, which forces you to develop features yourself. There are also some problems with having to update metadata manually, which I would like to see Pentaho fix in the future. 

What do I think about the stability of the solution?

Pentaho Business Analytics is stable.

What do I think about the scalability of the solution?

Pentaho Business Analytics is scalable (though I have only tested this lightly).

How are customer service and support?

Since Pentaho Business Analytics is open-source, it has a very helpful community.

Which solution did I use previously and why did I switch?

I previously used Microsoft Integration Services and Microsoft Azure Data Factory.

How was the initial setup?

The initial setup was easy.

What other advice do I have?

Pentaho Business Analytics is a very good product for those starting to work with ETL processes. Usually, it will solve every problem you may have when creating those processes, and it's free, with a big support community. However, it may not be the best choice if your company has a very strong relationship with Microsoft or if you want to work in the cloud. I would give Pentaho Business Analytics a rating of eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
IT Manager at a transportation company with 51-200 employees
Vendor
In terms of functionality, they're not growing as fast as other companies. It's good for showing the need for BI.

What is most valuable?

Pentaho Data Integration (PDI).

Pentaho Analysis Services

Pentaho Reporting

How has it helped my organization?

We developed data marts for Sales and HR, so managers of those departments can now get quick and flexible answers from them. I think it was an improvement because, in the past, each new analysis demanded IT resources and took time, which no longer happens. End users have much more freedom to discover the information they need.

What needs improvement?

I think Pentaho could greatly improve its UI and its tool for dashboard maintenance.

For how long have I used the solution?

2 years

What was my experience with deployment of the solution?

I think the most complex deployments are the solutions with the most demanding implementations. Pentaho could invest more in making developers’ lives easier.

What do I think about the stability of the solution?

Yes, once in a while we face an unexpected problem that takes time to overcome, and it hurts user satisfaction.

What do I think about the scalability of the solution?

No. I think the choice of Pentaho was right for my company. It fits our purpose very well, which was to demonstrate to the directors the power of BI for the business. But now that the benefits are recognized and the company is getting bigger, perhaps in the near future I will evaluate other options, including Pentaho EE.

How are customer service and technical support?

Customer Service:

My company has a procedure to evaluate all of our suppliers, with questions covering promptness, level of expertise, pre-sale and post-sale service, effectiveness, and efficiency.

Technical Support:

7 out of 10

Which solution did I use previously and why did I switch?

Yes. When I started with Pentaho in 2011, I had already worked at another company that used the Cognos BI Suite as its BI solution.

How was the initial setup?

The initial setup was straightforward. The setup was done by my team, which had no prior expertise with the Pentaho BI Suite. Within two days, I was presented with the first dashboards.

What about the implementation team?

I implemented my first Pentaho project with a vendor team, which helped us a lot, but its level of expertise could have been better. In the middle of the project, we had some delays related to questions that had to be clarified by Pentaho’s professionals.

What was our ROI?

The ROI of this product is good because you get your first outputs in little time. But it’s not excellent compared with other BI solutions, like QlikView or Tableau.

What's my experience with pricing, setup cost, and licensing?

My original setup cost for the first project was $30,000 and the final cost was about $35,000.

Which other solutions did I evaluate?

Yes: Cognos, MicroStrategy, and Jaspersoft.

What other advice do I have?

For me, Pentaho is not growing in functionality as fast as other companies in the same segment. The UI falls short, and more complex solutions require good developers. However, being an open source solution, it allows IT departments to show the importance of BI to the company with a low investment.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Final Thoughts – Part 6 of 6

Introduction

This is the last of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

Data Mining

In this sixth part, I originally wanted to at least touch on the only part of the Pentaho BI Suite we have not talked about before: Data Mining. However, as I gathered my materials, I realized that Data Mining (along with its kin: Machine Learning, Predictive Analytics, etc.) is too big a topic to fit in the space we have here. Even if I tried, the usefulness would be limited at best since, at the moment, while the results are being used to solve real-world problems, the usage of Data Mining tools is still almost exclusively within the realm of data scientists.

In addition, as of late I use Python more for working with datasets that require a lot of munging, preparing, and cleaning. As an extension of that, I ended up using Pandas, scikit-learn, and other Python-specific Data Mining libraries instead of Weka (which is basically what the Pentaho Data Mining tool is).

So for those who are new to Data Mining with Pentaho, here is a good place to start: an interview with Mark Hall, who was one of the authors of Weka and now works for Pentaho: https://www.floss4science.com/machine-learning-with-weka-mark-hall

The link above also has some links to where to find more information.

For those who are experienced data scientists, you have probably already made up your mind about which tool suits your needs best; just as I went with Python libraries, you may or may not prefer a GUI approach like Weka's.

New Release: Pentaho 5.0 CE

For the rest of this review, we will go over the new changes that come with the highly anticipated release of the 5.0 CE version. Overall, there are a lot of improvements in various parts of the suite, such as PDI and PRD, but we will focus on the BI Server itself, where the largest impact of the new release can be seen.

A New Repository System

In this new release, one of the biggest shocks for existing users is the switch from the file-based repository system to the new JCR-based one. JCR is a content repository standard; Pentaho uses the database-backed implementation from the Apache Foundation, code-named “Jackrabbit.”

The Good:

  • Better metadata management
  • No longer need to refresh the repository manually after publishing solutions
  • A much better UI for dealing with the solutions
  • API to access the solutions via the repository which opens up a lot of opportunities for custom applications

The Bad:

  • It's not as familiar or convenient as the old file-based system
  • Need to use a synchronizer plugin to version-control the solutions

It remains to be seen if this switch will pay off for both the developers and the users in the long run. But it is stable and working for the most part, so I can't complain.

The Marketplace

One of the best features of the Pentaho BI Server is its plugin-friendly architecture. In version 5.0 this architecture has been given a new face called the Marketplace:

This new interface serves two important functions:

  1. It allows admins to install and update plugins (almost all Pentaho CE tools are written as plugins) effortlessly
  2. It allows developers to publish their own plugins to the world

There are already several new plugins available with this release, notably Pivot4J Analytics, an alternative to Saiku that shows a lot of promise as a tool for working with OLAP data. Another one that excites me is Sparkl, with which you can create your own custom plugins.

The Administration Console

The new version also brings about a new Administration Console where we manage Users and Roles:

No longer do we have to fire off another server just to do this basic administrative task. In addition, you can manage the mail server (no more wrangling configuration files).

The New Dashboard Editor

As we discussed in Part V of this review, the CDE is a very powerful dashboard editor. In version 5.0, the list of available Components is further lengthened by new ones, and the overall editor seems more responsive in this release.

Usage experience: The improvements in the dashboard editor are helping me create dashboards for my clients that go beyond static displays. In fact, the one below (for demo purposes only) has a level of interactivity that rivals a web application or an electronic form:

NOTE: Nikon and Olympus are trademarks of Nikon Corporation and Olympus Group respectively.

Parting Thoughts

Even though the final product of a Data Warehouse or a BI system is a set of answers and forecasts, or dashboards and reports, it is easy to forget that without the tools that help us consolidate, clean up, aggregate, and analyze the data, we will never get to the results we are aiming for.

As you can probably tell, I serve my clients with whatever tools make sense given their situation, but time and again the Pentaho BI Suite (the CE version especially) has risen to fulfill the need. I have created Data Warehouses from scratch using Pentaho BI CE, pulling in data from various sources using PDI and creating OLAP cubes with PSW, which end up as the data sources for the various dashboards (financial, inventory, marketing, etc.) and for published reports created using PRD.

Of course my familiarity with the tool helps, but I am also familiar with a lot of other BI tools beside Pentaho. And sometimes I do have to use other tools in preference to Pentaho because they suit the needs better.

But as I always mention to my clients, unless you have a good relationship with the vendor that lets you avoid paying hundreds of thousands per year just to be able to use tools like IBM Cognos, Oracle BI, or SAP BusinessObjects, there is a good chance that Pentaho (either the EE or the CE version) can do the same for less, even at zero license cost in the case of CE.

Given the increased awareness of the value of data analysis in today's companies, these BI tools will continue to become more sophisticated and powerful. It is up to us business owners, consultants, and data analysts everywhere to develop the skills to harness the tools and crank out useful, accurate, and, yes, easy-on-the-eyes decision-support systems. And I suspect that we will always see Pentaho as one of the viable options, a testament to the quality of the team working on it. The CE team in particular deserves mention; it would be remiss not to acknowledge their efforts to improve and maintain a tool this complex using the Open Source paradigm.

So here we are, at the end of the sixth part. Writing this six-part review has been a blast, and I would like to give a shout-out to IT Central Station, which has graciously hosted this review for all to benefit from. Thanks for reading.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: PDI – Part 1 of 6

Introduction

The Pentaho BI Suite is one of the more comprehensive BI suites that is also available as an Open Source project (the Community Edition). Interestingly, the absence of license fees is far from the only factor in choosing this particular tool to build your Data Warehouses (OLAP systems).

This is the first of a six-part review of the BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this first part, we'll be discussing Pentaho Data Integration (from here on referred to as PDI), which is the ETL tool that comes with the suite. An ETL tool is the means by which you pull data from various sources – typically some transactional systems – then transform the format and flow it into another data model that is OLAP-friendly. It therefore acts as the gateway into using the other parts of the BI suite.

In the case of PDI, it has two components:

  • Spoon (the GUI), where you string together a set of Steps within a Transformation and optionally string multiple Transformations within a single Job. This is where you would spend the bulk of your time developing ETL scripts.

  • The accompanying set of command-line scripts that we can configure to be launched from a scheduler like cron or Windows Task Scheduler: notably pan, the single-Transformation runner; kitchen, the Job runner; and carte, the slave-server runner. These tools give us the flexibility to create our own multi-tiered network of scheduled runs and notifications, should we need to.

Is it Feature-Complete?

ETL tools are interesting because anyone who has implemented a BI system has a standard list of major features they expect to be available. This standard list does not change from one tool brand to another. Let's see how PDI fares:

  1. Serialized vs Parallel ETL processing: PDI handles parallel (async) steps using Transformations, which can be strung together in a Job when we need a serialized sequence.

  2. Parameter-handling: PDI has a property file that allows us to parameterize things that are specific to different platforms (dev/test/prod), such as database names, credentials, and external servers. It also features parameters that can be created during the ETL run out of the data in the stream, then passed on from one Transformation to another within a Job.

  3. Script management: Just like any other IT documents (or, as some call them, artifacts), ETL scripts need to be managed, version-controlled, and documented. PDI scores high on this front – not because of specific features but due to design decisions that favor simplicity: the scripts are plain XML documents. That makes them very easy to manage, version-control, and, if necessary, batch-edit. NOTE: For those who want enterprise-level script management and version control built into the tool, Pentaho makes it available as part of their Enterprise offerings. But for the rest of us who already have a document management process – because we also develop software using other tools – it is not as crucial.

  4. Clustering: PDI supports round-robin-style load balancing given a set of slave servers. For those using Hadoop clusters, Pentaho recently added support for running Jobs on them.

Is it Easy to Use?

With the drag-and-drop graphical UI approach, ease of use is a given. It is quite easy to string together steps to accomplish the ETL process. The trick is knowing which steps to use, and when to use them.

The documentation on how to use each step could stand improvement; fortunately, it has slowly started to catch up over the years – and should you have the budget, you can always pay for the support that comes with the Enterprise Edition. But overall, it is a matter of using the steps enough to become familiar with the use cases.

This is why competent BI consultants are worth their weight in gold: they have been in the trenches and have accumulated ways to deal with the quirks that are bound to be encountered in a software system this complex (not just Pentaho; this applies to any BI suite product out there).



NOTE: I feel obligated to point out one (very) annoying fact: I cannot hit the Enter key to edit the selected step. Think about how many times we would use this functionality in any ETL tool.

Aside from that, in the few years that I've used various versions of the GUI, I've never encountered severe data loss due to stability problems.

Another measure of ease of use I evaluate a tool by is how easy it is to debug the ETL scripts. With PDI, the logical structure of the scripts can be easily followed, so it's quite debug-friendly.

Is it Extensible?

It may seem a strange question at first, but let us think about it. One of the purposes of using an ETL tool is to deal with a variety of data sources. No matter how comprehensive the included data format readers/writers are, sooner or later you will have to talk to a proprietary system that is not widely known. We had to do this once for one of our clients: we ended up writing a custom PDI step that communicates with the XML-RPC backend of an ERP system.

The good news is that, with PDI, anyone with some Java development experience can readily implement the published interfaces and thus create their own custom Transformation steps. In this regard, I am quite impressed with the modular design that allows users to extend the functionality and, consequently, the usefulness of the tool.

The scripting ability built into the Steps is also one of the ways to handle proprietary – or extremely complex – data. PDI allows us to write JavaScript (or Java, should you want faster performance) programs to manipulate the data both at the row level and pre- and post-run, which comes in very handy for variable initialization or for sending notifications that contain statistical info about all of the rows.

Summary

PDI is one of the jewels in the Pentaho BI Suite. Aside from some minor inconveniences in the GUI tool, the simplicity, extensibility, and stability of the whole package make PDI a good tool for building a network of ETLs marshaling data from one end of your systems to another. In some cases, it even serves well as a development tool for the batch-processing side of an OLTP system.

Next in part-two, we will discuss the Pentaho BI Server.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user76890 - PeerSpot reviewer
Engineer at a marketing services firm with 51-200 employees
Vendor
It does a lot of what we need but off-the-shelf solutions often can’t do exactly what you need

Being in the business of online-to-offline ad attribution and advertising analytics, we need tools to help us analyze billions of records to discover interesting insights for our clients. One of the tools we use is Pentaho, an open source business intelligence platform that allows us to manage, transform, and explore our data. It offers some nice GUI tools, can be quickly set up on top of existing data, and has the advantage of being on our home team.

But for all the benefits of Pentaho, making it work for us has required tweaking and in some cases replacing Pentaho with other solutions. Don’t take this the wrong way: we like Pentaho, and it does a lot of what we need. But at the edges, any off-the-shelf solution often can’t do exactly what you need.

Perhaps the biggest problem we faced was getting queries against our cubes to run quickly. Because Pentaho is built around Mondrian, and Mondrian is a ROLAP engine, every query against our cubes requires building dozens of queries that join tables with billions of rows. In some cases this meant that Mondrian queries could take hours to run. Our fix has been to make extensive use of summary tables, i.e., summarizing counts of raw data at the levels we know our cubes will need to execute queries. This has allowed us to take queries that ran in hours down to seconds by doing the summarization for all queries once in advance. At worst, our Mondrian queries can take a couple of minutes to complete if we ask for really complicated things.
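
What this looks like in practice is essentially a scheduled aggregate build. The following is a minimal sketch of the idea, assuming hypothetical table and column names (raw_events, agg_events_daily) rather than our actual schema; the cube's fact table is then pointed at the aggregate instead of the raw table:

    -- Roll up raw event counts once, in advance, at the grain the cubes query,
    -- so Mondrian reads a small pre-aggregated table instead of joining
    -- billions of raw rows. All names here are hypothetical.
    CREATE TABLE agg_events_daily AS
    SELECT
        event_date,                       -- day-level grain matching the cube's time level
        campaign_id,                      -- dimensions the cube slices by
        region_id,
        COUNT(*)                AS event_count,
        COUNT(DISTINCT user_id) AS unique_users
    FROM raw_events
    GROUP BY event_date, campaign_id, region_id;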

Early on, we tried to extend our internal use of Pentaho to our clients by using Action Sequences, also known as xactions after the Action Sequence file extension. Our primary use of xactions was to create simple interfaces for getting the results of Mondrian queries that could then be displayed to clients in our Rails web application. But in addition to sometimes slow Mondrian queries (in the world of client-facing solutions, even 15 seconds is extremely slow), xactions introduce considerable latency as they start up and execute, adding as much as 5 seconds on top of the time it takes to execute the query.

Ultimately we couldn’t make xactions fast enough to deliver data to the client interface, so we instead took the approach we use today: we first discover what is useful in Pentaho internally, then build solutions that query directly against our RDBMS to quickly deliver results to clients. Although, to be fair to Mondrian, some of these solutions require us to summarize data in advance of user requests to get the speed we want, because that data is just that big and the queries are just that complex.

We’ve also made extensive use of Pentaho Data Integration, also known as Kettle. One of the nice features of Kettle is Spoon, a GUI editor for writing Kettle jobs and transforms. Spoon made it easy for us to set up ETL processes in Kettle and take advantage of Kettle’s ability to easily spread load across processing resources. The tradeoff, as we soon learned, was that Spoon makes the XML descriptions of Kettle jobs and transforms difficult to work on concurrently, a major problem for us since we use distributed version control. Additionally, Kettle files don’t have a really good, general way of reusing code short of writing custom Kettle steps in Java, which makes maintaining our large collection of Kettle jobs and transforms difficult. On the whole, Kettle was great for getting things up and running quickly, but over time we found its rapid development advantages are outweighed by the advantages of using a general programming language for our ETL. The result is that we are slowly transitioning to writing ETL in Ruby, but only on an as-needed basis since our existing Kettle code works well.

As we move forward, we may find additional places where Pentaho does not fully meet our needs and we must find other solutions to our unique problems. But on the whole, Pentaho has proven to be a great starting platform for getting our analytics up and running and has allowed us to iteratively build out our technologies without needing to develop custom solutions from scratch for everything we do. And, I expect, Pentaho will long have a place at our company as an internal tool for initial development of services we will offer to our clients.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user108285 - PeerSpot reviewer
Works at a financial services firm
Vendor

Have you looked into using Talend? It's got a great user interface, very similar to Kettle, and their paid-for version has version control that works very well, plus you get the ability to run "joblets," which are basically reusable pieces of code. Even the free version has version control, although it's pretty clumsy; there are no joblets in the free version, and the free version is difficult to get working with GitHub.

PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Dashboards – Part 5 of 6

Introduction

This is the fifth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this fifth part, we'll be discussing how to create useful and meaningful dashboards using the tools available to us in the Pentaho BI Suite. As a complete Data Warehouse building tool, Pentaho offers the most important feature for delivering enterprise-class dashboards, namely Access Control Lists (ACLs). A dashboard-creation tool without the ability to limit dashboard access to a particular group or role within the company is missing a crucial feature and is something we cannot recommend to our clients.

On the Enterprise Edition (EE) version 5.0, dashboard creation has a user-friendly UI that is as simple as drag-and-drop. It looks like this:

Figure 1. The EE version of the Dashboard Designer (CDE in the CE version)

Here the user is guided to choose a type of grid layout that is already prepared by Pentaho. Of course, the options to customize the look and change individual components are available under the hood, but it is clear that this UI is aimed at end users looking for quick results. More experienced dashboard designers would feel severely restricted by it.

In the rest of this review, we will go over dashboard creation using the Community Edition (CE) version 4.5. Here we are going to see a more flexible UI, which unfortunately also demands familiarity with JavaScript and chart library customization to create anything more than basic dashboards.

BI Server Revisited

In the Pentaho BI Suite, dashboards are set up in two places:

  1. Using special ETLs, we prepare the data to be displayed on the dashboards according to the frequency of update required by the user. For example, for daily sales figures, the ETL would be scheduled to run every night. Why do we do this? Because the benefits are two-fold: it increases the performance of the dashboards, because they work with pre-calculated data, and it allows us to apply dashboard-level business rules (a minimal SQL sketch follows this list).
  2. The BI Server is where we design, edit, and assign access permissions to dashboards. Deep URLs can be obtained for a particular dashboard to be displayed on a separate website, but some care has to be taken to go through the Pentaho user authorization; depending on the web server setup, it could be as simple as passing authorization tokens, or as complex as registering and configuring a custom module.
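
To make item 1 concrete, here is a minimal sketch of the kind of nightly pre-calculation such a dashboard ETL might perform; the table and column names (sales_fact, dash_daily_sales) are hypothetical and not part of the microbrewery example used below:

    -- Nightly dashboard preparation: rebuild a small table of pre-calculated
    -- daily sales so the dashboard components read aggregated rows instead of
    -- the full fact table. All names are hypothetical.
    DELETE FROM dash_daily_sales;

    INSERT INTO dash_daily_sales (sales_date, product_line, region, order_count, total_amount)
    SELECT
        sales_date,
        product_line,
        region,
        COUNT(*)    AS order_count,
        SUM(amount) AS total_amount
    FROM sales_fact
    GROUP BY sales_date, product_line, region;

Any dashboard-level business rule (for example, excluding internal test orders) can be applied in this same step, so every component reading the table inherits it.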

Next, we will discuss each of these steps in creating a dashboard. As usual, the screenshots below are sanitized and there are no real data being represented. Data from a fictitious microbrewery is used to illustrate and relate the concepts.

Ready, Set, Dash!

The first step is to initiate the creation of a dashboard. This is accomplished by selecting File > New > CDE Dashboard. A little background note: CDE (the Community Dashboard Editor) is part of the Community Tools (or Ctools) created by the team who maintains and improves Pentaho CE.

After initiating the creation of a new dashboard, this is what we will see:

Figure 2. The Layout screen where we perform the layout step

The first thing to do is to save the newly created (empty) dashboard somewhere within the Pentaho solution folder (just like we did when saving an Analytic or Ad-Hoc Report). To save the dashboard currently being worked on, use the familiar New | Save | Save As | Reload | Settings menu. We will not go into detail on each of these self-explanatory menus.

Now look at the top-right section. There are three buttons that toggle the screen mode; this particular screenshot shows the Layout mode.

In this mode, we take care of the layout of the dashboard. On the left panel, we see the Layout Structure. It is basically a grid made of Row entries, which contain Column(s), which themselves may contain another set of Row(s). The big difference between a Row and a Column is that the Column actually contains the Components, such as charts, tables, and many other types. We give a name to a Column to tie it to content. Because of this, Column names must be unique within a dashboard.

The panel to the right is a list of properties whose values we can set, mostly HTML and CSS attributes that tell the browser how to render the layout. It is recommended to create a company-wide CSS file to show the company logo, colors, and other visual markings on the dashboard.

So basically, all we are doing in this Layout mode is determining where certain content should appear within the dashboard, and we do that by naming each of the places where we want that content to be displayed.

NOTE: Even though the contents are placed within a Column, it is a good practice to name the Rows clearly to indicate the sections of the dashboard, so we can go back later and be able to locate the dashboard elements quickly.

Lining-Up Components

After we defined the layout of the dashboard using the Layout mode, we move on to the next step by clicking on the Components button on the top horizontal menu as shown in the screenshot below:

Figure 3. The Components mode where we define the dashboard components

Usage experience: Although more complex, the CDE is well implemented and quite robust. In our use of it to build dashboards for our clients, we have never seen it produce inconsistent results.

In this Components mode, there are three sections (going from left to right). The left-most panel contains the selection of components (data presentation units), ranging from a simple table to complex charting options (based on the Protovis data visualization library); here we choose how to present the data on the dashboard.

The next section to the right contains the components already chosen for the dashboard we are building. As we select each of these components, its properties are displayed in the section next to it. The Properties section is where we fill in information such as:

  • Where the data is coming from
  • Where the Component will be displayed in the dashboard. This is done by referring to the previously defined Column from the Layout screen
  • Customization such as table column width, the colors of a pie chart, custom scripting that should be run before or after the component is drawn

This clean separation between the Layout and the Components makes it easy for us to create dashboards that are easy to maintain and that accommodate different versions of the components.

Where The Data Is Sourced

The last mode is the Data Source mode where we define where the dashboard Components will get their data:

Figure 4. The Data Sources mode where we define where the data is coming from

As seen in the left-most panel, the list of data source types is quite comprehensive. We typically use either SQL or MDX queries to fetch the data set in a format suitable for presentation in the Components we defined earlier.

For instance, a data set to be presented in a five-column table will look different from one that will be presented in a pie chart.
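
As a hedged illustration (with hypothetical table and column names), the two queries below show how the result-set shape follows the component: a pie chart wants exactly two columns, a label and a value, while a detail table wants one column per displayed table column:

    -- Data Source for a pie chart: two columns, label and value.
    SELECT product_line, SUM(total_amount) AS total_sales
    FROM dash_daily_sales
    GROUP BY product_line;

    -- Data Source for a five-column detail table.
    SELECT sales_date, product_line, region, order_count, total_amount
    FROM dash_daily_sales
    ORDER BY sales_date;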

This screen follows the others in terms of sections: we have (from left to right) the Data Source type list, the currently defined data sources, and the Properties section on the right.

Usage experience: There may be some confusion for those who are not familiar with the way Pentaho defines a data source. There are two “data source” concepts represented here: one is the Data Source defined in this step for the dashboard, and the other is the “data source” or “data model” that the dashboard's Data Source connects to and runs its query against.

After we define the Data Sources and name them, we go back to the Components mode and specify these names as the value of the Data source property of the defined components.

Voila! A Dashboard

By the time we finished defining the Data Sources, Components, and Layout, we end up with a dashboard. Ours looks like this:

Figure 5. The resulting dashboard

The title of the dashboard and the date range are contained within one Row, as are the first table and the pie chart. This demonstrates the flexibility of the grid system used in the Layout mode.

The company colors and fonts used in this dashboard are controlled via the custom CSS specified as a Resource in the Layout mode.

All that is left to do at this point is to give the dashboard some role-based permissions so access to it will be limited to those who are in the specified role.

TIP: Never assign permissions at the individual user level. Why? Think about what has to happen when a person changes position and is replaced by someone else.

Extreme Customization

Anything from table column widths to the rotation of the x-axis labels can be customized via the properties. Furthermore, for those who are well versed in JavaScript, there are tons of things we can do to make the dashboard more than just a static display.

These customizations can actually be useful other than just making things sparkle and easier to read. For example, by using some scripting, we can apply some dashboard-level business rules to the dashboard.

Usage experience: Let's say we want some displayed numbers to turn red when they fall below a certain threshold. We do this using the post-execution property of the component, and the script looks like this:

Figure 6. A sample post-execution script

Summary

The CDE is a good tool for building dashboards; coupled with the ACL feature built into the Pentaho BI Server, it serves as a good platform for planning and delivering your dashboard solutions. Are there other tools out there that can do the same thing with the same degree of flexibility? Sure. But for a cost of only the time spent learning (which can be shortened significantly by hiring a competent BI consultant), a free license is quite hard to beat.

To squeeze out its potential, CDE requires a lot of familiarity with programming concepts such as formatting masks, JavaScript scripting, and pre- and post-events, and most of the time the answers to how-to questions can only be found in scattered conversations among Pentaho CE developers. So please be duly warned.

But if we can get past those hurdles, it can produce some of the most useful and clear dashboards. Notice we didn't say “pretty” (as in “gimmicky”), because that is not what makes a dashboard really useful for CEOs and business owners in day-to-day decision-making.

Next in the final part (part-six), we will wrap up the review with a peek into the Weka Data Mining facility in Pentaho, and some closing thoughts.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Pentaho Analytics – Part 4 of 6

Introduction

This is the fourth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this fourth part, we'll be discussing the Pentaho Analytics tools and facilities, which provide the ability to view and “slice and dice” data across multiple dimensions. This feature is the one most associated with the term “Business Intelligence” due to its usefulness in aiding cross-data-domain decision-making. Any decent BI suite has at least one facility with which users can perform data analysis.

One important note specifically for Pentaho: the Analytics toolset is where the real advantage of the Enterprise Edition (EE) over the Community Edition (CE) starts to show through – other than the much more polished UI.

In the Pentaho BI Suite, we have these analytics tools:

  1. Saiku Analytics (in EE this is called “Analysis Report”) – A tool built into the Pentaho User Console (PUC) that utilizes the available analysis models. Do not confuse this with Saiku Reporting.
  2. Pentaho Model Data Source – In part three of the review, we discussed this facility for creating data models for ad-hoc reporting. The second use of this facility is to create an OLAP “cube” for use with the Saiku Analytics tool. Once this is set up by the data personnel, data owners can use it to generate analytic reports.
  3. Schema Workbench – A separate program that allows for handcrafting OLAP cube schemas. Proficiency with the MDX query language is not necessary but can come in handy in certain situations.

As usual, we'll discuss each of these components individually. The screenshots below are sanitized and there are no real data being represented. A fictitious company called “DonutWorld” is used to illustrate and relate the concepts.

Saiku Analytics (Analysis Report in EE)

One of the benefits of having a Data Warehouse is being able to model existing data in a structure that is conducive to analysis. If we try to feed tools such as this from a heavily normalized transactional database, we are inviting two problems:

1. We will be forced to do complex joins, which will manifest themselves as performance hits and as difficulty when business rules change

2. We lose the ability to apply non-transactional business rules to the data at a layer closer to the rule maintainers (typically those who work closely with the business decision-makers)

Therefore, to use this tool effectively we need to think in terms of what questions need to be answered, then work our way backwards, employing data personnel to create a suitable model for those questions. Coincidentally, this process of modeling data suitable for reporting is a big part of building a Data Warehouse.

Learning experience: Those who are familiar with MS Excel (or LibreOffice) Pivot Tables will be at home with this tool. Basically, as far as the model allows, we can design the view or report by assigning dimensions to columns and rows, and then assigning measures to define what kind of numbers we expect to see. We will discuss below what 'dimension' and 'measure' mean in this context, but for an in-depth treatment, we recommend consulting your data personnel.

Usage experience: The EE version of this tool has a clearer interface as far as where to drop dimensions and measures, but the CE version is usable once we are accustomed to how it works. Another point for the EE version (version 5.0) is the ability to generate total sums in both the row and column directions, and a much more usable Excel export.

Figure 1. The EE version of the Analysis Report (Saiku Analytics in CE)

Pentaho Model Data Source

The Data Source facility is accessible from within the PUC. As described in Part 3 of this review, once you have logged in, look for a section on the screen that allows you to create or manage existing data sources.

Here we are focusing on using this feature to set up “cubes” instead of “models.” This is something your data personnel should be familiar with, guided by the business questions that need answering.

Unlike “models”, “cubes” are not flat; rather, they consist of multiple dimensions that determine how the measures are aggregated. From these “cubes”, non-technical users can create reports by designing them just as they would Pivot Tables. The most useful aspect of this tool is that it abstracts the construction of an OLAP cube schema to its core concepts. For example, given a fact table, this tool will try to generate an OLAP cube schema, and for the most part it does a good job, in the sense that the cube is immediately usable to generate Analysis Reports.
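
To give a hedged idea of what “given a fact table” means here, below is a minimal star-schema sketch with invented DonutWorld-style names; the Data Source facility would read a fact table like fact_sales and propose dimensions from its keys and date column, and measures from its numeric columns:

    -- A minimal star schema that the Data Source facility can turn into a cube.
    -- All table and column names are hypothetical.
    CREATE TABLE dim_store (
        store_id   INTEGER PRIMARY KEY,
        store_name VARCHAR(100),
        region     VARCHAR(50)
    );

    CREATE TABLE fact_sales (
        sale_date  DATE,                                    -- candidate time dimension
        store_id   INTEGER REFERENCES dim_store (store_id), -- candidate Store dimension
        quantity   INTEGER,                                 -- candidate SUM measure
        amount     DECIMAL(12, 2)                           -- candidate SUM measure
    );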

This tool also hides the distinction between Hierarchies and Levels of dimensions. For the most part, you can do a lot with just one Level anyway, so this is easier to grasp for beginners in OLAP schema design.

Learning experience: The data personnel must be 1) familiar with the BI table structures, or at the very least able to pinpoint which tables are facts and which are dimensions; and 2) comfortable with designing OLAP dimensions and measures. Data owners must be familiar with the structure and usage of the data. The combined efforts of these two roles are the building blocks of a workflow/process.

Usage experience: Using the workflow/process defined above, an organization will generate a collection of OLAP cubes that can be used to analyze the business data with increasing accuracy and usefulness. The most important consideration from the business standpoint is that all of this will take some time to materialize. The incorrect attitude here would be to expect instant results, which will not transpire unless the dataset is overly simplistic.

Figure 2. Creating a model out of a SQL query

NOTE: Again, this is where the maturity level of the Data Warehouse is tested. For example, a DW with sufficient maturity will notify the data personnel of any data model changes, which will trigger an update of the OLAP cube, which may or may not have an effect on the created reports and dashboards.

If the DW is designed correctly, there should be quite a few fact tables that can readily be used in the OLAP cube.

Schema Workbench

The Schema Workbench is for those who need to create a custom OLAP schema that cannot be generated via the Data Source facility in the PUC. Usually this involves complicated measure definitions, multi-Hierarchy or multi-Level dimensions, or the need to evaluate and optimize MDX queries.

NOTE: In the 5.0 version of the PUC, we can import existing MDX queries into the Data Source Model, making them available to the Analysis Report (or Saiku Ad-Hoc report in the CE version). As can be seen in the screenshot below, the program is quite complex, with numerous features for handcrafting an OLAP cube schema.

Once a schema is validated in the Workbench, we need to publish it. Using the password defined in the pentaho-solutions/system/publisher_config.xml, the Workbench will prompt for the location of the cube within the BI Server and the displayed name. From that point, it will be available to choose from the drop-down list on the top left of the Saiku Analytics tool.

Figure 3. A Saiku report in progress

OLAP Cube Schema Considerations

Start by defining the fact table (bi_convection in the above example), then define the dimensions and measures.

We have been talking about these concepts of dimension and measure. Let's briefly define them:

  1. A dimension is a way to view existing business data. For instance, a single figure such as the sales number can be viewed from several perspectives: per sales region, per salesperson or department, or chronologically. Using aggregation functions such as sum, average, min/max, standard deviation, etc., we can come up with different numbers that show the data in a manner we can draw conclusions from.
  2. A measure is a number or count derived from business data that provides an indication of how the business is doing. For a shoe manufacturer, the number of shoes sold is obviously one very important measure; another would be the average price of the shoes sold. Combined with dimensions, measures can be used to make business decisions.

In the Schema Workbench, as you assign existing BI table fields to the proper dimensions, it validates the accessibility of the fields using the existing database connection, then creates a view of the measures using a user-configurable aggregation of the numbers.

In the creation of an OLAP cube schema, there is a special dimension that enables us to see data chronologically. Due to its universal nature, this dimension is a good one to start with. The time dimension is typically served by a special BI table that contains a flat list of rows with time and date information at the needed granularity (some businesses require seconds, others days, or even weeks or months).
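
A minimal sketch of such a date dimension table, assuming day-level granularity and hypothetical names, would look like this; the cube's time dimension then maps its Year/Quarter/Month/Day levels onto these columns:

    -- A flat date dimension: one row per calendar day.
    -- Names are hypothetical; add hour/minute columns if a finer grain is needed.
    CREATE TABLE dim_date (
        date_key     DATE PRIMARY KEY,
        year_number  INTEGER,
        quarter_name VARCHAR(2),    -- 'Q1' through 'Q4'
        month_number INTEGER,
        month_name   VARCHAR(10),
        day_of_month INTEGER,
        day_of_week  VARCHAR(10)
    );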

TIP: Measures can be defined using a “case when” SQL construct, which opens up a whole other level of flexibility.
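
As a small, hedged example (hypothetical table and column names), the expression feeding such a measure could count only premium sales toward the total:

    -- A conditional measure expression: only premium-tier rows contribute to the sum.
    -- Table and column names are hypothetical.
    SELECT
        sale_date,
        SUM(CASE WHEN product_tier = 'premium' THEN amount ELSE 0 END) AS premium_sales
    FROM fact_sales
    GROUP BY sale_date;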

When should we use MDX vs SQL?

The MDX query language, with powerful concepts like ParallelPeriods, is well suited to generating tabular results of aggregated data for comparison purposes.

True to its intended purpose, MDX allows for querying data that is presented in a multi-dimensional fashion, while SQL is easier to grasp and has a wider base of users and experts in any industry.

In reality, we use these two languages at different levels; the key is to be comfortable with both and to discover the cases where one makes more sense than the other.

NOTE: The powerful Mondrian engine is capable, but without judicious use of database indexing, query performance can easily crawl into minutes instead of seconds. This is where data personnel with database tuning experience are extremely helpful.
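
In practice, tuning usually starts with indexing the fact-table columns that the cube's dimensions join and filter on. A hedged example with hypothetical names:

    -- Index the fact-table columns that dimension joins and slicers hit.
    -- Names are hypothetical; verify against the actual schema and against
    -- the queries that prove to be slow.
    CREATE INDEX idx_fact_sales_date  ON fact_sales (sale_date);
    CREATE INDEX idx_fact_sales_store ON fact_sales (store_id);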

Summary

The analytics tools in the Pentaho BI Suite are quite comprehensive – certainly better than some of the competing tools out there. The analytic reports are made available on the Pentaho User Console (PUC), where users log in and initiate report generation. There are three facilities available:

The Analysis Report (or Saiku Analytics in CE version) is a good tool for building reports that look into an existing OLAP cube and do the “slicing and dicing” of data.

The Data Source facility can also be used to create OLAP cubes from existing BI tables in the DW. A good use of this facility is to build a collection of OLAP cubes to answer business questions.

The Schema Workbench is a standalone tool which allows for handcrafting custom OLAP cube schemas. This tool is handy for complicated measure definitions and multilevel dimensions. It is also a good MDX query builder and evaluator.

Next in part-five, we will discuss the Pentaho Dashboard design tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.