Pentaho Data Integration and Analytics Valuable Features

DP
Enterprise Data Architect at a manufacturing company with 201-500 employees

I'm a database guy, not a programmer, so Lumada's ability to create low-code pipelines without custom coding is crucial for me. I don't need to do any Java customization. I've had to write SQL scripts and, occasionally, some JavaScript within it, but those cases are few and far between. I can do everything else within the tool itself. I got into databases because I was sick and tired of getting errors when I compiled something.
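As a concrete illustration of the kind of scripting mentioned above, PDI's "Modified Java Script Value" step accepts plain JavaScript that operates on row fields. This is a minimal, hypothetical sketch; the field and function names are invented for illustration:

```javascript
// Hedged sketch of a snippet as it might appear in PDI's
// "Modified Java Script Value" step, where each input row's fields
// are exposed as variables. Names here are hypothetical.
function normalizePhone(raw) {
  // Keep digits only, e.g. "(555) 123-4567" becomes "5551234567".
  return String(raw).replace(/\D/g, "");
}

// In the step, "phone_raw" would be an incoming row field and
// "phone_clean" a new output field added to the stream.
var phone_raw = "(555) 123-4567";
var phone_clean = normalizePhone(phone_raw);
```

Logic like this stays inside a single step, so the rest of the pipeline remains visual and low-code.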

Jacopo Zaccariotto - PeerSpot reviewer
Head of Data Engineering at InfoCert

Pentaho is flexible with a drag-and-drop interface that makes it easier to use than some other ETL products. For example, the full stack we are using in AWS does not have drag-and-drop functionality. Pentaho was a good option at the start of this journey.

We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice.

Ryan Ferdon - PeerSpot reviewer
Senior Data Engineer at Burgiss

The fact that it's a low-code solution is valuable. It's good for more junior people who may not be as experienced with programming. In our case, we didn't have a huge data set. We had small and medium-sized data sets, so it worked fine.

The fact that it's open source is also helpful in that, if a junior engineer knows they are going to use it in a job, they can download it themselves, locally, for free, and use test data to learn it.

My role was to use it to write one feed that could facilitate multiple clients. Given that it was an open-source, free solution, it was pretty robust in what it could do. I could make lookup tables and databases and map different clients, and I could use the same feed for 30 clients or 50 clients. It got the job done for our use case.

In addition, you can install it wherever you need it. We had installed versions in the cloud and I also had local versions.

Buyer's Guide
Pentaho Data Integration and Analytics
April 2024
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,578 professionals have used our research since 2012.
PR
Senior Engineer at a comms service provider with 501-1,000 employees

The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to trawl through lines and lines of code and try to work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it.

The other side is that it is quite a modular program. I've worked with other ETL tools, and it is quite difficult to get component reuse by using them. With tools like SSIS, you can develop your packages for moving data from one place to another, but it is really difficult to reuse a lot of it, so you have to implement the same code again. Pentaho seems quite adaptable to have reusable components or sections of code that you can use in different transformations, and that has helped us quite a lot.

One of the things that Pentaho does is provide the ability, via virtual web services, to expose a transformation as if it were a database connection; for instance, when you have a REST API that you want read by something like Tableau, which needs a JDBC connection. Pentaho was really helpful in getting that driver enabled for us to do some proof-of-concept work on that approach.

Dale Bloom - PeerSpot reviewer
Credit Risk Analytics Manager at MarketAxess

I'm at the early stages with Lumada, and I have been using the documentation quite a bit. The support has definitely been critical right now in terms of finding out more about the architectural elements needed to roll out the Enterprise edition.

I absolutely love Hitachi. I'm one of the forefront supporters of Hitachi for my firm. It's so easy to integrate within our environments. Being able to quickly build ETL jobs, transform data, and then automate those jobs makes it really easy to integrate throughout for data analytics.

I also appreciate the fact that it's not one of the low-code/no-code solutions. You can put as much JavaScript or other code into it as you want, and that makes it a really powerful tool.

RicardoDíaz - PeerSpot reviewer
COO / CTO at a tech services company with 11-50 employees

Pentaho from Hitachi is a suite of different tools. Pentaho Data Integration is a part of the suite, and I love the drag-and-drop functionality. It is the best. 

Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that.

VK
Solution Integration Consultant II at a tech vendor with 201-500 employees

The metadata injection feature is the most valuable because we have used it extensively to build frameworks, where we have used it to dynamically generate code based on different configurations. If you want to make a change at all, you do not need to touch the actual code. You just need to make some configuration changes and the framework will dynamically generate code for that as per your configuration. 
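The config-driven approach described above can be sketched in a few lines. This is only an illustration of the idea, assuming a hypothetical config shape and field names; it is not Pentaho's actual metadata injection API:

```javascript
// Hedged sketch of the *idea* behind metadata injection: one
// template step is configured at runtime from a metadata object,
// so a schema change means editing config, not the transformation.
// The config shape and names here are hypothetical.
function buildSelectStep(tableMeta) {
  // Emit the field list a templated "Select values" step would be
  // injected with, derived entirely from the metadata.
  return {
    step: "Select values",
    table: tableMeta.table,
    fields: tableMeta.columns.map(c => ({ name: c.name, type: c.type }))
  };
}

// Adding or dropping a column only touches this config object.
const ordersMeta = {
  table: "orders",
  columns: [
    { name: "order_id", type: "Integer" },
    { name: "amount",   type: "Number" }
  ]
};

const step = buildSelectStep(ordersMeta);
```

The payoff is exactly what the review describes: the "code" (the template transformation) never changes; only configuration does.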

We have a UI where we can create our ETL pipelines as needed, which is a key advantage for us. This is very important because it reduces the time to develop for a given project. When you need to build the whole thing using code, you need to do multiple rounds of testing. Therefore, it helps us to save some effort on the QA side.

Hitachi Vantara's roadmap has a pretty good list of features that they have been releasing with every new version. For instance, in version 9, they have included metadata injection for some of the steps. The most important elements of this roadmap to our organization’s strategy are the data-driven approach that this product is taking and the fact that we have a very low-code platform. Combining these two is what gives us the flexibility to utilize this software to enhance our product.

TJ
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees

It makes it pretty simple to do some fairly complicated things. Some of our other BI developers and I have taken stabs at using, for example, SQL Server Integration Services, and we found it a little frustrating compared to Data Integration. So, its ease of use is right up there.

Its performance is a pretty close second. It is a highly performant system, and its query performance on large data sets is very good.

Anton Abrarov - PeerSpot reviewer
Project Leader at a mining and metals company with 10,001+ employees

It has a really friendly user interface, which is its main feature. The process of combining SQL code with databases and automating it is great and really convenient.

RV
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees

A valuable feature is the number of connectors that I have. I can connect to different databases, data sources, files, and SFTP. With SQL and NoSQL databases, I can connect, pull the data into my instructions, send it to my staging area, and create the format. That way, I can format all my data in just one process.

Ridwan Saeful Rohman - PeerSpot reviewer
Data Engineering Associate Manager at Zalora Group

This solution offers drag-and-drop tools, and the scripting required is quite minimal. Even if you do not come from IT or a software engineering background, it is possible to use. It is quite intuitive. You can drag and drop many functions.

The abstraction is quite good.

Also, if you're familiar with the product itself, it has transformation abstractions and job abstractions. We can create smaller units as Kettle transformations, and then the bigger ones as Kettle jobs. The product is useful both for someone familiar with Python and for someone with no scripting background at all.

For larger data, we are using Spark.

The solution enables us to create pipelines with minimal manual, or custom, coding effort. Even if you have no advanced experience in scripting, it is possible to create ETL pipelines. I have a recent graduate from a management major who had no experience with SQL. I trained him for three months, and within that time he became quite fluent, with no prior experience using ETL tools.

Whether or not it's important to handle the creation of pipelines with minimal coding depends on the team. If I change the solution to Airflow, then I will need more time to teach them to become fluent in the ETL tool. By using these kinds of abstractions in the product, I can compress the training time to just three months. With Airflow, it will take longer than six months to get new users to the same point.

We use the solution's ability to develop and deploy data pipeline templates and reuse them.

The old system was created long ago by someone before me at my organization, and we still use it. We also use the solution for some ad hoc reporting.

The ability to develop and deploy data pipeline templates once and reuse them is really important to us. There are some requests to create pipelines; I create them and then deploy them on our server. The pipeline then has to be robust, especially once it is scheduled, so that it does not fail.

We like the automation. I cannot imagine how data teams would work if everything were done on an ad hoc basis. Everything should be automated. Using my organization as an example, I can say with confidence that 95% of our data distributions are automated and only 5% are ad hoc. For that ad hoc portion, we query the data manually, process it in spreadsheets, and then distribute it to the organization. It's important to be robust and to be able to automate.

So far, we can deploy the solution easily on the cloud, on AWS. I haven't tried it on another server. We deploy it on our AWS EC2; however, we develop on our local computers. Most of the team uses Windows, and some people use MacBooks.

I personally have developed on both Windows and MacBook. I can say that Windows is easier to navigate. On the MacBook, the display becomes quite messed up if you enable dark mode.

The solution did reduce our ETL development time if you compare it to the scripting. However, this will really depend on your experience.

RK
Senior Data Analyst at a tech services company with 51-200 employees

One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results.

I'm working with large data sets. One of the clients I'm working with is a large credit card company and the database from this client is very large. Pentaho allows me to query large data sets without affecting its performance.

I use Pentaho with Jenkins to schedule the jobs. I'm using the jobs and transformations in Pentaho to create many links. 
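The Jenkins scheduling described above typically reduces to a scheduled shell step that invokes Kitchen, PDI's command-line job runner. A hedged sketch; the install path, job file, and parameter name are hypothetical:

```shell
# Jenkins "Execute shell" build step running a PDI job on a schedule.
# kitchen.sh is PDI's CLI job runner; the paths and the RUN_DATE
# parameter here are illustrative, not from the original review.
/opt/pentaho/data-integration/kitchen.sh \
  -file=/etl/jobs/campaign_results.kjb \
  -param:RUN_DATE="$(date +%F)" \
  -level=Basic
```

Kitchen's exit code propagates to Jenkins, so a failed job marks the build as failed.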

I always find ways to have minimal code and create the processes with many parameters. I am able to reuse processes that I have created before. 

Creating jobs and putting them into production is very simple, and the visibility that Pentaho gives is very good.

Aqeel UR Rehman - PeerSpot reviewer
BI Analyst at Vroozi

The best feature is that it's simple to use. There are simple data transformation steps available, such as trimming data or performing different types of replacement.

This solution allows us to create pipelines using a minimal amount of custom coding. Anyone in the company can do so, and it's just a simple step. If any coding is required then we can use JavaScript.

Renan Guedert - PeerSpot reviewer
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees

It has many resources and a wide variety of things you can do. It is also pretty open, since you can put in a Python script or JavaScript for almost anything. If you don't have a native tool in the application, you can build your own using scripts, and you can build your other steps and jobs in the application. The liberty the application gives you has been pretty good.

Lumada enables us to create pipelines with minimal manual coding effort, which is the most important thing. When creating a pipeline, you can see which steps are failing in the process, follow the process, and debug if you have problems. So, it creates a good, visual pipeline that makes it easy to understand what you are doing during the entire process.

José Orlando Maia - PeerSpot reviewer
Data Engineer at a tech services company with 201-500 employees

The features that I use the most are the Microsoft Excel input, S3 CSV input, table input, and CSV input steps. Today, the most valuable to me are the table input and then the CSV input; both are very important. We use the table input to extract data from our transactional databases, which are commonly used, and the CSV input to get data from AWS S3 and our data lake.

In Lumada, we can parallelize the steps. Query performance against databases is good for me, especially for transactional databases. Because Lumada runs on Java, we can adjust the amount of memory we want to use for transformations; it is possible to set the amount of memory available to the Java VM, which is good. Lumada performs well with transactional database extraction: not the highest performance, but good performance as we query data, and it is possible to parallelize the query. For example, if we have three or four servers to get the data from, we can retrieve the data from these databases at the same time, in parallel. This is good because we don't need to wait for one of the extractions to finish.
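The JVM memory adjustment mentioned above is usually done via the PENTAHO_DI_JAVA_OPTIONS environment variable, which PDI's launch scripts honor. A hedged example; the heap sizes are purely illustrative, not recommendations:

```shell
# Give PDI (Spoon/Kitchen/Pan) a 2 GB initial / 8 GB maximum Java
# heap before launching. Sizes here are illustrative only.
export PENTAHO_DI_JAVA_OPTIONS="-Xms2g -Xmx8g"
./spoon.sh
```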

Using Lumada, we don't need to do many manual transformations because there are native components for many of our transformations. Thus, Lumada is a low-code alternative to gathering data with SQL, Python, or other transformation tools.

Michel Philippenko - PeerSpot reviewer
Project Manager at a computer software company with 51-200 employees

The ETL feature was the most valuable to me. I like it very much. It was very good.

RE
Data Architect at a consumer goods company with 1,001-5,000 employees

I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source. With open-source on the table, I am in a position to transform the data where it's actually being moved from one environment to another.

Whether we are working with structured or unstructured data, the tool has been helpful. We are actually able to extend it to read JSON data by creating some Java components.

The solution gives us the flexibility to deploy it in any environment, including on-premises or in the cloud. That is another very important feature.

NA
Systems Analyst at a university with 5,001-10,000 employees

The ETL is definitely an awesome feature of the product. It's very easy and quick to use. Once you understand the way it works it's pretty robust.

Lumada Data Integration requires minimal coding. You can do more complex coding if you want to, because it has a scripts option that you can add as a feature, but we haven't found a need to do that yet. We just use what's available, the steps that they have, and that is sufficient for our needs at this point. It makes it easier for other developers to look at the things that we have developed and to understand them quicker, whereas if you have complex coding it's harder to hand off to other people. Being able to transition something to another developer, and having that person pick it up quicker than if there were custom scripting, is an advantage.

In addition, the solution's ability to quickly and effectively solve issues we've brought up has been great. We've been able to use all the available features.

Among them is the ability to develop and deploy data pipeline templates once and reuse them. The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs. The automation of data pipeline templates has also been helpful in scaling the onboarding of data.

KM
Data Architect at a tech services company with 1,001-5,000 employees

One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs.

ES
System Engineer at a tech services company with 11-50 employees

The graphical user interface is quite okay. That's the most important feature. In addition, the different types of stores and data formats that can be accessed and transferred are an important component.

We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic. It's more about the business logic and less about the programming logic and that's really important.

Another important feature is that you can deploy it in any environment, whether it's on-premises or cloud, because you can reuse your steps. When it comes to adding to your data processing capacity dynamically that's key because when you have new workflows you have to test them. When you have to do it on a different environment, like your production environment, it's really important.

SK
Lead, Data and BI Architect at a financial services firm with 201-500 employees

Because it comes from an open-source background, it has so many different plugins. It is extremely broad in what it can do, with a wide spectrum of things it can connect to. It has been around for a while, so it is mature and has a lot built into it. That is the biggest thing.

The visual nature of its development is a big plus. You don't need to have very strong developers to be able to work with it.

We often have to drop down to JavaScript, but that is fine. I appreciate that it has the capability built-in. When you need to, you can drop down to a scripting language. This is important to us.

DG
Director of Software Engineering at a healthcare company with 10,001+ employees

There is an end-to-end flow, where a user can say, "I am looking at this field and want to slice and dice my data based on these parameters." That flexibility is provided by Pentaho. This minimal manual coding is important to us.

VM
Technical Manager at a computer software company with 51-200 employees

Pentaho Data Integration is quite simple to learn, and there is a lot of information available online. It is not a steep learning curve. It also integrates easily with other databases and that is great. We use the provided documentation, which is a simple process for integration compared to other proprietary tools.

TG
Analytics Team Leader at HealtheLink

We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule.

it_user164838 - PeerSpot reviewer
CEO with 51-200 employees

Ease of use, stability, the graphical interface, a small amount of "voodoo," and cost.

it_user373128 - PeerSpot reviewer
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees

It is a lightweight ETL tool that's easy to get started on. It connects seamlessly to most commonly used data sources.

it_user414117 - PeerSpot reviewer
Senior Data Engineer at a tech company with 501-1,000 employees

The most valuable thing for me is that it enables a technical product manager to be able to write ETL jobs themselves, which saves developers time so that they can do more important things.

it_user382572 - PeerSpot reviewer
Pentaho Consultant at a comms service provider with 10,001+ employees

It is a very good open-source ETL tool that's capable of connecting to most databases. It has a lot of functions that make transforming the data very easy. Also, because it is an open-source product, it is very easy to build your own solution with it.

it_user376926 - PeerSpot reviewer
Data Developer at a tech services company with 10,001+ employees
  • Pentaho Kettle has a very intuitive and easy-to-use graphical user interface (GUI)
  • It is possible to understand how to develop an ETL solution even when using it for the first time
  • The Community Edition is free and very efficient
  • There are versions for Windows, Linux, and Mac
  • Large selection of options
OM
IT-Services Manager & Solution Architect at Stratis

Running the ETL itself was very fast. It makes it very easy to transform the information we have. We found that very useful.

The UI is very easy to understand and learn.

The solution offers lots of documentation.

The initial setup is easy.

It's my understanding that the product can scale.

We've found the solution to be stable. 

The product is free to use if you choose the free version.

it_user402600 - PeerSpot reviewer
Senior Consultant at a financial services firm with 10,001+ employees

It allows for rapid prototyping of a wide array of ETL workloads.

it_user396720 - PeerSpot reviewer
Graduate Teaching Assistant with 1,001-5,000 employees

The most valuable feature is that it can take inputs from all formats, such as CSV, text, Excel, JSON, and Hadoop. It can provide the output in the format we require, and we can also use many database connections. The transformations listed are also very useful and very self-explanatory.

Also, the data mining feature which comes with the Pentaho business analytics suite was very useful to our project, especially the Weka plugin. We could score the records in the data warehouse, which helped in predicting the values.

Lastly, the GUI is very easy to use, so we can perform transformations with data very quickly and create reports indicating the KPIs in the reporting tool. I think a company wouldn't need to spend more money on getting an experienced person to use this tool. All you need is a balance of experienced users and new trainees to get going. You can also start using the business analytics tool once you have integrated data. Coaching and applying this technology enterprise-wide will enable your business to make data-driven decisions.

VD
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees

One important feature, in my opinion, is Metadata Injection. It gives the scripts flexibility because they don't depend on a fixed structure or a fixed data model; instead, you can develop transformations that are independent of the structure or data model.

Let me give a couple of examples. Sometimes your tables change, adding fields or dropping some of them. When this happens, if you have a transformation that doesn't use Metadata Injection, your transformation fails or doesn't handle all the information from the table. If you use Metadata Injection instead, the new fields are included and the dropped columns are excluded from the transformation. Other times, you have a complex transformation to apply to a lot of different tables. Traditionally, without the Metadata Injection feature, you had to repeat the transformation for each table, adapting it to the concrete structure of each one. Fortunately, with Metadata Injection, the same transformation is valid for all the tables you want to treat. A little effort gives you a great benefit.

Furthermore, the solution has a free-to-use community version.

The solution is easy to set up, very intuitive, clear to understand and easy to maintain.

it_user391695 - PeerSpot reviewer
Business Intelligence Consultant at Sanmargar Team

First of all, the ease of deployment. I'm pretty sure that almost anyone could do simple transformations without having any knowledge of IT. Thanks to its graphical interface, this tool is just drag and click. Another advantage is that it fits everywhere. You can connect it to big data sources, relational databases, and all types of files. If the developers missed something, you can try finding it in the marketplace or quickly develop it yourself, because it is open source.

it_user426030 - PeerSpot reviewer
Global Consultant - Big Data, BI, Analytics, DWH & MDM at a tech consulting company with 1,001-5,000 employees

It's an ETL platform with big data enablement. It's the easiest to use, extend, and deploy, and it helps to connect to various data sources, including all widely available databases.

We also use Pentaho Analyzer, an ad hoc analytics tool built on the Mondrian OLAP server that enables the end user to slice and dice the data in various patterns.

Ricardo Díaz - PeerSpot reviewer
COO at a tech services company with 11-50 employees
Easy to use, with support for all databases (JDBC and ODBC connections) and for files such as XLS, CSV, TXT, SAS, and R.
it_user384984 - PeerSpot reviewer
Sr BI Administrator at a healthcare company with 1,001-5,000 employees

It allows for very quick development due to the intuitive interface. Compared to other ETL tools like PowerCenter, SSIS, and SAS DI Studio, it excels in rapid development cycles.

it_user172275 - PeerSpot reviewer
Consultant at a comms service provider with 11-50 employees

It's very simple compared to other products out there.

it_user254223 - PeerSpot reviewer
Project Manager - Business Intelligence at www.datademy.es
  • Easy to use
  • Development of the product
  • A lot of predefined steps
  • Good open source option
it_user384993 - PeerSpot reviewer
Datawarehouse Administrator at a tech services company with 501-1,000 employees

Its ability to blend data, and the dashboarding with CTools for creating responsive single-page apps.

it_user415695 - PeerSpot reviewer
Project Lead at a tech services company with 10,001+ employees

The best benefit of the product is that it is easy to use and to understand.

it_user426117 - PeerSpot reviewer
DWH Specialist at a healthcare company with 1,001-5,000 employees

It is extremely flexible; it allows you to use variables/parameters for just about everything.

it_user8199 - PeerSpot reviewer
BI developer - (Jaspersoft/Pentaho/Pentaho C-Tools/Kettle/Talend/Data warehouse) at a tech services company with 501-1,000 employees
  • Best in performance in both hosted and local environments
  • Best open-source warehouse solution using the Kimball method
  • Best big data discovery components and BI
  • Simple and easy to understand and work with
  • Complete, cost-effective solutions
  • Best support in forums
  • Best visualizations in the market - Protovis & D3
  • Best custom interactivity features
  • Best product for embedded BI
  • Best for mobile-responsive technology integration, i.e. Bootstrap
  • Best documentation - open APIs
it_user392367 - PeerSpot reviewer
Research Assistant at a university with 1,001-5,000 employees

I would say that the user-defined class operator is currently very valuable to me. Other than that, native connectivity to Hadoop (MapR), analytical databases, and enterprise systems is really important to me these days.

it_user375219 - PeerSpot reviewer
Consultant at a tech vendor with 501-1,000 employees
  • It has a nice GUI that anyone can learn to use in just a few days with minimal training.
  • It has great support for big data technologies; Pentaho 5.3 comes with support for HBase, Pig, Oozie, and various Hadoop distributions.
it_user369171 - PeerSpot reviewer
Brazil IT Coordinator at a transportation company with 1,001-5,000 employees

Data transformation within Pentaho is a nice feature that they have and that I value.

it_user386202 - PeerSpot reviewer
Business Intelligence Supervisor at a manufacturing company with 501-1,000 employees
  • Fast
  • Easy to learn and then teach to our team
  • It integrates with everything on the market