2020-07-15T07:11:00Z
it_user434868 - PeerSpot reviewer
Senior Director of Delivery at a tech services company with 51-200 employees

What is your primary use case for Pentaho Data Integration?

How do you or your organization use this solution?

Please share with us so that your peers can learn from your experiences.

Thank you!

26 Answers
Dan Peacock - PeerSpot reviewer
Enterprise Data Architect at a manufacturing company with 201-500 employees
Real User
Top 10
2022-04-21T11:53:02Z
Apr 21, 2022

At first, we were using it as a means of loading our Data Warehouse from our operational systems.

Now we are using it as a means of bulk distribution (bursting) of reports, and synchronization across the enterprise.

The latter is especially useful as we have several different database platforms (Oracle, SQL Server, PostgreSQL, Access <shudder>, and Firebird).

ES
System Engineer at a tech services company with 11-50 employees
Real User
Top 20 Leaderboard
2022-09-04T22:17:00Z
Sep 4, 2022

We use it for two major purposes. Most of the time it is for ETL. Then, based on the loaded and converted data, we generate reports. A small part of that, the pivot tables and the like, is also on the web interface, which is the more interactive part. But about 80 percent of our developers' work is on the background processes for running, transforming, and changing data.

Ridwan Saeful Rohman - PeerSpot reviewer
Data Engineering Associate Manager at Zalora Group
Real User
Top 10
2022-06-26T13:19:00Z
Jun 26, 2022

I still use this tool on a daily basis. Compared to my experience with other ETL tools, the system that I created with this solution was quite simple. It is just as simple as extracting the data from MySQL, exporting it to CSV, and then putting it on the S3 for the sales button. It is as simple as extracting the data from the MySQL Center and exporting it to the ASB. We still use this solution because a lot of old systems still depend on it. The new solution that we use is mostly Airflow. We are still in the transition phase. To be clear, Airflow is a data orchestration tool that mainly uses Python. Everything, from the ETL all the way to the scheduling and the monitoring of any issues, is in one system and entirely on Airflow.
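
For readers unfamiliar with the pattern, the extract-and-export flow described above can be sketched in a few lines of Python. This is only an illustration, not the reviewer's actual job: an in-memory SQLite database stands in for MySQL, the `orders` table and its columns are hypothetical, and the final upload to S3 is left as a comment because the review gives no bucket names or credentials.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table):
    """Dump every row of `table` to CSV text, header row first."""
    # `table` is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(col[0] for col in cursor.description)
    writer.writerows(cursor)
    return buffer.getvalue()


# Stand-in source database (hypothetical table; the review used MySQL).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

csv_text = export_table_to_csv(source, "orders")
print(csv_text)
# A real job would now upload csv_text to object storage such as S3.
```

In a graphical ETL tool the same flow is typically a table-input step wired to a text-file-output step; the sketch only shows how little logic the pattern itself requires.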

RK
Senior Data Analyst at a tech services company with 51-200 employees
Real User
2022-05-30T16:19:00Z
May 30, 2022

I use it for ETL. We receive data from our clients and we join the most important information and do many segmentations to help with communication between our product and our clients.

VK
Solution Integration Consultant II at a tech vendor with 201-500 employees
Consultant
Top 20
2022-05-25T17:24:00Z
May 25, 2022

My work primarily revolves around data migration and data integration for different products. I have used it at different companies, but for most of our use cases, we use it to integrate all the data that needs to flow into our product. We also have outbound flows from our product when we need to send data to various integration points. We use this product extensively to build ETLs for those use cases, both for the inbound data into the product and for the outbound data to various integration points. Also, we have a number of core ETLs written on this platform to enhance our product. We have two different modes that we offer: one is on-premises and the other is on the cloud. On the cloud, we have an EC2 instance on AWS; we have installed it on that EC2 instance, which we call the ETL server. We also have another server for the application where the product is installed. We use version 8.3 in the production environment, but in the dev environment, we use version 9 and onwards.

RD
COO / CTO at a tech services company with 11-50 employees
Real User
Top 20
2022-05-19T16:25:00Z
May 19, 2022

We are a service delivery enterprise, and we have different use cases. We deliver solutions to other enterprises, such as banks. One of the use cases is for real-time analytics of the data we work with. We take CDC data from Oracle Database, and in real-time, we generate a product offer for all the products of a client. All this is in real-time. The client could be at the ATM or maybe at an agency, and they can access the product offer. We also use Pentaho within our organization to integrate all the documents and Excel spreadsheets from our consultants and have a dashboard for different hours for different projects. In terms of version, currently, Pentaho Data Integration is on version 9, but we are using version 8.2. We have all the versions, but we work with the most stable one. In terms of deployment, we have two different types of deployments. We have on-prem and private cloud deployments.

Learn what your peers think about Hitachi Lumada Data Integration. Get advice and tips from experienced pros sharing their opinions. Updated: November 2022.
657,397 professionals have used our research since 2012.
KM
Data Architect at a tech services company with 1,001-5,000 employees
Reseller
Top 20
2022-05-11T06:00:00Z
May 11, 2022

We use it as an ETL tool. We take data from a source database and move it into a target database. We do some additional processing on our target databases as well, and then load the data into a data warehouse for reports. The end result is a data warehouse and the reports built on top of that. We are a consulting company and we implement it for clients.

Anton Abrarov - PeerSpot reviewer
Project Leader at a mining and metals company with 10,001+ employees
Real User
Top 10
2022-05-11T04:07:00Z
May 11, 2022

The company where I was working previously was using this product. We were using it for ETL process management. It served as data flow automation. In terms of deployment, we were using an on-premises model because we had sensitive data, and there were some restrictions related to information security.

RE
Data Architect at a consumer goods company with 1,001-5,000 employees
Real User
2022-05-10T13:38:00Z
May 10, 2022

We use it for orchestration and as an ETL tool to move data from one environment to another, including moving data from on-premises to the cloud and moving operational data from different source systems into the data warehouse.

RV
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees
Real User
Top 20
2022-05-02T05:34:00Z
May 2, 2022

I just use it as an ETL. It is a tool that helps me work with data so I can solve any of my production problems. I work with a lot of databases. Therefore, I use this tool to keep information organized. I work with a virtual private cloud (VPC) and VPN. If I work in the cloud, I use VPC. If I work on-premises, I work with VPNs.

Tomasz Rabong - PeerSpot reviewer
Client Engagement Leader at Sanmargar Team
User
2022-04-20T09:49:29Z
Apr 20, 2022

We use PDI for complicated data transformations.

We also use our tool (www.metastudiodrm.com), which is integrated with PDI, to speed up complex parametrization of ETL jobs.

Renan Guedert - PeerSpot reviewer
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees
Real User
Top 20
2022-04-12T20:49:00Z
Apr 12, 2022

It was our principal tool for the whole ETL and data warehousing process on our projects. We created a whole step for collecting all the raw data from APIs, other databases, and flat files, like Excel files, CSV files, and JSON files, to do the whole transformation and data preparation, then model the data and put it in SQL Server and integration services. For business intelligence projects, it is sometimes pretty good, when you are extracting something from an API, to have a step to transform the JSON file from the API into an SQL table. We use it heavily on a virtual machine running Windows. We have also installed the open-source version on the desktop.
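
The JSON-to-table step mentioned above can be illustrated with a short Python sketch. It makes several assumptions that are not in the review: SQLite stands in for SQL Server, the `candidates` table and the sample payload are hypothetical, and every record in the JSON array is assumed to be flat and to share the same keys.

```python
import json
import sqlite3


def load_json_records(conn, table, payload):
    """Parse a JSON array of flat objects and insert it into `table`."""
    records = json.loads(payload)
    if not records:
        return 0
    # Column names come from the first record; all records are assumed
    # to share the same keys. Names are interpolated, so trust the source.
    columns = list(records[0])
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        [tuple(rec[c] for c in columns) for rec in records],
    )
    return len(records)


# Hypothetical API response body.
api_response = '[{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}]'

target = sqlite3.connect(":memory:")
inserted = load_json_records(target, "candidates", api_response)
rows = target.execute("SELECT id, name FROM candidates ORDER BY id").fetchall()
print(inserted, rows)
```

Nested JSON would need flattening first, which is usually where a dedicated transformation step earns its keep.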

José Orlando Maia - PeerSpot reviewer
Data Engineer at a tech services company with 201-500 employees
Real User
Top 10
2022-04-11T18:31:00Z
Apr 11, 2022

My primary use case is to provide integration with my source systems, such as ERP systems and SAP systems, and web-based systems, having them primarily integrate with my data warehouse. For this process, I use ETL to treat and gather all the information from my first system, then consolidate it in my data warehouse.

Jacopo Zaccariotto - PeerSpot reviewer
Head of Data Engineering at InfoCert
Real User
Top 20
2022-04-05T10:31:00Z
Apr 5, 2022

We use Pentaho for small ETL integration jobs and cross-storage analytics. It's nothing too major. We have it deployed on-premise, and we are still on the free version of the product. In our case, processing takes place on the virtual machine where we installed Pentaho. We can ingest data from different on-premises and cloud locations. We still don't carry out the data processing phase inside a different environment from where the VM is running.

Ryan Ferdon - PeerSpot reviewer
Senior Data Engineer at Burgiss
Real User
Top 10
2022-03-24T15:23:00Z
Mar 24, 2022

We used it for ETL to transform data from flat files, CSV files, and databases. We used PostgreSQL for the connections, and then we would either import it into our database if the data came in from clients, or we would export it to files if clients wanted files or if a vendor needed to import the files into their database.

Michel Philippenko - PeerSpot reviewer
Project Manager at a computer software company with 51-200 employees
Real User
Top 20
2022-03-06T11:05:00Z
Mar 6, 2022

I was working with Pentaho for a client. I had to implement complicated data flows and extraction. I had to take data from several sources in a PostgreSQL database by reading many tables in several databases, as well as from Excel files. I created some complex jobs. I also had to implement business reports with the Pentaho Report Designer. The client I was working for had Pentaho on virtual machines.

Dale Bloom - PeerSpot reviewer
Credit Risk Analytics Manager at MarketAxess
Real User
Top 10
2022-01-20T22:28:00Z
Jan 20, 2022

The use case is for data ETL on our various data repositories. We use it to aggregate and transform data for visualization purposes for our upper management. Currently, I am using the PDI locally on my laptop, but we are undergoing an integration to push this off. We have purchased the Enterprise edition and have licenses, and we are just working with our infrastructure to get that set up on a server. We haven't yet launched the Enterprise edition, so I've had very minimal touch with Lumada, but I did have an overview with one of the engineers as to how to use the customer portal in terms of learning documentation. So, the documentation and support are basically the two main areas that I've been using it for. I haven't piped any data or anything through it. I've logged in a couple of times to the customer portal, and I've pretty much been using it as support functionality. I have been submitting requests to understand more about how to get everything to be working for the Enterprise edition. So, I have been using the Lumada customer portal mostly for Pentaho Data Integration.

Tomasz Rabong - PeerSpot reviewer
Client Engagement Leader at Sanmargar Team
User
2022-01-20T12:00:58Z
Jan 20, 2022

We use it for ETL processes with clients. We integrate it with www.metastudiodrm.com due to the lack of data dictionary management in Pentaho.

NA
Systems Analyst at a university with 5,001-10,000 employees
Real User
Top 20
2021-12-22T20:41:00Z
Dec 22, 2021

We use it as a data warehouse between our HR system and our student system, because we don't have an application that sits in between them. It's a data warehouse that we do our reporting from. We also have integrations to other, isolated apps within the university that we gather data from. We use it to bring that into our data warehouse as well.

Tracy Gettings - PeerSpot reviewer
Analytics Team Leader at HealtheLink
Real User
Top 20
2021-12-22T20:35:00Z
Dec 22, 2021

We use it to connect to multiple databases and generate reporting. We also have ETL processes running on it. Portions of it are in AWS, but we also have desktop access.

Dan Peacock - PeerSpot reviewer
Enterprise Data Architect at a manufacturing company with 201-500 employees
Real User
Top 10
2021-12-14T21:23:00Z
Dec 14, 2021

We mainly use Lumada to load our operational systems into our data warehouse, but we also use it for monthly reporting out of the data warehouse, so it's to and from. We use some of Lumada's other features within the business to move data around. It's become quite the Swiss army knife.

We're primarily doing batch-type reports that go out. Not many people want to sift through data and pick it to join it in other things. There are a few, but again, I usually wind up doing it. The self-serve feature is not as big a seller to me because of our user base. Most of the people looking at it are salespeople.

Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool aspects is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing.

The process that Lumada replaced was arcane. The sentiment among our employees, particularly the warehouse personnel, was that it was punitive. They would say, "I didn't get a bonus this month because the warehouse manager didn't like me." Now we can show them the numbers and say, "You didn't get a bonus because you were slacking off compared to everybody else." It's allowed us to be very transparent in how we're doing these tasks. Previously, that was all done behind the vest.

I want people to trust the numbers, and these tools allow me to do that because I can instantly show that the information is correct. That is a huge win for us. When we first rolled it out, I spent a third of my time justifying the numbers. Now, I rarely have to do that. It's all there, and they can see it, so they trust what the information is. If something is wrong, it's not a case of "Why is this being computed wrong?" It's more like: "What didn't report?"

We have 200 stores that communicate to our central hub each night. If one of them doesn't send any data, somebody notices now. That wasn't the case in the past. They're saying, "Was there something wrong with the store?" instead of, "There's something wrong with the data."

With Lumada's single end-to-end data management, we no longer need some of the other tools that we developed in-house. Before that, everything was in-house. We had a build-versus-buy mentality. It simplified many aspects that we were already doing and made that process quicker. It has made a world of difference.

This is primarily anecdotal, but there were times where I'd get an IM from one of the managers saying, "I'm looking at this in the sales meeting and calling out what somebody is saying. I want to make sure that this is what I'm seeing." I made a couple of people mad. Let's say they're no longer working for us, and we'll leave it at that. If you're not making somebody mad, you're not doing BI right. You're not asking the right questions.

Having a single platform for data management experience is crucial for me. It lets me know when something goes wrong from a data standpoint. I know when a load fails due to bad data and don't need to hunt for it. I've got a status board, so I can say, "Everything looks good this morning." I don't have to dig into it, and that has made my job easier. What's more, I don't waste time arguing about why the numbers on this report don't match the ones on another because it's all coming from the same place. Before, they were coming from various places, and they wouldn't match for whatever reason. Maybe there's some piece of code in one report that isn't being accounted for in the other. Now, they're all coming from the same place. So everything is on the same level.

PhilipRobinson - PeerSpot reviewer
Senior Engineer at a comms service provider with 501-1,000 employees
Real User
Top 10
2021-12-13T16:49:00Z
Dec 13, 2021

We're using it for data warehousing. Typically, we collect data from numerous source systems, structure it, and then make it available to drive business intelligence, dashboard reporting, and things like that. That's the main use of it. We also do a little bit of moving of data from one system to another, but the data doesn't go into the warehouse. For instance, we sync the data from one of our line of business systems into our support help desk system so that it has extra information there. So, we do a few point-to-point transfers, but mainly, it is for centralizing data for data warehousing. We use it just as a data integration tool, and we haven't found any problems. When we have big data processing, we use Amazon Redshift. We use Pentaho to load the data into Redshift and then use that for big data processing. We use Tableau for our reporting platform. We've got quite a number of users who are experienced in it, so it is our chosen reporting platform. So, we use Pentaho for the data collection and data modeling aspect of things, such as developing facts and dimensions, but we then publicly export that data to Redshift as a database platform, and then we use Tableau as our reporting platform. I am using version 8.3, which was the latest long-term support version when I looked at it the last time. Because this is something we use in production, and it is quite core to our operations, we've been advised that we just stick with the long-term support versions of the product. It is in the cloud on AWS. It is running on an EC2 instance in AWS Cloud.

Oscar Mejia - PeerSpot reviewer
IT-Services Manager & Solution Architect at Stratis
Real User
Top 5 Leaderboard
2021-07-14T17:56:24Z
Jul 14, 2021

We basically receive information from our clients via Excel. We take this information and transform it in order to create some data marts. With the processes we are running right now, we receive new data every day. The solution processes the Excel files and creates a data mart from them. While we read the data, transform it, and put it in a database, we need an analytics solution in order to explore the information - and that is typically Microsoft's solution, Power BI.

VM
Technical Manager at a computer software company with 51-200 employees
Real User
Top 5 Leaderboard
2021-02-22T14:48:00Z
Feb 22, 2021

We have an event planning system, which enables us to obtain a large report. It includes data mart or data warehouse data. This is where we take data from the IT online system and pass it to the data warehouse. Then, from the data warehouse, they generate reports. We have six developers who are using Pentaho Data Integration, but there are no end users. We deploy the product, and the customer uses it for reporting. We have one person who undertakes regular maintenance when it is required.

AG
Assistant General Manager at DTDC Express Limited
Real User
Top 20
2021-01-08T09:43:51Z
Jan 8, 2021

We are using just the simple features of this product. We're using it as a data warehouse and then for building dimensions.

VD
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees
Real User
2020-07-15T07:11:00Z
Jul 15, 2020

The most common use for the solution is gathering data from our databases or files in order to consolidate it in a different database. Another common use is to compare data between different databases. Where integrity is lacking, the differences can usually be traced to synchronization issues.
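
The comparison use case mentioned above can be sketched as a keyed diff between two query results. This is a minimal illustration under stated assumptions: two in-memory SQLite databases stand in for the real platforms, the `customers` table and its contents are hypothetical, and the first column of each query is assumed to be a unique key.

```python
import sqlite3


def fetch_by_key(conn, query):
    """Run `query` and index each row by its first column (the key)."""
    return {row[0]: row[1:] for row in conn.execute(query)}


def compare_tables(source_rows, target_rows):
    """Return keys missing from the target, extra in the target, and changed."""
    missing = sorted(source_rows.keys() - target_rows.keys())
    extra = sorted(target_rows.keys() - source_rows.keys())
    changed = sorted(
        key
        for key in source_rows.keys() & target_rows.keys()
        if source_rows[key] != target_rows[key]
    )
    return missing, extra, changed


def make_db(rows):
    """Build a throwaway database holding a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")])
target = make_db([(1, "a@example.com"), (2, "old@example.com"), (4, "d@example.com")])

query = "SELECT id, email FROM customers"
missing, extra, changed = compare_tables(
    fetch_by_key(source, query), fetch_by_key(target, query)
)
print(missing, extra, changed)
```

Loading both sides into memory only suits small tables; for larger ones, a sorted merge or per-chunk checksums would be the more usual approach.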

Related Questions
Netanya Carmi - PeerSpot reviewer
Content Manager at PeerSpot (formerly IT Central Station)
Dec 6, 2022
Are you satisfied with the results this tool gives you for the purposes you use it for?
Dovid Gelber - PeerSpot reviewer
Tech blogger
Dec 6, 2022
The primary use of this product at my company is for data warehousing. We utilize it as a data integration tool and it works fine, as long as you do not overload it. If you do, you may have to resort to another product. But Hitachi Lumada Data Integration is great for what it is made for. We collect data from various sources through it and structure it. Through the tool, we access reporting of the data and increase business intelligence. In my opinion, it does a good job for the value. We have used competitors of this solution before and this one ranks at the top according to my colleagues and me.
Beth Safire - PeerSpot reviewer
Tech Blogger
Dec 6, 2022
My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could use it. We have ETL developers, but most of our team varies in computer background. This tool is low-code and has many visualization features that make using it extremely easy! Everything we have utilized it for was easy for most of my teammates to explain and perform, and the company was very satisfied that everyone could access the data this tool integrated.
Netanya Carmi - PeerSpot reviewer
Content Manager at PeerSpot (formerly IT Central Station)
Dec 6, 2022
Are you completely satisfied with this product or is there something you think stops its seamless data integration?
Dovid Gelber - PeerSpot reviewer
Tech blogger
Dec 6, 2022
In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it's worth, it can definitely improve its reporting features. While I used it, there were plenty of basic components I felt that it missed. For example, it did not provide the option to search the repository for a report. I feel like this is a huge miss, since it would save a lot of manual searching. If they upped their game there, I would be among the first to recommend it.
Beth Safire - PeerSpot reviewer
Tech Blogger
Dec 6, 2022
I do not see any big problems with this solution. It works very well for my company, a medium-sized one. We have been using it for a while now and the only issue I have caught during that time is with caching. We had to integrate a larger data set once and then we encountered a slight problem with the tool. We were used to its high speed with all operations but with the larger data set, things got slower than usual. I am not sure if it was a one-time thing or if it is common, so you may want to look into that if you are a big data company.