What is our primary use case?
My main use case for Pentaho Data Integration and Analytics is integration. We use it to handle data from various front-end sources, such as Oracle databases, SAP, Salesforce, and FIS.
When I refer to integration, I mean connecting these sources and transforming the data before sending it elsewhere, such as loading the data into Snowflake. After loading the data into Snowflake, we perform transformations, create nodes, aggregations, and visualizations. We schedule tasks daily to transform the data received from the front-end and provide it to the data warehouse for further reporting use and integration into Salesforce.
What is most valuable?
Pentaho Data Integration and Analytics has positively impacted my organization by saving costs, managing large data sets, and integrating multiple sources. It assists in integrating data from Excel and CSV files quickly. It automates the data workflow, including extraction, cleansing, and loading into warehouses for BI reporting purposes, while also removing duplicates, validating data, and standardizing formats, enabling real-time decision-making.
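As an illustration of the cleansing steps mentioned above (removing duplicates, validating data, and standardizing formats), here is a minimal sketch in plain Python. It is not Pentaho code and does not reflect how the tool implements these steps; the field names `id` and `signup_date`, the accepted date formats, and the function names are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical list of input date formats; adjust to the real sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")


def parse_date(text):
    """Standardize a date string to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean_rows(rows, key="id", date_field="signup_date"):
    """Validate, deduplicate, and standardize a list of dict records."""
    seen = set()
    cleaned = []
    for row in rows:
        record_key = row.get(key)
        # Validation: drop rows that have no usable key.
        if record_key in (None, ""):
            continue
        # Deduplication: keep only the first row seen for each key.
        if record_key in seen:
            continue
        seen.add(record_key)
        # Standardization: rewrite the date field in one common format.
        standardized = dict(row)
        standardized[date_field] = parse_date(row[date_field])
        cleaned.append(standardized)
    return cleaned
```

In a real pipeline the same three concerns would typically be handled by separate steps on the canvas, so that rejected rows can be routed to an error output instead of being silently dropped.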
The best features of Pentaho Data Integration and Analytics include a very friendly user interface. It has a no-code interface, allowing us to modify and create nodes. For example, if we need a Salesforce node that is not available, we can create our own nodes using JavaScript, supported by a drag-and-drop interface.
The no-code interface and custom node creation in Pentaho Data Integration and Analytics have greatly facilitated our work. We can easily search for any node in the panel and drag and drop it onto the main canvas. When performing tasks, the code is generated automatically, capturing the SQL queries and parameters seamlessly. Custom node creation gives us flexibility in building complex workflows and streamlines the connection steps, reducing the time spent on writing and debugging code and minimizing dependency on IT and specialized developers.
Additionally, we have different job schedulers available. Some tasks are triggered manually, and others we schedule as needed. Pentaho Data Integration and Analytics allows us to create trigger tasks, though its API calls need some improvement. The cloud version of Pentaho Data Integration and Analytics is much more advantageous than the earlier server-based remote workspace, and the Kettle files we use provide informative transformations. The integration of SQL queries, Salesforce data, and Oracle data is quite effective.
Pentaho Data Integration and Analytics encourages enterprise code, minimizing errors and providing excellent reusability. It fosters collaboration across different environments and modules, covering governance and standardization effectively. For data security, it includes robust features for maintaining PII data confidentiality, and the maintenance aspect is simplified with scheduled logs that help identify and resolve errors more efficiently.
What needs improvement?
In terms of improvements, when developing on a remote service, we face challenges related to limited memory, which calls for optimization. Additionally, Pentaho Data Integration and Analytics processes data row by row, and sorting and grouping could benefit from block processing. This improvement would enhance speed and reduce memory usage. We should also explore more effective partitioning for parallel processing and fine-tune database connections to reduce load times and improve ETL speed.
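To make the block-processing idea concrete, here is a minimal sketch in plain Python of grouping in fixed-size blocks and merging the partial results. It is only an illustration of the general technique, not a description of how Pentaho works internally; the `(region, amount)` row shape, the function names, and the default block size are assumptions.

```python
from collections import Counter
from itertools import islice


def iter_blocks(rows, block_size):
    """Yield consecutive lists of up to block_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


def grouped_totals(rows, block_size=1000):
    """Sum amounts per region, holding only one block of rows at a time."""
    totals = Counter()
    for block in iter_blocks(rows, block_size):
        # Aggregate within the block first...
        partial = Counter()
        for region, amount in block:
            partial[region] += amount
        # ...then merge the partial sums into the running totals.
        totals.update(partial)
    return dict(totals)
```

Because each block is aggregated independently before merging, the per-block work could also be handed to separate workers, which is the same idea behind partitioning for parallel processing.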
To enhance the experience further, automation within Pentaho Data Integration and Analytics needs to be more reliable, with a better retry mechanism for failed tasks. I believe that automatic rather than manual version control for Kettle and job files would be beneficial. Additionally, extending the capabilities of plugins and extensions would enhance functionality, especially as new globalization applications arise.
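The kind of retry mechanism described above can be sketched as a small wrapper with exponential backoff. This is a generic Python illustration under stated assumptions, not a Pentaho feature or API; `run_with_retry`, its parameters, and the delay schedule are hypothetical choices for the example.

```python
import time


def run_with_retry(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. The sleep function is injectable to simplify testing.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as error:  # a real job would catch narrower errors
            last_error = error
            if attempt == max_attempts:
                break
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"task failed after {max_attempts} attempts"
    ) from last_error
```

A wrapper like this would typically sit around the call that launches a job, so that transient failures such as a dropped database connection are retried while persistent failures still surface with the original error attached.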
For how long have I used the solution?
I have been working with Pentaho Data Integration and Analytics for more than one and a half years, focusing on data ingestion, integration, and transformation for Power BI reporting and visualization. Pentaho Data Integration and Analytics provides us with a robust integration platform, complete with schedulers and job plans that enable us to run tasks daily, schedule tasks, and trigger tasks across various platforms like SAP, Oracle, and Salesforce, while also integrating with Snowflake and PostgreSQL.
What do I think about the stability of the solution?
Pentaho Data Integration and Analytics is generally stable; crashes are rare and have only occurred during upgrades or due to JVM memory issues, which are addressed in newer versions.
What do I think about the scalability of the solution?
Pentaho Data Integration and Analytics is highly scalable, able to accommodate growing data volumes and more complex workflows seamlessly.
How are customer service and support?
Our experience with customer support has been positive; they provide effective solutions and take the time to understand our business requirements, offering appropriate recommendations.
Which solution did I use previously and why did I switch?
Previously, we used an internal tool called IDQMS, which had memory issues and limitations, prompting us to switch to Pentaho Data Integration and Analytics. Pentaho Data Integration and Analytics provided a free development studio, an environment-friendly tool, and excellent scheduling and transformation features. These enhanced our data quality management, which was especially critical as we transitioned to AWS S3.
What was our ROI?
I can attest to the return on investment in terms of time saved: we have increased our efficiency by about 20 to 30 percent thanks to the swift migration processes facilitated by the tool. These gains make it easier for us to make prompt decisions regarding technology adoption.
Which other solutions did I evaluate?
Before choosing Pentaho Data Integration and Analytics, we evaluated other options, including SnapLogic and Talend. We found them to be more expensive, and while their workflows were similar, Pentaho Data Integration and Analytics met our business requirements better in terms of cost-effectiveness and user-friendliness for developers.
What other advice do I have?
My advice for others considering Pentaho Data Integration and Analytics is to first assess the memory and cloud version options, then check the scheduling and automation capabilities. Governance and data security are vital, as is error handling, so it is essential to ensure proper debugging and retry mechanisms are in place and to explore command-line automation for transformations and job files.
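For command-line automation, Pentaho Data Integration ships the Kitchen launcher for job files (.kjb) and the Pan launcher for transformation files (.ktr). The sketch below shows one way to build and run such a command from Python. The install directory, file paths, and parameter names are hypothetical, and the option spellings (`-file`, `-level`, `-param:`) should be checked against the documentation for the installed version.

```python
import subprocess


def build_pdi_command(path, params=None, log_level="Basic",
                      install_dir="/opt/pentaho/data-integration"):
    """Build the argument list for running a job or transformation.

    install_dir is an assumed location; point it at the real install.
    """
    if path.endswith(".kjb"):
        launcher = "kitchen.sh"  # jobs
    elif path.endswith(".ktr"):
        launcher = "pan.sh"      # transformations
    else:
        raise ValueError(f"expected a .kjb or .ktr file, got {path!r}")
    command = [
        f"{install_dir}/{launcher}",
        f"-file={path}",
        f"-level={log_level}",
    ]
    # Named parameters are passed as -param:NAME=VALUE, in sorted order.
    for name, value in sorted((params or {}).items()):
        command.append(f"-param:{name}={value}")
    return command


def run_pdi(path, **kwargs):
    """Run the command and return its exit code (0 is expected on success)."""
    return subprocess.run(build_pdi_command(path, **kwargs), check=False).returncode
```

For example, `run_pdi("/etl/daily_load.kjb", params={"RUN_DATE": "2024-01-31"})` would launch a hypothetical daily job, and the returned exit code can drive alerting or a retry from an external scheduler.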
I would rate this product an 8 out of 10.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other