If you were talking to someone whose organization is considering Pentaho Data Integration, what would you say?
How would you rate it and why? Any other tips or advice?
We put it on our Uber2/EC2 server; however, when we developed it, it was on our local server. We deploy it onto our EC2 server. We bundle it in our sales scripts, and the sales scripts are then run by Jenkins. I'd rate the solution a seven out of ten.
I always recommend Pentaho for working with automated processes and to do API integrations.
The query performance depends on the database. It is more likely to be good if you have a good database server with all the indexes and bells and whistles of a database. However, from a data integration tool perspective, I am not seeing any issues with respect to query performance. We do not build visualization features that much with Hitachi. For reporting purposes, we have been using one of the tools from the product and then preparing the data accordingly. We use this for all the projects that we are currently running. Going forward, we will be sticking only to using this ETL tool. We haven't had any roadblocks using Lumada Data Integration. On a scale of one to 10, I would recommend Hitachi Vantara to a friend or colleague as a nine. If you need to build ETLs quickly in a low-code environment and don't want to spend a lot of time on the development side of things, it is worth training people in this product, even though it is a little difficult to find resources who already know it. It is always worth that effort because it ends up saving a lot of time and resources on the development side of projects. Overall, I would rate the product as a nine out of 10.
I have a good knowledge of this solution, and I would highly recommend it to a friend or colleague. It provides a single, end-to-end data management experience from ingestion to insights, but we have to create different pipelines to generate the metadata management. It's a little bit laborious to work with Pentaho, but we can do that. I've heard a lot of people say it's complicated to use, but Pentaho is one of the few tools where you can do anything you can imagine. It is very good and quite simple, but you need to have the right knowledge and the right people to handle the tool. The skills needed to create a business intelligence solution or a data integration solution with Pentaho are problem-solving logic and maybe database knowledge. You can develop new steps, and you can develop new functionality in Pentaho Lumada, but you must have the knowledge of advanced Java programming. Our experience, in general, is very good. Overall, I am satisfied with our decision to purchase Hitachi's product services and solutions. My satisfaction level is at an eight out of ten. I am not much aware of the roadmap of Hitachi Vantara. I don't read much about that. I would rate this solution an eight out of ten.
Hitachi Vantara's roadmap is promising. They came up with Lumada and it seems that they do have some ideas on how to make their product a bit more attractive than it currently is. I'm fairly satisfied with using Pentaho Data Integration. It's more or less okay. When it comes to all the other parts, like Pentaho reports and Pentaho dashboards, things could be better there. The biggest lesson I've learned from using this solution is that a cheap, open-source tool can sometimes be even more efficient than some of the high-priced enterprise ETL tools. Overall, the solution is okay, considering the low cost. It has all of the main things that you would expect it to have, from a technical perspective.
I would recommend using this product for data engineering and Extract, Transform, and Load (ETL) processes. I would rate it an eight out of ten.
The performance of Pentaho, like any other ETL tool, starts from the database side, once you write good, optimized scripts. The optimization of Pentaho depends on the hardware it's sitting on. Once you have enough RAM on your VM, you are in a position to run any workloads. Overall it is an awesome tool. We are satisfied with our decision to go with Hitachi's product. It's like any other ETL tool. It's like SQL Server Integration Services, Informatica, or DataStage. On a scale of one to 10, where 10 is best, I would give it a nine in terms of recommending it to a colleague.
I would rate this solution as eight out of 10. One of the best things about the solution is that it is free. I used to sell Pentaho. It has a lot of pros and cons. From my side, there are more pros than cons. There isn't one tool that can do everything that you need, but this tool is one of those tools that helps you to complete your tasks and it is pretty integrable with other tools. So, you can switch Pentaho on and off from different tools and operating systems. You can use it in Unix, Linux, Windows, and Mac. If you know how to develop different things and are very good at Java, you can create your own connectors. You can create a lot of things. It is a very good tool if you need to work with data. There isn't a database that you can't manage with this tool. You can work with it and manage all the data that you want to manage.
For someone who wants simple solutions, open-source tools are ideal, including for someone who isn't a programmer or knowledgeable about technology. In one week, you can try to understand this solution and do your first project. In my opinion, it is the best tool for people starting out. Lumada is a great tool. I would rate it as a straight seven out of 10. It gets the work done. The open-source version doesn't work well with big data sources, but there is a lot of flexibility and liberty to do everything you want and need. If the open-source version worked better with big data, then I would give it a straight eight since there is always room for improvement. Sometimes when debugging, some errors can be pretty difficult. In principle, it is a tool for when you are starting out in business intelligence and data engineering and want to understand everything that is going on.
I don't use many templates. I use the solution based on a case-by-case basis. Considering that Lumada is a free tool, I would rate it as nine out of 10 for the free version.
I rate Pentaho eight out of 10. It's a perfect pick for data teams that are getting started and more business-oriented data teams. It's good for a data analyst who isn't so tech-savvy. It is flexible and easy to use.
A good thing about Pentaho is that it's not that hard to learn, from an ETL perspective. The way Pentaho lays things out, they are pretty intuitively organized in the panel: your input (flat file, CSV, or database) and then the transformation nodes. It was a good baseline and a good open-source tool to use to learn ETL. It's good to have exposure to multiple tools because every company has different needs and, depending on their needs, it would be a different recommendation. The lessons I learned using it: Make sure you clear the cache when you open the program. Also, if there are any critical points in your flow that are dependent upon previous nodes, make sure that you put blocking steps in. Make sure you also set up the job environment variables correctly, so that Pentaho runs. It worked for what we did but, personally, I wouldn't use it. In the new company I'm working for, we are using large financial data sets and I'm not so sure it could handle that. I know there's an Enterprise version, but I didn't use that. The solution can handle ingestion through to export, but you still have to have a batch or Python script to run it with an automation process. I don't know if the Lumada version has something different, but with what I was using, you were simply building the pipeline, but the pipeline had to be scheduled and run outside of the program, and we had other tools to check that the output was as expected. We used version 7 for a while and we were reluctant to upgrade to version 9 because we had an 834 configuration, meaning a government standardized feed that our developer spent two years building. There was an issue whenever we tried to run those feeds on version 9, so we were reluctant to upgrade because things were working on 7. We ended up finding out that it didn't take much work for us to fix the problem that we were having with version 9 and, eventually, we moved to it. With every version upgrade of anything, there are going to be pros and cons.
Depending on what someone needs it for, if it's a small project and they don't want to pay for an enterprise solution, I would recommend it and give it a nine out of 10. The finicky things were a little frustrating, but the fact that it's free, can be deployed easily, and that it can fulfill a lot of things on a small scale, are plusses. If it were for a larger company that needed an enterprise solution, I wouldn't recommend it. In that case, it would be one out of 10. For a smaller company or one with a smaller budget, a company that doesn't have highly complex ETL needs, Pentaho is definitely a great option. If a company has the budget and has really specific needs and large data sets, I would suggest looking elsewhere.
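The point above about needing a batch or Python script to run a pipeline outside the program can be illustrated with a minimal sketch. It assumes the job is a .kjb file launched through PDI's Kitchen command-line runner with its -file, -level, and -param options; the function names and any paths or parameter names passed in are hypothetical, not taken from the reviewer's setup.

```python
import subprocess


def build_kitchen_command(kitchen, job_file, params=None, log_level="Basic"):
    """Assemble the argument list for one Kitchen run of a .kjb job.

    kitchen is the path to kitchen.sh (or Kitchen.bat), job_file is the job
    to run, and params is an optional dict of named job parameters.
    """
    command = [str(kitchen), f"-file={job_file}", f"-level={log_level}"]
    # Sort parameters so the generated command is stable between runs.
    for name, value in sorted((params or {}).items()):
        command.append(f"-param:{name}={value}")
    return command


def run_job(kitchen, job_file, params=None):
    """Run the job and return True when Kitchen exits with status 0."""
    result = subprocess.run(build_kitchen_command(kitchen, job_file, params))
    return result.returncode == 0
```

A scheduler such as cron or Jenkins could call run_job with the local Kitchen path and job file, then hand off to whatever separate check validates the output, as the reviewer describes.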
I would fully recommend Pentaho. I have already recommended it to some colleagues. It's a good product with good performance. Overall, I was very happy with it. It was complicated, but that is part of my job. I was happy with the result and the stability. The Data Integration product is simpler than the Report Designer. I would rate the Data Integration at 10 out of 10 and the Report Designer at nine, because of the graphical interface.
If you don't have the comfort level for the architectural build-out, then you can definitely opt for the white gloves treatment with an additional cost of about 50,000 to help with the integration and implementation effort of it. We chose not to go that route. Therefore, we're using support for any of the fine-tuning questions about making it highly available and other things. I have not used Lumada for creating pipelines. I'm using PDI to help with our data pipelines. Similarly, I am not using its ability to develop and deploy data pipeline templates at this time, and I also haven't used it for single end-to-end data management from ingestion to insight. The biggest lesson that I have learned from using this solution is that the order of operations is critical. Other than that, it has been an absolute treat to use. I've been espousing this product to everybody. I would rate it a 10 out of 10.
It's a great product. The ETL part of the product is really easy to pick up and use. It has a graphical interface with the ability to be more complex via scripting and features that you can add. When looking at Hitachi Vantara's roadmap, the ability to upgrade more easily is one element of it that is important to us. Also, they're going more towards web-based solutions, instead of having local client development tools. If it does go on the web, and it works the same way it works on the client, that would be a nice feature. Currently, because we have these local client development tools, we have to have a VM client for our developers to use, and that makes it a little more tricky. Whereas if they put it on the web, then all our developers would be able to use any desktop and access the web for development. When it comes to the query performance of the solution on large datasets, we haven't had any issues with it. We have one table in our data warehouse that has about 120 million rows and we haven't had any performance issues. The solution gives you the flexibility to deploy it in any environment, whether on-prem or in the cloud. With our particular implementation, we've done a lot of customizations. We have special things that we bolted onto the product, so it's not as easy to put it onto the cloud for us. All of our customizations and bolt-ons end up costing us more because they make upgrades more difficult and time-consuming. We don't use an automated upgrade process. It's manual. We have to do a full reinstall and then apply all our bolt-ons and make sure it still works. If we could automate that process it would certainly reduce our costs. In terms of updating to version 9.2, which is the latest version, we're going to look into it next year and see what level of effort is required and determine how it impacts our current system. 
They release a new update about every six months, and there is a major release every year or two, so it's quite a fast schedule for updates. Overall, I would rate our satisfaction with our decision to purchase Hitachi products as a seven out of 10. I would definitely recommend the data integration tool but I wouldn't recommend the reporting tool.
My advice would be to take advantage of the training that's offered. The query performance of Lumada on large data sets is good, but the query performance is really only as good as the server. In terms of Hitachi's roadmap, we haven't seen it in a little while. We did have a concern that they're going to be going away from Pentaho and rolling it into another product and we're not quite sure what the result of that is going to be. We don't have a good understanding of what's going to change. That's the concern. We currently only use Pentaho. We don't have other Hitachi products but we're satisfied with it. We would recommend Pentaho.
I rate Lumada nine out of 10. The aspect I like about Lumada is its flexibility. I can make it do pretty much whatever I want. It's not perfect, but I haven't run into a tool that is yet. I haven't used every aspect of it, but there's very little that I can't make it do. I haven't run into a scenario where it couldn't handle a challenge we put in front of it. It's been a solid performer for us. I rarely have a problem that is due to Lumada. The issues I have with my loads are never because of the software. If you plan to implement Lumada, I recommend going to the classes. Don't be afraid to ask dumb questions of support because many of them used to be consultants. They've all been there, done that. One of the guys I talk to regularly lives about 80 miles to the north of me. I have a rapport with him. They're willing to go above and beyond to make you successful.
I would advise taking advantage of using metadata to drive your transformations. You should take advantage of the very nice and easy way in which variable substitution works in a lot of components. If you use a metadata-driven framework in Pentaho, it will allow you to self-document your process flows. At some point, it always becomes a critical aspect of a project. Often, it doesn't crop up until a year or so later, but somebody always comes asking for proof or documentation of exactly what is happening in terms of how something is getting to here and how something is driving a metric. So, if you start off from the beginning by using a metadata framework that self-documents that, you'll be 90% of the way to answering those questions when you need to. We are satisfied with our decision to purchase Hitachi's products, services, or solutions. In the low-code space, they're probably reasonably priced. With the serverless architectures out there, there is some competition, and you can do things differently using serverless architecture, which would have an overall lower cost of running. However, the fact that we have so many transformations that we run, and those transformations can be maintained by a team of people who aren't Python developers or Java developers, and our apprentices can use this tool quite easily, is an advantage of it. I'm not too familiar with the overall roadmap for Hitachi Vantara. We're just using the Pentaho data integration products. We don't use the metadata injection aspects of Pentaho, mainly because we did not have a need for them, but we know they're there. I would rate it a seven out of 10. Its UI is a bit techy and more confusing than some of the other graphical ETL tools, and that's where improvements could be made.
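The metadata-driven idea in this review can be sketched outside the tool. This is a conceptual illustration only, not Pentaho's API: the LOADS table, the schema variable names, and the helper functions are all hypothetical. The only thing borrowed from PDI is the ${NAME} variable syntax. The same metadata that drives each load also produces the lineage lines someone may ask for later.

```python
import re

# Hypothetical metadata: one entry per load, with ${NAME} placeholders.
LOADS = [
    {"metric": "order_total", "source": "${STAGE}.orders", "target": "${DW}.fact_orders"},
    {"metric": "customer_count", "source": "${STAGE}.customers", "target": "${DW}.dim_customer"},
]

_VARIABLE = re.compile(r"\$\{(\w+)\}")


def substitute(text, variables):
    """Replace each ${NAME} with its value; unknown names are left untouched."""
    return _VARIABLE.sub(lambda m: variables.get(m.group(1), m.group(0)), text)


def resolve(loads, variables):
    """Return a copy of the metadata with all placeholders filled in."""
    return [{key: substitute(value, variables) for key, value in load.items()}
            for load in loads]


def document(loads, variables):
    """Render one lineage line per load from the same metadata that drives it."""
    return [f"{load['metric']}: {load['source']} -> {load['target']}"
            for load in resolve(loads, variables)]
```

Calling document(LOADS, {"STAGE": "stg", "DW": "dw"}) yields one line per metric showing where its data comes from and where it lands, which is the kind of documentation the reviewer says is eventually requested.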
I'm a consultant and an end-user. I downloaded the latest version of the solution. I can't speak to the version number. I'd rate the solution at an eight out of ten.
For newcomers to the product, it is best to start with something simple. You can then scale it up fast, as the learning curve is not steep. If somebody wants to set up a good inbound integration platform, they can use Pentaho Data Integration. It's really simple and easy to use. The online community really helps you with numerous issues, such as licensing and a lot of other things. I would rate Pentaho Data Integration 8 out of 10.
My advice for anybody who is researching this product is that if they want to do batch processing, then this is a good choice. The amount of data that it loads and processes is good. Based on the features that I have used and my experience, I would rate this solution a seven out of ten.
We're just users of the solution. We don't have a professional relationship with the company. The solution is great to use and easy to share with teams via the central repository. It's very functional overall. I'd recommend the solution to other companies. I'd rate the solution eight out of ten.