We performed a comparison between AWS Glue and Pentaho Data Integration and Analytics based on real PeerSpot user reviews.
Find out in this report how the two Cloud Data Integration solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The solution helps organizations gain flexibility in defining the structure of the data."
"The most valuable feature of AWS Glue is scalability."
"AWS Glue is a good solution for developers, they have the ability to write code in different languages and other software."
"It's fairly straightforward as a product; it's not very complicated."
"Our entire use case was very easily handled or solved using this solution."
"The solution's technical support is good. Whenever we raise a use case where we face an issue in our company, we get a response from the solution's technical team."
"Its user interface is quite good. You just need to choose some options to create a job in AWS Glue. The code-generation feature is also useful. If you don't want to customize it and simply want to read a file and store the data in the database, it can generate the code for you."
"The solution is stable and reliable."
"I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a station area where I can work with all the information that I have in my production databases, then I can work with the data that I created."
"The abstraction is quite good."
"One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
"Pentaho Data Integration is quite simple to learn, and there is a lot of information available online."
"The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product."
"One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
"I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source."
"Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that."
"Glue could perform better. It sometimes takes too long to test a Glue job. Google Cloud Platform offers more Python scripts than AWS."
"Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background."
"One area that could be improved is the ETL view. The drag-and-drop interface is not as user-friendly as some other ETL tools."
"Overall, I consider the technical support to be fine, although the response time could be faster in certain cases."
"The mapping area and the use of the data catalog from Glue could be better."
"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
"The solution should offer features for streaming data in addition to batching data."
"The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
"It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now and I think perhaps it is because of that it is not very stable, it causes the system to sometimes hang. I'm not sure if this is the case for pair tiers."
"The product needs more plugins."
"Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."
"As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
"I would like to see more improvements with AS400 DB2."
"I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You have to search all these different places using a mouse, clicking everywhere... each report is coded in a binary file... You cannot search with a text search tool..."
"I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing" i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking."
"Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews. AWS Glue is rated 7.8, while Pentaho Data Integration and Analytics is rated 8.0. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration and Talend Open Studio, whereas Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and SAP Data Services.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.