We performed a comparison between AWS Glue and Pentaho Data Integration and Analytics based on real PeerSpot user reviews.
Find out in this report how the two Cloud Data Integration solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The solution helps organizations gain flexibility in defining the structure of the data."
"The solution is stable and reliable."
"I also like that you can add custom libraries like JAR files and use them. So, the ability to use a fast processing engine and embed basic jobs easily are significant advantages."
"The key role for Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution and our clients have been very satisfied with it."
"The most valuable features currently are Glue Studio, jobs, and triggers."
"AWS Glue's most valuable features are the data catalog, including crawlers and tables, and Glue Studio, which means you don't have to use custom code."
"We no longer had to worry much about infrastructure management because AWS Glue is serverless, and Amazon takes care of the underlying infrastructure."
"Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues."
"Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."
"One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
"Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool things is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing."
"It's my understanding that the product can scale."
"It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
"It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there."
"The amount of data that it loads and processes is good."
"The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to trawl through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
"The solution's visual ETL tool is of no use for actual implementation."
"The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
"If there's a cluster-related configuration, we have to make worker nodes, which is quite a headache when processing a large amount of data."
"The solution’s stability could be improved."
"AWS Glue would be improved by making it easier to switch from single to multi-cloud."
"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
"On occasion, the solution's dashboard reports that a project failed due to runtime but it actually succeeded."
"It fails to handle massive databases acquired from various sources."
"I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."
"If you develop it on a MacBook, it'll be quite a hassle."
"Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."
"The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."
"Some of the scheduling features about Lumada drive me buggy. The one issue that always drives me up the wall is when Daylight Savings Time changes. It doesn't take that into account elegantly. Every time it changes, I have to do something. It's not a big deal, but it's annoying."
"There is not a data quality or MDM solution in the Pentaho DI suite."
"I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
"As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews. AWS Glue is rated 7.8, while Pentaho Data Integration and Analytics is rated 8.0. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration and Talend Open Studio, whereas Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and SAP Data Services.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.