AWS Glue vs Pentaho Data Integration and Analytics comparison

AWS Glue (Amazon Web Services): 12,012 views | 8,420 comparisons | 92% willing to recommend
Pentaho Data Integration and Analytics (Hitachi Vantara): 3,346 views | 1,127 comparisons | 94% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between AWS Glue and Pentaho Data Integration and Analytics based on real PeerSpot user reviews.

Find out in this report how the two Cloud Data Integration solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed AWS Glue vs. Pentaho Data Integration and Analytics Report (Updated: March 2024).
768,415 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
AWS Glue:
  • "The solution helps organizations gain flexibility in defining the structure of the data."
  • "The solution is stable and reliable."
  • "I also like that you can add custom libraries like JAR files and use them. So, the ability to use a fast processing engine and embed basic jobs easily are significant advantages."
  • "The key role for Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution and our clients have been very satisfied with it."
  • "The most valuable features currently are Glue Studio, jobs, and triggers."
  • "AWS Glue's most valuable features are the data catalog, including crawlers and tables, and Glue Studio, which means you don't have to use custom code."
  • "We no longer had to worry much about infrastructure management because AWS Glue is serverless, and Amazon takes care of the underlying infrastructure."
  • "Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues."

Pentaho Data Integration and Analytics:
  • "Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."
  • "One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
  • "Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool things is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing."
  • "It's my understanding that the product can scale."
  • "It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
  • "It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there."
  • "The amount of data that it loads and processes is good."
  • "The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to troll through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."


Cons
AWS Glue:
  • "The solution's visual ETL tool is of no use for actual implementation."
  • "The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
  • "If there's a cluster-related configuration, we have to make worker nodes, which is quite a headache when processing a large amount of data."
  • "The solution’s stability could be improved."
  • "AWS Glue would be improved by making it easier to switch from single to multi-cloud."
  • "It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
  • "On occasion, the solution's dashboard reports that a project failed due to runtime but it actually succeeded."
  • "It fails to handle massive databases acquired from various sources."

Pentaho Data Integration and Analytics:
  • "I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."
  • "If you develop it on MacBook, it'll be quite a hassle."
  • "Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."
  • "The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."
  • "Some of the scheduling features about Lumada drive me buggy. The one issue that always drives me up the wall is when Daylight Savings Time changes. It doesn't take that into account elegantly. Every time it changes, I have to do something. It's not a big deal, but it's annoying."
  • "There is not a data quality or MDM solution in the Pentaho DI suite."
  • "I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
  • "As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."


Pricing and Cost Advice
AWS Glue:
  • "The pricing is a bit higher than other solutions like Athena and EC2. If the pricing becomes more scaled or flexible, it will be good because you have to pay 44 cents just for one DPU for an hour. If you increase DPUs to 5 or 10, the pricing gets multiplied. There are also some time limits like 0 to 10 minutes or 10 to 20 minutes. If the pricing is according to the minutes, it would be better because you have to limit your job to 10 minutes or 20 minutes."
  • "It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us."
  • "Its price is good. We pay as we go or based on the usage, which is a good thing for us because it is simple to forecast for the tool. It is good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients. In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend."
  • "Technical support is a paid service, and which subscription you have is dependent on that. You must pay one of them, and it ranges from $15,000 to $25,000 per year."
  • "This solution is affordable and there is an option to pay for the solution based on your usage."
  • "AWS Glue is quite costly, especially for small organizations."
  • "AWS Glue uses a pay-as-you-go approach which is helpful. The price of the overall solution is low and is a great advantage."
  • "The overall cost of AWS Glue could be better. It cost approximately $1,000 a month. There is paid support available from AWS Glue."
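The DPU-hour arithmetic described by the first AWS Glue reviewer above can be sketched in a few lines. This is a rough illustration only: the $0.44 rate comes from that reviewer's quote and may not match current AWS pricing, and the function name `estimate_job_cost` and the `billing_increment_minutes` parameter are hypothetical names introduced here to model the reviewer's point about billing in 10-minute blocks versus per minute.

```python
import math

# Rate quoted by the reviewer; real AWS pricing varies by region and over time.
RATE_PER_DPU_HOUR = 0.44


def estimate_job_cost(dpus, runtime_minutes, billing_increment_minutes=1,
                      rate_per_dpu_hour=RATE_PER_DPU_HOUR):
    """Estimate the cost in dollars of a single job run.

    The runtime is rounded up to the next multiple of
    billing_increment_minutes (a hypothetical parameter) before the
    charge is computed as DPUs x billed hours x rate.
    """
    blocks = math.ceil(runtime_minutes / billing_increment_minutes)
    billed_minutes = blocks * billing_increment_minutes
    return round(dpus * (billed_minutes / 60) * rate_per_dpu_hour, 2)


# Cost scales linearly with the number of DPUs for a one-hour job.
one_dpu = estimate_job_cost(dpus=1, runtime_minutes=60)
five_dpus = estimate_job_cost(dpus=5, runtime_minutes=60)
ten_dpus = estimate_job_cost(dpus=10, runtime_minutes=60)

# A 12-minute job on 2 DPUs, billed in 10-minute blocks versus per minute.
in_blocks = estimate_job_cost(dpus=2, runtime_minutes=12,
                              billing_increment_minutes=10)
per_minute = estimate_job_cost(dpus=2, runtime_minutes=12)

print(one_dpu, five_dpus, ten_dpus, in_blocks, per_minute)
```

At the quoted rate, a one-hour job costs $0.44 on 1 DPU, $2.20 on 5 DPUs, and $4.40 on 10 DPUs. The 12-minute job costs $0.29 when rounded up to 10-minute blocks and $0.18 when billed per minute, which is the difference the reviewer is asking for.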

Pentaho Data Integration and Analytics:
  • "There is a good open source option (Community Edition)."
  • "The price of the regular version is not reasonable and it should be lower."
  • "Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs."
  • "It does seem a bit expensive compared to the serverless product offering. Tools, such as Server Integration Services, are "almost" free with a database engine. It is comparable to products like Alteryx, which is also very expensive."
  • "I think Lumada's price is fair compared to some of the others, like BusinessObjects, which was the other thing that I used at my previous job. BusinessObjects' price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive."
  • "When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho."
  • "The cost of these types of solutions are expensive. So, we really appreciate what we get for our money. Though, we don't think of the solution as a top-of-the-line solution or anything like that."
  • "The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good. At this time, there are no additional costs. We just have the licensing fees."

    Questions from the Community
    Top Answer: AWS Glue and Azure Data Factory for ELT best performance cloud services.
    Top Answer: We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in…
    Top Answer: AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or…
    Top Answer: Hi Rajneesh yes here is the feature comparison between the community and enterprise edition :…
    Top Answer: In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, it…
    Top Answer: My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could use…
    Ranking
    AWS Glue
    Ranking: 1st
    Views: 12,012
    Comparisons: 8,420
    Reviews: 32
    Average Words per Review: 419
    Rating: 7.8
    Pentaho Data Integration and Analytics
    Ranking: 16th out of 100 in Data Integration
    Views: 3,346
    Comparisons: 1,127
    Reviews: 15
    Average Words per Review: 1,193
    Rating: 7.7
    Also Known As (Pentaho Data Integration and Analytics): Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
    Overview

    AWS Glue is a serverless cloud data integration tool that facilitates the discovery, preparation, movement, and integration of data from multiple sources for machine learning (ML), analytics, and application development. The solution includes additional productivity and data ops tooling for running jobs, implementing business workflows, and authoring.

    AWS Glue allows users to connect to more than 70 diverse data sources and manage data in a centralized data catalog. The solution facilitates visual creation, running, and monitoring of extract, transform, and load (ETL) pipelines to load data into users' data lakes. This Amazon product seamlessly integrates with other native applications of the brand and allows users to search and query cataloged data using Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum.

    The solution also utilizes application programming interface (API) operations to transform users' data, create runtime logs, store job logic, and create notifications for monitoring job runs. The console of AWS Glue connects all of these services into a managed application, facilitating the monitoring and operational processes. The solution also performs provisioning and management of the resources required to run users' workloads in order to minimize manual work time for organizations.

    AWS Glue Features

    AWS Glue groups its features into four categories - discover, prepare, integrate, and transform. Within those groups are the following features:

    • Automatic schema discovery: AWS Glue crawlers connect to the organization's source or target data source through a prioritized list of classifiers to determine the schema for users' data. This feature creates metadata in companies' AWS Glue Data Catalog.

    • Schemas for data stream management: The AWS Glue Schema Registry enables users to validate and control the evolution of streaming data through registered Apache Avro schemas at no additional charge.

    • Automatic scaling based on workload: This feature dynamically scales resources up and down based on workload. It adds or removes job resources depending on how much the workload can be split up.

    • FindMatches: This feature provides machine learning-based data deduplication and cleansing. It works by finding records that are imperfect matches of each other so that redundant copies can be removed.

    • Edit, debug, and test ETL code: This feature helps users who have chosen to interactively develop their ETL code by providing development endpoints for editing, debugging, and testing the code it generates for them.

    • AWS Glue DataBrew: An interactive, point-and-click visual interface for specialists to clean and normalize data without the need to write any code.

    • AWS Glue Interactive Sessions: This feature simplifies the development of data integration jobs by enabling data engineers to interactively prepare and explore data.

    • AWS Glue Studio Job Notebooks: This AWS Glue feature provides serverless notebooks with minimal setup, allowing developers to start working quickly.

    • Complex ETL pipeline building: This feature allows the product to be invoked on a schedule, on demand, or based on an event, allowing users to start multiple jobs in parallel or specify dependencies to build complex ETL pipelines.

    • AWS Glue Studio: This AWS Glue feature allows users to visually transform data through a drag-and-drop interface. The product automatically generates the code for ETL processes for users' data.
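The idea behind the "Complex ETL pipeline building" item above, starting independent jobs in parallel while respecting declared dependencies, can be illustrated with a small sketch. This is not the AWS Glue API: it uses Python's standard-library `graphlib` module, and the job names and the `plan_batches` helper are hypothetical, chosen only to show how a dependency graph turns into an execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical job names. Each key maps to the set of jobs that must
# finish before that job is allowed to start.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "load_warehouse": {"join_orders_customers"},
}


def plan_batches(deps):
    """Group jobs into ordered batches.

    Jobs in the same batch have no unmet dependencies, so a scheduler
    could start them in parallel. Raises graphlib.CycleError if the
    dependencies contain a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


batches = plan_batches(dependencies)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

Here the two extract jobs land in the first batch and can run side by side, the join runs once both have finished, and the load runs last. A managed service handles this scheduling for the user; the sketch only shows the ordering logic that dependency declarations imply.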

    AWS Glue Benefits

    AWS Glue offers a wide range of benefits for its users. These benefits include:

    • Users of other AWS products can easily onboard with AWS Glue, as it is integrated across a wide range of the company's services.

    • The solution is serverless, which allows for a lower total cost of ownership.

    • AWS Glue offers more power for users, as it automates much of the effort in building, maintaining, and running ETL jobs.

    • The product allows customers to easily discover and search across all their AWS datasets through AWS Glue Data Catalog.

    • AWS Glue does not require additional payment for managing and enforcing schemas for data streams.

    • The solution facilitates the authoring of scalable ETL jobs for beginners and non-coding experts through a drag-and-drop interface.

    Reviews from Real Users

    Mustapha A., a cloud data engineer at Jems Groupe, likes AWS Glue because it is a product that is great for serverless data transformations.

    Liana I., CEO at Quark Technologies SRL, describes AWS Glue as highly scalable and reliable, with a beneficial pay-as-you-go pricing model.

    Pentaho Data Integration is a data integration and analytics platform for organizations of any size. It integrates data from diverse sources, including databases, files, and applications, and handles the cleaning and transformation needed to prepare that data for analysis. It also provides tools for data mining, machine learning, and statistical analysis. The product is mature and backed by an active community of users and developers, which makes it a reliable and cost-effective option. Its features include a comprehensive ETL toolkit, data cleaning and transformation capabilities, data analysis tools, and deployment options for data integration and analytics solutions.

    Sample Customers
    AWS Glue: bp, Cerner, Expedia, Finra, HESS, Intuit, Kellogg's, Philips, TIME, Workday
    Pentaho Data Integration and Analytics: 66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
    Top Industries
    AWS Glue
    Reviewers: Computer Software Company 47%, Financial Services Firm 18%, Pharma/Biotech Company 12%, Consumer Goods Company 6%
    Visitors reading reviews: Financial Services Firm 19%, Computer Software Company 14%, Manufacturing Company 7%, Insurance Company 7%
    Pentaho Data Integration and Analytics
    Reviewers: Healthcare Company 19%, Financial Services Firm 19%, Comms Service Provider 11%, Manufacturing Company 11%
    Visitors reading reviews: Financial Services Firm 19%, Computer Software Company 14%, Comms Service Provider 12%, Government 7%
    Company Size
    AWS Glue
    Reviewers: Small Business 29%, Midsize Enterprise 13%, Large Enterprise 58%
    Visitors reading reviews: Small Business 15%, Midsize Enterprise 12%, Large Enterprise 73%
    Pentaho Data Integration and Analytics
    Reviewers: Small Business 27%, Midsize Enterprise 31%, Large Enterprise 42%
    Visitors reading reviews: Small Business 21%, Midsize Enterprise 11%, Large Enterprise 68%

    AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews. AWS Glue is rated 7.8, while Pentaho Data Integration and Analytics is rated 8.0. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration and Talend Open Studio, whereas Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and SAP Data Services. See our AWS Glue vs. Pentaho Data Integration and Analytics report.


    We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.