Amazon EMR vs Pentaho Data Integration and Analytics comparison

Amazon EMR (Amazon Web Services)
2,149 views | 1,834 comparisons
85% willing to recommend
Pentaho Data Integration and Analytics (Hitachi Vantara)
3,346 views | 1,127 comparisons
94% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Amazon EMR and Pentaho Data Integration and Analytics based on real PeerSpot user reviews.

Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.
To learn more, read our detailed Hadoop Report (Updated: April 2024).
767,995 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros

Amazon EMR
  • "The initial setup is straightforward."
  • "Amazon EMR's most valuable features are processing speed and data storage capacity."
  • "Amazon EMR is a good solution that can be used to manage big data."
  • "The solution is pretty simple to set up."
  • "When we grade big jobs from on-prem to the cloud, we do it in EMR with Spark."
  • "In Amazon EMR it is easy to rebuild anything, easy to upgrade and has good fault tolerance."
  • "The solution is scalable."
  • "The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions."

Pentaho Data Integration and Analytics
  • "It's very simple compared to other products out there."
  • "It's my understanding that the product can scale."
  • "One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
  • "The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to troll through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
  • "One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
  • "It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
  • "We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."
  • "I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a station area where I can work with all the information that I have in my production databases, then I can work with the data that I created."

Cons

Amazon EMR
  • "Amazon EMR can improve by adding some features, such as metastore services and HiveServer2. Additionally, the user interface could be better, similar to what Apache service provides, cross-platform services."
  • "The initial setup was time-consuming."
  • "The dashboard management could be better. Right now, it's lacking a bit."
  • "There is room for improvement in pricing."
  • "As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data."
  • "The product's features for storing data in static clusters could be better."
  • "There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange."

Pentaho Data Integration and Analytics
  • "If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was."
  • "I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector."
  • "As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
  • "The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's what our impression is."
  • "I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support."
  • "A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."
  • "I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse."
  • "The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."

Pricing and Cost Advice
Amazon EMR
  • "You don't need to pay for licensing on a yearly or monthly basis, you only pay for what you use, in terms of underlying instances."
  • "The cost of Amazon EMR is very high."
  • "The price of the solution is expensive."
  • "Amazon EMR's price is reasonable."
  • "There is a small fee for the EMR system, but major cost components are the underlying infrastructure resources which we actually use."
  • "There is no need to pay extra for third-party software."
  • "Amazon EMR is not very expensive."
  • "The product is not cheap, but it is not expensive."

Pentaho Data Integration and Analytics
  • "There is a good open source option (Community Edition)."
  • "The price of the regular version is not reasonable and it should be lower."
  • "Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs."
  • "It does seem a bit expensive compared to the serverless product offering. Tools, such as Server Integration Services, are 'almost' free with a database engine. It is comparable to products like Alteryx, which is also very expensive."
  • "I think Lumada's price is fair compared to some of the others, like BusinessObjects, which was the other thing that I used at my previous job. BusinessObjects' price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive."
  • "When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho."
  • "The cost of these types of solutions are expensive. So, we really appreciate what we get for our money. Though, we don't think of the solution as a top-of-the-line solution or anything like that."
  • "The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good. At this time, there are no additional costs. We just have the licensing fees."

    Questions from the Community
    Top Answer: Amazon EMR is a good solution that can be used to manage big data.
    Top Answer: As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data.
    Top Answer: Hi Rajneesh, yes, here is the feature comparison between the community and enterprise edition:…
    Top Answer: In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, it…
    Top Answer: My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could use…
    Ranking
    Amazon EMR: 3rd out of 22 in Hadoop | Views: 2,149 | Comparisons: 1,834 | Reviews: 12 | Average words per review: 346 | Rating: 7.8
    Pentaho Data Integration and Analytics: 16th out of 100 in Data Integration | Views: 3,346 | Comparisons: 1,127 | Reviews: 15 | Average words per review: 1,193 | Rating: 7.7
    Also Known As
    Amazon EMR: Amazon Elastic MapReduce
    Pentaho Data Integration and Analytics: Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
    Overview
    Amazon Elastic MapReduce (Amazon EMR) is a web service for processing large amounts of data quickly and cost-effectively. It provides a managed Hadoop framework that distributes data processing across dynamically scalable Amazon EC2 instances.

    Pentaho Data Integration is a data integration and analytics platform for organizations of any size. It integrates data from diverse sources, including databases, files, and applications, and provides tools to clean and transform that data so it is ready for analysis. It also includes tools for data mining, machine learning, and statistical analysis. Its main strengths are its maturity and an active community of users and developers, which make it a reliable and cost-effective option. Key features include a comprehensive ETL toolkit, data cleaning and transformation capabilities, data analysis tools, and deployment options for data integration and analytics solutions.

    Sample Customers
    Amazon EMR: Yelp
    Pentaho Data Integration and Analytics: 66Controls, Provincial Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
    Top Industries
    Amazon EMR
    Reviewers: Computer Software Company 27%, Media Company 18%, Wholesaler/Distributor 18%, Comms Service Provider 9%
    Visitors reading reviews: Financial Services Firm 23%, Computer Software Company 13%, Manufacturing Company 8%, Educational Organization 6%
    Pentaho Data Integration and Analytics
    Reviewers: Healthcare Company 19%, Financial Services Firm 19%, Comms Service Provider 11%, Manufacturing Company 11%
    Visitors reading reviews: Financial Services Firm 19%, Computer Software Company 13%, Comms Service Provider 12%, Government 7%
    Company Size
    Amazon EMR
    Reviewers: Small Business 26%, Midsize Enterprise 26%, Large Enterprise 47%
    Visitors reading reviews: Small Business 17%, Midsize Enterprise 12%, Large Enterprise 72%
    Pentaho Data Integration and Analytics
    Reviewers: Small Business 27%, Midsize Enterprise 31%, Large Enterprise 42%
    Visitors reading reviews: Small Business 21%, Midsize Enterprise 11%, Large Enterprise 68%

    Amazon EMR is ranked 3rd in Hadoop with 20 reviews while Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews. Amazon EMR is rated 7.8, while Pentaho Data Integration and Analytics is rated 8.0. The top reviewer of Amazon EMR writes "Provides efficient data processing features and has good scalability". On the other hand, the top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". Amazon EMR is most compared with Snowflake, Cloudera Distribution for Hadoop, Azure Data Factory, Amazon Redshift and Apache Spark, whereas Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and AWS Glue.

    We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity by cross-referencing it with LinkedIn and, when necessary, following up with the reviewer personally.