We performed a comparison between Amazon EMR and Pentaho Data Integration and Analytics based on real PeerSpot user reviews.
"The initial setup is straightforward."
"Amazon EMR's most valuable features are processing speed and data storage capacity."
"Amazon EMR is a good solution that can be used to manage big data."
"The solution is pretty simple to set up."
"When we grade big jobs from on-prem to the cloud, we do it in EMR with Spark."
"In Amazon EMR it is easy to rebuild anything, easy to upgrade and has good fault tolerance."
"The solution is scalable."
"The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions."
"It's very simple compared to other products out there."
"It's my understanding that the product can scale."
"One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
"The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to troll through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
"One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
"It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
"We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."
"I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a station area where I can work with all the information that I have in my production databases, then I can work with the data that I created."
"Amazon EMR can improve by adding some features, such as megastore services and HiveServer2. Additionally, the user interface could be better, similar to what Apache service provides, cross-platform services."
"The initial setup was time-consuming."
"The dashboard management could be better. Right now, it's lacking a bit."
"There is room for improvement in pricing."
"As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data."
"There is no need to pay extra for third-party software."
"The product's features for storing data in static clusters could be better."
"There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange."
"If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was."
"I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector."
"As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
"The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's what our impression is."
"I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support."
"A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."
"I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse."
"The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."
Amazon EMR is ranked 3rd in Hadoop with 20 reviews, while Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews. Amazon EMR is rated 7.8, while Pentaho Data Integration and Analytics is rated 8.0. The top reviewer of Amazon EMR writes "Provides efficient data processing features and has good scalability". On the other hand, the top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". Amazon EMR is most compared with Snowflake, Cloudera Distribution for Hadoop, Azure Data Factory, Amazon Redshift and Apache Spark, whereas Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and AWS Glue.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.