We performed a comparison between Amazon EMR and Pentaho Data Integration and Analytics based on real PeerSpot user reviews.
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop."The solution is pretty simple to set up."
"This is the best tool for hosts and it's really flexible and scalable."
"The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions."
"Amazon EMR's most valuable features are processing speed and data storage capacity."
"The solution helps us manage huge volumes of data."
"It has a variety of options and support systems."
"We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot."
"It allows users to access the data through a web interface."
"We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines."
"The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to troll through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
"One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
"Provides a good open source option."
"The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data."
"Pentaho Data Integration is quite simple to learn, and there is a lot of information available online."
"We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule."
"The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."
"There is room for improvement in pricing."
"Modules and strategies should be better handled and notified early in advance."
"We don't have much control. If we have multiple users, if they want to scale up, the cost will go and increase and we don't know how we can restrict that price part."
"There is no need to pay extra for third-party software."
"The product's features for storing data in static clusters could be better."
"The dashboard management could be better. Right now, it's lacking a bit."
"There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange."
"The most complicated thing is configuring to the cluster and ensure it's running correctly."
"There is not a data quality or MDM solution in the Pentaho DI suite."
"The testing and quality could really improve. Every time that there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and do some effort, it is with the quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi."
"I would like to see improvements made for real-time data processing."
"Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."
"I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse."
"Although it is a low-code solution with a graphical interface, often the error messages that you get are of the type that a developer would be happy with. You get a big stack of red text and Java errors displayed on the screen, and less technical people can get intimidated by that. It can be a bit intimidating to get a wall of red error messages displayed. Other graphical tools that are focused at the power user level provide a much more user-friendly experience in dealing with your exceptions and guiding the user into where they've made the mistake."
"I would like to see more improvements with AS400 DB2."
"The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."
More Pentaho Data Integration and Analytics Pricing and Cost Advice →
Amazon EMR is ranked 3rd in Hadoop with 20 reviews while Pentaho Data Integration and Analytics is ranked 15th in Data Integration with 48 reviews. Amazon EMR is rated 7.8, while Pentaho Data Integration and Analytics is rated 8.0. The top reviewer of Amazon EMR writes "Provides efficient data processing features and has good scalability ". On the other hand, the top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". Amazon EMR is most compared with Snowflake, Cloudera Distribution for Hadoop, Azure Data Factory, Amazon Redshift and Apache Spark, whereas Pentaho Data Integration and Analytics is most compared with SSIS, Azure Data Factory, Talend Open Studio, Oracle Data Integrator (ODI) and AWS Glue.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.