PeerSpot user
SOA Architect at a pharma/biotech company with 10,001+ employees

AWS EMR vs Hadoop

I do not see a big advantage of using Cloudera or Hortonworks Hadoop over AWS EMR.

I would like to know which key pain points these vendors address that AWS EMR cannot.


3 Answers
PeerSpot user
Sr. Program Manager at a tech services company with 51-200 employees
20 October 15

Here are the key points that differentiate EMR from packaged Hadoop software on a private cluster:

Amazon Web Services Elastic MapReduce (EMR) is clearly a simple and fast
way to get started with Hadoop. As with any cloud offering, the trade-off is
control and security. With your corporate data in the cloud, you are trusting
someone else, and you are somewhat limited in the types of things
you can do. AWS EMR leverages open source Apache Hadoop
components almost exclusively.
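
To give a rough sense of what "simple and fast" can look like in practice, here is a minimal sketch that builds a request for the boto3 EMR `run_job_flow` call. The cluster name, release label, instance type, instance count, and region are illustrative assumptions, not values from this thread, and the role names assume the default EMR roles already exist in the account.

```python
def build_emr_request(name="hadoop-trial", release_label="emr-5.36.0",
                      instance_type="m5.xlarge", instance_count=3):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All defaults here are assumptions chosen for illustration only.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Open source Apache components to install on the cluster.
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after any submitted steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed: the default EMR roles have been created in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request to AWS. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("emr", region_name=region)
    response = client.run_job_flow(**request)
    return response["JobFlowId"]
```

Calling `launch_cluster(build_emr_request())` would start a billable cluster, so the request builder is kept separate and can be inspected first.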

EMR is cheap, but not as easy to use as some of the value-add components in Hortonworks, Cloudera, or IBM products.
If I leverage IBM InfoSphere BigInsights on my own cluster, I gain
ease of use through robust tools, security that I can control, and standard SQL queries
through BigSQL instead of HiveQL. Additionally, the support would be superior.
Cost is of course higher with a private cluster and purchased software and/or support,
so for these reasons many people do get started with AWS EMR.

To summarize, the advantages of EMR are cost and open source components, whereas a private Hadoop cluster offers flexibility, control, security, and convenience.

Full disclosure: I work for an IBM Business Partner.

PeerSpot user
Strategic Account Executive at Salesforce
Real User
20 October 15

In my opinion, it is more about support and certification across Apache projects and vendor products. In open source, if you run into an issue, you fix the problem yourself. You can build your own distribution and deploy it with EMR, or you can take a certified distribution like CDH or HDP and have assistance.

PeerSpot user
Trainee at Collabera
20 October 15

If you are interested in why Hadoop is so important, I suggest visiting this link:

Find out what your peers are saying about Apache, Cloudera, IBM and others in Hadoop. Updated: September 2022.
635,987 professionals have used our research since 2012.
Related Questions
Ariel Lindenfeld - PeerSpot reviewer
Director of Content at PeerSpot (formerly IT Central Station)
Dec 31, 2016
Let the community know what you think. Share your opinions now!
PeerSpot user
Learning Specialist 2 at VMware
24 November 15
The enterprise readiness of the distribution.
PeerSpot user
Software Engineer at an aerospace/defense firm with 1,001-5,000 employees
31 December 16
It depends. What is your endgame? Hadoop these days mostly serves as a distributed clustering file system that specializes in storing very large files. If you are merely interested in writing software for distributed processing, Apache Spark or NVIDIA CUDA is a much better choice. If you are interested in the distributed processing of large amounts of data, then the common practice is to use Apache Spark to write the code that processes the data, and Hadoop for persistent file system storage.
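
As a rough illustration of that split (Spark for processing, Hadoop's file system for storage), here is a minimal PySpark sketch that reads text from HDFS, counts words, and writes the result back to HDFS. The NameNode host, port 8020, and the paths are hypothetical placeholders, and the job assumes a working Spark installation that can reach the cluster.

```python
def hdfs_uri(path, namenode="namenode.example.com", port=8020):
    """Build an hdfs:// URI. Host and port are assumed placeholder values."""
    return f"hdfs://{namenode}:{port}/{path.lstrip('/')}"


def count_words(input_uri, output_uri):
    """Count words in a text file stored on HDFS and save the counts as Parquet."""
    # Imported lazily so hdfs_uri() can be used without PySpark installed.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("hdfs-word-count").getOrCreate()

    # Storage side: each line of the HDFS file becomes a row in column "value".
    lines = spark.read.text(input_uri)

    # Processing side: split lines on whitespace and count each distinct word.
    words = lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
    counts = words.where(F.col("word") != "").groupBy("word").count()

    # Persist the result back to HDFS.
    counts.write.mode("overwrite").parquet(output_uri)
    spark.stop()
```

A call might look like `count_words(hdfs_uri("/data/input.txt"), hdfs_uri("/data/word_counts"))`, with both paths being made-up examples.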
Avigail Sugarman - PeerSpot reviewer
Community Manager at PeerSpot (formerly IT Central Station)
Sep 15, 2015
Hi, I am analyzing big data architecture. I would like to get a comparison of the BigInsights and Cloudera versions of Hadoop.
1. I am looking at the pros and cons of BigInsights compared to Cloudera.
2. I would also like to know the pros and cons of BigInsights compared to Hortonworks.
Your help is greatly appreciated.
Business Unit technical Lead at a tech services company with 1,001-5,000 employees
15 September 15
Hi, I am currently a Netezza DBA. I am in the middle of working with a group on a BigInsights (BI) move to production. I have done a lot of integration testing between Netezza and BI.

1. BI (I have only tried V3 Enterprise and the VM trial)
PRO - The BigSQL implementation is ANSI-compliant SQL! That is a giant PRO in my mind.
PRO - GPFS seems like a decent improvement over HDFS. It is very fault tolerant and allows for data updates as well.
PRO - Integration with Netezza is excellent for Fluid Query reads.
PRO - The console has the ability to create linked tasks.
CON - The Enterprise console had some applications for data movement that didn't work. I have a ticket open.
CON - The Enterprise Netezza data mover proc seems broken when using GPFS. I have a ticket open for this.

I have used Cloudera and Hortonworks a little.
PRO - I really like Ambari! I think it is much easier to use than the BI console interface.
CON - The SQL implementation is not 100% ANSI compliant.