Amazon Redshift vs Apache Hadoop comparison

Amazon Redshift: 8,203 views | 6,066 comparisons | 87% willing to recommend
Apache Hadoop: 2,630 views | 2,223 comparisons | 89% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Amazon Redshift and Apache Hadoop based on real PeerSpot user reviews.

Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Amazon Redshift vs. Apache Hadoop Report (Updated: March 2024).
768,578 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros

Amazon Redshift:
  • "The solution has very competitive pricing."
  • "The ability to reload data multiple times at different times."
  • "The most valuable feature is that the solution is fully embedded in the AWS stack."
  • "The processing of data is very fast."
  • "The solution is scalable. It handles different loads very well."
  • "It is quite simple to use and there are no issues with creating the tables."
  • "Setup is easy. It's a fast solution with machine learning features, good integration, and a good API."
  • "Though Amazon Redshift is good, it depends on what kind of business you're trying to do, what type of analytics you need, and how much data you have."

Apache Hadoop:
  • "The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it since we started out so small. Companies that need to scale shouldn't have a problem doing so."
  • "The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics."
  • "Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
  • "The tool's stability is good."
  • "What I like about Apache Hadoop is that it's for big data, in particular big data analysis, and it's the easier solution. I like the data processing feature for AI/ML use cases the most because some solutions allow me to collect data from relational databases, while Hadoop provides me with more options for newer technologies."
  • "Data ingestion: It has rapid speed, if Apache Accumulo is used."
  • "The most valuable feature is the database."
  • "The most valuable feature is scalability and the possibility to work with major information and open source capability."

Cons

Amazon Redshift:
  • "The product could be improved by making it more flexible."
  • "This solution lacks integration with non-AWS sources."
  • "One area where Amazon Redshift could improve is in adopting the compute-separate, data-separate architecture, which Delta, Snowflake are adopting, and a few others in the cloud data warehouse spectrum."
  • "For people who struggle with IAM or role-based management, the setup isn't easy."
  • "The speed of the solution and its portability needs improvement."
  • "We recently moved from the DC2 cluster to the RA3 cluster, which is a different node type and we are finding some issues with the RA3 cluster regarding connection and processing. There is room for improvement in this area. We are in talks with AWS regarding the connection issues."
  • "The initial setup is a complex process, especially for someone who is not familiar with nodes and configuring terms like RPUs."
  • "Amazon should provide more cloud-native tools that can integrate with Redshift like Microsoft's development tools for Azure."

Apache Hadoop:
  • "The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks."
  • "The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."
  • "I think more of the solution needs to be focused around the panel processing and retrieval of data."
  • "The stability of the solution needs improvement."
  • "The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
  • "What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe it's because I'm new to it. Sometimes it feels so tough to use, but it could be because of two aspects: one is my incompetency, for example, I don't know about all the features of Apache Hadoop, or maybe it's because of the limitations of the platform. For example, my team is maintaining the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, an advanced level of coding or programming needs to be done in the back end, so it's not user-friendly."
  • "It could be more user-friendly."
  • "The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."

Pricing and Cost Advice

  Amazon Redshift:
  • "Redshift is very cost effective for a cloud based solution if you need to scale it a lot. For smaller data sizes, I would think about using other products."
  • "If you want a fixed price, and to not worry about every query, but you need to manage your nodes personally, use Redshift."
  • "BI is sold to our customer base as a part of the initial sales bundle. A customer may elect to opt for a white labeled site for an up-charge."
  • "One of my customers went with Google Big Query over Redshift because it was significantly cheaper for their project."
  • "Per hour pricing is helpful to keep the costs of a pilot down, but long-term retention is expensive."
  • "It's around $200 US dollars. There are some data transfer costs but it's minimal, around $20."
  • "The best part about this solution is the cost."
  • "The part that I like best is that you only pay for what you are using."

  Apache Hadoop:
  • "Do take into consideration that data storage and compute capacity scale differently and hence purchasing a "boxed" / "all-in-one" solution (software and hardware) might not be the best idea."
  • "There are no licensing costs involved, hence money is saved on the software infrastructure."
  • "This is a low cost and powerful solution."
  • "The price of Apache Hadoop could be less expensive."
  • "If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
  • "We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
  • "The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
  • "We just use the free version."
    Questions from the Community
    Top Answer: Amazon Redshift is very fast, has a very good response time, and is very user-friendly. The initial setup is very straightforward. This solution can merge and integrate well with many different…
    Top Answer: Redshift Spectrum is the most valuable feature.
    Top Answer: Tools like Apache Hadoop are knowledge-intensive in nature. Unlike other tools in the market currently, we cannot understand knowledge-intensive products straight away. To use Apache Hadoop, a person…
    Ranking
                                Amazon Redshift    Apache Hadoop
    Ranking                     4th                5th out of 34 in Data Warehouse
    Views                       8,203              2,630
    Comparisons                 6,066              2,223
    Reviews                     23                 11
    Average Words per Review    480                532
    Rating                      7.7                8.0

    Overview

    What is Amazon Redshift?

    Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service. Users can start with just a few hundred gigabytes of data and scale to a petabyte or more as needed, which lets them use their own data to gain new insights into business processes and client relations.

    To create a data warehouse, users first launch a set of nodes called an Amazon Redshift cluster. Once the cluster has been provisioned, they can upload data sets and run data analysis queries. Amazon Redshift delivers fast query performance regardless of data set size, using the same SQL-based tools and BI applications that most users already work with.

    The Amazon Redshift service performs all of the work of setting up, operating, and scaling a data warehouse. These tasks include provisioning capacity, monitoring and backing up the cluster, and applying patches and upgrades to the Amazon Redshift engine.

    Amazon Redshift Functionalities

    Some of Amazon Redshift's most useful functionalities include:

    • Cluster administration: An Amazon Redshift cluster is a group of nodes made up of a leader node and one or more compute nodes. The number and type of compute nodes required depend on the size of the data, the number of queries run, and the query performance needed.
    • Cluster snapshots: Snapshots are point-in-time backups of a cluster. Amazon Redshift offers two types of snapshots: manual and automated. Snapshots are stored internally in Amazon Simple Storage Service (Amazon S3) over an SSL connection. When a restore is needed, Amazon Redshift creates a new cluster and imports the data from the specified snapshot.
    • Cluster access: Amazon Redshift provides several features to define connectivity rules, encrypt data and connections, and control overall access to a cluster.
    • IAM credentials and AWS accounts: By default, an Amazon Redshift cluster is accessible only to the AWS account that created it, which keeps the cluster locked down automatically. Within that account, users rely on AWS Identity and Access Management (IAM) to create additional user accounts and manage permissions, granting specific users the access they need to control cluster operations.
    • Encryption: Users can optionally encrypt a cluster for additional security when it is provisioned. When encryption is enabled, Amazon Redshift stores all data in user-created tables in an encrypted format. Amazon Redshift encryption keys are managed through AWS Key Management Service (AWS KMS).
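
As a rough illustration of how the cluster, node, and encryption settings above fit together, the sketch below assembles the keyword arguments for a cluster-creation request. The `ClusterSpec` class and all identifiers, credentials, and key aliases are hypothetical; the parameter names are modeled on the `create_cluster` call of the AWS SDK for Python (boto3), and the actual call is left as a comment because it needs AWS credentials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterSpec:
    """Hypothetical helper describing a Redshift cluster to provision."""
    identifier: str
    node_type: str
    compute_nodes: int
    admin_user: str
    admin_password: str
    encrypted: bool = True
    kms_key_id: Optional[str] = None

    def to_request(self) -> dict:
        """Build keyword arguments shaped like boto3's redshift create_cluster."""
        if self.compute_nodes < 1:
            raise ValueError("a cluster needs at least one node")
        multi = self.compute_nodes > 1
        request = {
            "ClusterIdentifier": self.identifier,
            "NodeType": self.node_type,
            "ClusterType": "multi-node" if multi else "single-node",
            "MasterUsername": self.admin_user,
            "MasterUserPassword": self.admin_password,
            "Encrypted": self.encrypted,
        }
        if multi:
            # The node count is only sent for multi-node clusters.
            request["NumberOfNodes"] = self.compute_nodes
        if self.encrypted and self.kms_key_id:
            # Optional customer-managed key held in AWS KMS.
            request["KmsKeyId"] = self.kms_key_id
        return request


# Example values are placeholders, not real resources.
spec = ClusterSpec(
    identifier="example-cluster",
    node_type="ra3.xlplus",
    compute_nodes=2,
    admin_user="admin_user",
    admin_password="ChangeMe123",
    kms_key_id="alias/example-key",
)
request = spec.to_request()
# With credentials configured, the request could be submitted as:
# boto3.client("redshift").create_cluster(**request)
```

Keeping the request construction separate from the network call makes it possible to review or test the settings before any cluster is provisioned.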

    Reviews from Real Users

    “Redshift's versioning and data security are the two most critical features. When migrating into the cloud, it's vital to secure the data. The encryption and security are there.” - Kundan A., Senior Consultant at Dynamic Elements AS

    “With the cloud version whenever you want to deploy, you can scale up, and down, and it has a data warehousing capability. Redshift has many features. They have enriched and elaborate documentation that is helpful.” - Aishwarya K., Solution Architect at Capgemini

    What is Apache Hadoop?

    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
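
To give a feel for the "simple programming models" mentioned above, here is a minimal single-process sketch of the MapReduce pattern, Hadoop's best-known programming model, applied to word counting. It is an assumption-laden illustration only: the function names are hypothetical, nothing here uses Hadoop's actual APIs, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big clusters"])
```

Because each map call depends only on its own line and each reduce call only on its own key, the work can be split across machines without coordination beyond the shuffle step.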
    Sample Customers
    Amazon Redshift: Liberty Mutual Insurance, 4Cite Marketing, BrandVerity, DNA Plc, Sirocco Systems, Gainsight, Blue 449
    Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
    Top Industries
    Amazon Redshift, reviewers: Computer Software Company 32%, Comms Service Provider 14%, Manufacturing Company 11%, Retailer 11%
    Amazon Redshift, visitors reading reviews: Educational Organization 50%, Financial Services Firm 9%, Computer Software Company 7%, Manufacturing Company 4%
    Apache Hadoop, reviewers: Financial Services Firm 38%, Comms Service Provider 25%, Hospitality Company 6%, Consumer Goods Company 6%
    Apache Hadoop, visitors reading reviews: Financial Services Firm 27%, Computer Software Company 10%, Comms Service Provider 6%, University 6%
    Company Size
                          Amazon Redshift            Apache Hadoop
                          Reviewers    Visitors      Reviewers    Visitors
    Small Business        40%          10%           34%          15%
    Midsize Enterprise    24%          54%           23%          11%
    Large Enterprise      37%          36%           43%          75%

    Amazon Redshift is ranked 4th in Cloud Data Warehouse with 58 reviews, while Apache Hadoop is ranked 5th in Data Warehouse with 32 reviews. Both are rated 7.8. The top reviewer of Amazon Redshift writes "Provides one place where we can store data, and allows us to easily connect to other services with AWS". The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". Amazon Redshift is most compared with AWS Lake Formation, Snowflake, Teradata and Vertica, whereas Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake and Vertica.


    We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.