it_user396519 - PeerSpot reviewer
Director at a tech company with 1,001-5,000 employees
Real User
Its columnar storage leverages the Massively Parallel Processing (MPP) capabilities of its data warehouse architecture.

What is most valuable?

  • Performance: Very fast query performance, thanks to columnar storage that leverages the Massively Parallel Processing (MPP) capabilities of Redshift's data warehouse architecture.
  • Petabyte scale at low cost, without loss of performance: One of our existing customers stores more than 500 terabytes of data in an AWS Redshift database, and warehouse performance has been good. We want to highlight that, even if the warehouse grows to petabytes, we expect Redshift to keep working well without performance issues and to cost less as well.

How has it helped my organization?

End users gained access to real-time analytics.

What needs improvement?

We would really like to see a few more connectors included that would enable connecting with other databases and services. We have faced some difficulties pulling data from Teradata and storing it in Redshift. There is no direct connector available between Teradata and Redshift.

For how long have I used the solution?

We have been working with this product for the past 24 months.

Buyer's Guide
Amazon Redshift
December 2023
Learn what your peers think about Amazon Redshift. Get advice and tips from experienced pros sharing their opinions. Updated: December 2023.
745,775 professionals have used our research since 2012.

What do I think about the stability of the solution?

We have not faced any stability-related issues so far.

What do I think about the scalability of the solution?

We did not encounter any scalability issues in the last 24 months that we have been working with Redshift.

How are customer service and support?

We actually had to reach out to technical support a few times and they were really helpful and solved our problems. We would give it 4/5.

Which solution did I use previously and why did I switch?

We were using an on-premises MySQL data warehouse. To reduce cost and improve scalability, we switched to a cloud data warehouse.

How was the initial setup?

Initial setup and configuration were pretty straightforward. First, we needed to create a Redshift cluster. Once the cluster was created, we created a database schema in it based on our needs.
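The two setup steps above can be sketched with boto3, the AWS SDK for Python. This is a minimal illustration, not the reviewer's procedure: the cluster identifier, node type (dc2.large), node count, credentials, and helper names are hypothetical. The client is passed in as a parameter so the helpers can be exercised without AWS access.

```python
def cluster_request(identifier, node_type, nodes, db_name, user, password):
    """Build keyword arguments for the boto3 Redshift client's create_cluster call."""
    request = {
        "ClusterIdentifier": identifier,
        "NodeType": node_type,
        "MasterUsername": user,
        "MasterUserPassword": password,
        "DBName": db_name,
    }
    if nodes == 1:
        # Single-node clusters must not specify NumberOfNodes.
        request["ClusterType"] = "single-node"
    else:
        request["ClusterType"] = "multi-node"
        request["NumberOfNodes"] = nodes
    return request


def create_cluster(client, request):
    """Submit the request, then block until the cluster reports 'available'."""
    client.create_cluster(**request)
    waiter = client.get_waiter("cluster_available")
    waiter.wait(ClusterIdentifier=request["ClusterIdentifier"])


def schema_ddl(schema):
    """Schema DDL to run through any PostgreSQL-compatible driver afterwards."""
    return f"CREATE SCHEMA IF NOT EXISTS {schema};"
```

In real use you would pass `boto3.client("redshift")` as `client` (this needs AWS credentials) and then execute the `CREATE SCHEMA` statement once the waiter returns.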

What's my experience with pricing, setup cost, and licensing?

AWS Redshift is one of the fastest and most cost-effective cloud-based databases. Amazon charged $3330 per TB per year for the ds2.8xlarge instances, which have 244 GB of RAM, a 36-core CPU, a 10 Gbps network, and 16 TB of HDD storage.
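A quick back-of-the-envelope check of what the quoted rate means per node. It assumes the $3330/TB/year figure applies to all 16 TB of a ds2.8xlarge node running all year (8,760 hours); actual prices depend on region and on on-demand versus reserved terms.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def node_cost_per_year(price_per_tb_year, tb_per_node):
    """Yearly cost of one node if every terabyte it holds is billed at the TB rate."""
    return price_per_tb_year * tb_per_node


def effective_hourly_rate(price_per_tb_year, tb_per_node):
    """The same cost expressed per node-hour."""
    return node_cost_per_year(price_per_tb_year, tb_per_node) / HOURS_PER_YEAR


per_year = node_cost_per_year(3330, 16)
per_hour = effective_hourly_rate(3330, 16)
print(f"${per_year:,.0f} per node per year, about ${per_hour:.2f} per node-hour")
```

Under those assumptions one node works out to $53,280 per year, or roughly $6.08 per node-hour.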

What other advice do I have?

You need to design the database structure with appropriate sort and distribution keys, along with primary and foreign keys.
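A minimal sketch of what that advice looks like in Redshift DDL, generated from Python so the pieces are explicit. The table and column names are hypothetical. Redshift does not enforce PRIMARY KEY or FOREIGN KEY constraints; the planner treats them as hints, so the loading process has to guarantee they actually hold.

```python
def create_table_ddl(table, columns, dist_key, sort_keys, primary_key=None, foreign_keys=()):
    """Render CREATE TABLE with a KEY distribution and a compound sort key."""
    lines = [f"    {name} {sql_type}" for name, sql_type in columns]
    if primary_key:
        lines.append(f"    PRIMARY KEY ({primary_key})")  # informational only
    for column, ref_table, ref_column in foreign_keys:
        lines.append(f"    FOREIGN KEY ({column}) REFERENCES {ref_table} ({ref_column})")
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE {table} (\n{body}\n)\n"
        f"DISTSTYLE KEY\nDISTKEY ({dist_key})\n"
        f"COMPOUND SORTKEY ({', '.join(sort_keys)});"
    )


ddl = create_table_ddl(
    "sales",
    [("sale_id", "BIGINT NOT NULL"), ("customer_id", "INTEGER NOT NULL"),
     ("sale_date", "DATE NOT NULL"), ("amount", "DECIMAL(12,2)")],
    dist_key="customer_id",   # co-locates rows that join on customer_id
    sort_keys=["sale_date"],  # lets date-range filters skip blocks
    primary_key="sale_id",
    foreign_keys=[("customer_id", "customers", "customer_id")],
)
print(ddl)
```

Choosing the join column as the distribution key and the most common filter column as the sort key is one common starting point, not a universal rule.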

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Padmanesh NC - PeerSpot reviewer
Big Data Solution Architect - Spatial Data Specialist at SCIERA, INC
Top 5 Leaderboard, Reseller

Hey Aju Mathew,

It was a nice review of Amazon Redshift. I have also been using it for the last 3 years. I would like to understand the pricing a bit better. I am using instances priced at $0.25 per node per hour, so if I use a 100-node cluster, I am billed $25 per hour. But you mentioned something like $3330 per TB per year. Can you please elaborate on that? Is it based on node size or storage size?
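The question comes down to how a per-node hourly price relates to a per-terabyte yearly price. A minimal sketch, assuming the cluster runs 24x7 and using the $0.25/node/hour and 100-node figures from the comment; the 0.16 TB of storage per node is an illustrative assumption, to be replaced with the storage of your actual node type.

```python
HOURS_PER_YEAR = 24 * 365


def cluster_cost_per_hour(price_per_node_hour, nodes):
    """Hourly bill for a cluster of identical nodes."""
    return price_per_node_hour * nodes


def price_per_tb_year(price_per_node_hour, tb_per_node):
    """Yearly cost of one node divided by the storage that node provides."""
    return price_per_node_hour * HOURS_PER_YEAR / tb_per_node


print(cluster_cost_per_hour(0.25, 100))  # 25.0 dollars per hour for 100 nodes
print(price_per_tb_year(0.25, 0.16))     # implied per-TB-year price for the assumed storage
```

Because the per-TB figure divides by storage per node, two node types with the same hourly price can have very different per-TB prices.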

it_user576450 - PeerSpot reviewer
Data Science Lead at a tech services company with 51-200 employees
Consultant
The PostgreSQL interface is good because you can play with big data with just SQL.

What is most valuable?

The valuable features are:

  • PostgreSQL Interface
  • Scalability
  • Pricing/Maintenance/Setup

The PostgreSQL interface is good because you can play with big data using just SQL. This is one of the reasons Hive was created.

However, Hive’s SQL is still less standard than the SQL Redshift provides:
http://docs.aws.amazon.com/red...

How has it helped my organization?

Redshift has been the data warehouse at at least three of my previous companies. The impact is huge for anyone who uses data in any way.

What needs improvement?

I would like to see improvements in the database integrations. Currently, Amazon does not provide real-time/near real-time integration with other products like RDS or DynamoDB out-of-the-box.

We need to either build the integrations ourselves, or rely on third-party services which are not always the best.

For how long have I used the solution?

We have been using this solution for over three years.

What do I think about the stability of the solution?

There were stability issues in the beginning. However, the product has improved quite a lot in the last two years in terms of stability.

What do I think about the scalability of the solution?

Redshift can scale up to a petabyte with a few simple clicks.

How are customer service and technical support?

Technical support is good, but similar to any other Amazon Web Service, you have to pay for a good level of technical support.

Which solution did I use previously and why did I switch?

We did not have a previous solution. Redshift worked for us the first time we tried. The pricing could not be beaten by anything else in the market at that time.

How was the initial setup?

The installation was straightforward and only required a few clicks.

What's my experience with pricing, setup cost, and licensing?

Pricing was quite a strong point of Redshift when it was first released. Nowadays, quite a number of other services are very competitive in pricing, such as BigQuery.

What other advice do I have?

Redshift, like any other big data technology, isn’t a silver bullet for everything. The most important thing is to understand your data and your requirements before you make any decision to use any technology.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user576444 - PeerSpot reviewer
Rails Developer at a recruiting/HR firm with 51-200 employees
Vendor
It's based on PostgreSQL, is a managed solution, and has low price per terabyte per year.

What is most valuable?

  • It is based on PostgreSQL.
  • It’s managed. Meaning, AWS takes care of handling infrastructure, deployments, encryption, and uptime for you.
  • It’s cheap when you consider the price per terabyte per year.
  • It’s integrated into the AWS stack.

How has it helped my organization?

At my previous company, whose core product is mobile analytics, we moved the entire analytics backend from MongoDB to Redshift. Where I currently work, we use it as our main data lake/data warehouse.

What needs improvement?

While it's probably the best product in its category (managed, SQL-based data warehouse at scale), it does have some shortcomings, although very few.

The main issue people complain about, and I agree with the claim, is that it's hard to load your data into it. You first need to export your data to S3 as CSV, JSON, or AVRO. Then you can load it into Redshift. Even then, you have to make sure your data is properly formatted (you can use the COPY options TRUNCATECOLUMNS to load values that are too long for their columns, and MAXERROR to allow a given number of errors while loading). In general, ETL and data cleaning are a hurdle in data engineering, and Redshift suffers from this too.
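A minimal sketch of the load step just described: a helper that assembles a COPY command for files already exported to S3, with the TRUNCATECOLUMNS and MAXERROR options mentioned above. The bucket, prefix, table name, and IAM role ARN are hypothetical placeholders, and the helper itself is illustrative rather than any AWS tool.

```python
def copy_command(table, s3_prefix, iam_role, fmt="csv", max_errors=0, truncate=True):
    """Return a COPY statement that loads every S3 object under s3_prefix."""
    fmt = fmt.upper()
    if fmt == "CSV":
        options = ["FORMAT AS CSV"]
    elif fmt in ("JSON", "AVRO"):
        options = [f"FORMAT AS {fmt} 'auto'"]  # match fields to columns by name
    else:
        raise ValueError(f"unsupported format: {fmt}")
    if truncate:
        options.append("TRUNCATECOLUMNS")  # cut VARCHAR/CHAR values that are too long
    if max_errors:
        options.append(f"MAXERROR {max_errors}")  # tolerate a limited number of bad rows
    return (
        f"COPY {table}\nFROM '{s3_prefix}'\nIAM_ROLE '{iam_role}'\n"
        + "\n".join(options)
        + ";"
    )


print(copy_command("events", "s3://example-bucket/exports/events/",
                   "arn:aws:iam::123456789012:role/ExampleCopyRole",
                   fmt="json", max_errors=100))
```

Rows rejected under MAXERROR are recorded in the STL_LOAD_ERRORS system table, which is worth checking after each load.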

For how long have I used the solution?

I have used Redshift for three years.

What do I think about the stability of the solution?

I once had an issue because my data contained a Unicode NULL character ("\u0000") in a VARCHAR field. AWS support was very quick to respond and very helpful. Other than that, I have had no issues whatsoever.
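One way to avoid that kind of failure is to clean the data before it reaches S3. A minimal sketch, assuming rows are written to CSV before upload: strip the U+0000 character from every string field. The helper names are hypothetical.

```python
import csv
import io


def strip_nul(value):
    """Remove U+0000 from strings; leave other values untouched."""
    return value.replace("\u0000", "") if isinstance(value, str) else value


def clean_rows_to_csv(rows):
    """Serialize rows to CSV text with NUL characters removed from every field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([strip_nul(v) for v in row])
    return buffer.getvalue()


print(clean_rows_to_csv([["a\u0000b", 1], ["ok", None]]), end="")
```

Dropping the character silently is a design choice; if NULs signal upstream corruption, logging the affected rows may be more useful.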

What do I think about the scalability of the solution?

No scalability issues whatsoever.

How are customer service and technical support?

Technical support is very good.

Which solution did I use previously and why did I switch?

At my previous company, we switched from MongoDB to Redshift. The main reasons were price and performance. At my current company, we started a data warehouse (a greenfield project). The choice was between Google BigQuery and AWS Redshift. The main criterion was that Redshift is PostgreSQL-based and supports CTEs and window functions (PostgreSQL features).

How was the initial setup?

The biggest part of using Redshift is setting up the ETL and doing the data cleaning. It was very hard when moving from MongoDB, because I had to rediscover our data schema (which had no spec). That said, in both cases (moving from MongoDB and starting from scratch), I had a prototype up in about a day. By that, I mean that the most important parts of my data were loaded into Redshift and I could query them.
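Rediscovering a schema from schemaless documents can be partly automated. A minimal sketch, assuming a sample of MongoDB documents has been exported as Python dicts: record the types seen for each top-level field and whether it is ever missing or null, then map that to a Redshift column type. The type mapping is a deliberate simplification, and nested or mixed fields fall back to text.

```python
from collections import defaultdict

# Simplified mapping; widest VARCHAR until real string lengths are known.
REDSHIFT_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "VARCHAR(65535)"}


def infer_columns(documents):
    """Map each top-level field to (Redshift type, nullable)."""
    types_seen = defaultdict(set)
    non_null = defaultdict(int)
    for doc in documents:
        for key, value in doc.items():
            seen = types_seen[key]  # registers the key even if the value is None
            if value is not None:
                seen.add(type(value))
                non_null[key] += 1
    columns = {}
    for key in sorted(types_seen):
        types = types_seen[key]
        if types == {int, float}:
            sql_type = "DOUBLE PRECISION"  # widen mixed numbers
        elif len(types) == 1 and next(iter(types)) in REDSHIFT_TYPES:
            sql_type = REDSHIFT_TYPES[next(iter(types))]
        else:
            sql_type = "VARCHAR(65535)"  # nested, mixed, or always-null: keep as text
        columns[key] = (sql_type, non_null[key] < len(documents))
    return columns
```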

What's my experience with pricing, setup cost, and licensing?

The pricing page is explicit. Choose what suits your needs in terms of storage and performance.

Which other solutions did I evaluate?

For setting up a data warehouse, BigQuery was a serious contender. BigQuery is simpler to set up and scale. It's also more of a black box: you worry less about what's inside and how it scales, and you are charged for what you consume (which is both a pro and a con). With Redshift, you choose the type of machine you want in advance, as with EC2 (resizing your cluster is easy).

What other advice do I have?

If you are evaluating Redshift, chances are you should evaluate BigQuery too. So take the time to weigh the pros and cons of each (plenty has been written online about that).

Take a look at the reserved instances pricing. It is very advantageous if you know you will stick with Redshift for some time.

Take the time to learn PostgreSQL (e.g., https://www.pgexercises.com/). Redshift, while based on PostgreSQL 8.0, supports a good number of advanced Postgres features.

Do not be afraid of joins. PostgreSQL performs very well in this regard.
If you need performance, have a look at the suggested optimizations in the official documentation (such as setting up the correct distkeys, sortkeys and compression schemes).

Understand that Redshift has no indexes.

Understand that Redshift is an analytical database with columnar storage, and that it does not enforce constraints.

Redshift plays very well with a PostgreSQL instance in RDS linked to it via DBLINK (see this guide: https://aws.amazon.com/blogs/big-data/join-amazon-redshift-and-amazon-rds-postgresql-with-dblink/). I've used this in production at my current company, and this is tremendously useful. You can have your raw data in Redshift and aggregate it directly into RDS. To do this, insert into RDS what you select from Redshift through the dblink.
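A minimal sketch of the statement the RDS side would run for that pattern. It assumes the dblink extension is installed on the RDS PostgreSQL instance and that a foreign server (hypothetically named `redshift_server`) with a user mapping already points at the Redshift cluster, as in the AWS guide linked above. Table and column names are hypothetical; Python only assembles the SQL text.

```python
def dblink_insert(local_table, server, remote_query, columns):
    """INSERT ... SELECT on RDS that pulls rows from Redshift through dblink."""
    # dblink returns SETOF record, so the result columns must be declared.
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    column_names = ", ".join(name for name, _ in columns)
    return (
        f"INSERT INTO {local_table} ({column_names})\n"
        f"SELECT {column_names}\n"
        f"FROM dblink('{server}', $REDSHIFT$ {remote_query} $REDSHIFT$)\n"
        f"    AS t({column_defs});"
    )


print(dblink_insert(
    "daily_sales", "redshift_server",
    "SELECT sale_date, SUM(amount) FROM sales GROUP BY 1",  # aggregated inside Redshift
    [("sale_date", "date"), ("total", "numeric")],
))
```

Running the aggregation inside the remote query keeps the heavy work on Redshift and sends only the summarized rows to RDS.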

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user576456 - PeerSpot reviewer
Manager BI Development at a comms service provider with 1,001-5,000 employees
Vendor
The fact that it stores data using a columnar approach allows us to use columns in join conditions.

What is most valuable?

Redshift gives extremely fast responses on queries involving large tables. This is the most important feature I look for in data warehouse solutions. You often come across use cases where it is not possible to distribute data on a certain column, yet you need that column in join conditions. Redshift stores data using a columnar approach, which is useful for data aggregation.

All this at an extremely low price makes it possible for small and medium-sized organizations to use Redshift’s power to get business insights.

How has it helped my organization?

One of my clients required large amounts of data but had a low budget. Amazon Redshift was the perfect choice for my client. We joined two tables containing billions of rows each and got results back in 27 seconds with a relatively small cluster of nodes.

What needs improvement?

Amazon should add more of the SQL functions that data warehouse implementations require. It lacks SQL functions for complex data processing; one small example is recursive queries. However, Amazon is developing the product at a fast pace and bringing new features with every release.

For how long have I used the solution?

I’ve been using Redshift for more than two years. I created one traditional data warehouse with 3-tier architecture and one big data solution.

What do I think about the stability of the solution?

We have not really had stability problems. The product is mature and can be utilized for production systems.

What do I think about the scalability of the solution?

Since Redshift is on the AWS cloud, scalability is not an issue. With a few clicks, the cluster size can be increased or reduced. This is especially useful when you expect a temporary spike in data processing. For example, on Black Friday, retail organizations expect large data flows and heavy processing. Redshift can be scaled up for a few days to accommodate the surge of data and then scaled back to its normal cluster size to save OPEX.
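The "scale up for a few days, then back" pattern can be scripted instead of clicked. A minimal sketch, assuming a boto3 Redshift client (or a stub) is passed in: decide the target node count for a given day, and request a resize only when it differs from the current size. The cluster name, node counts, and surge window are hypothetical, and AWS may restrict which target sizes a resize accepts for a given node type.

```python
from datetime import date


def nodes_for_day(day, base_nodes, surge_nodes, surge_start, surge_end):
    """Node count the cluster should have on `day` (surge window is inclusive)."""
    return surge_nodes if surge_start <= day <= surge_end else base_nodes


def apply_size(client, cluster_id, current_nodes, target_nodes):
    """Ask Redshift to resize only when the size must change; True if requested."""
    if current_nodes == target_nodes:
        return False
    client.resize_cluster(ClusterIdentifier=cluster_id, NumberOfNodes=target_nodes)
    return True


# Hypothetical window around Black Friday 2023: 4 nodes normally, 8 during the surge.
SURGE = (date(2023, 11, 22), date(2023, 11, 28))
print(nodes_for_day(date(2023, 11, 24), 4, 8, *SURGE))
```

A daily scheduled job could call `apply_size` with the cluster's current node count (for example, from `describe_clusters`).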

How are customer service and technical support?

The AWS team gives special focus to customer support. This is a very big benefit of going to the cloud. You get a reply from AWS within a short time frame.

Which solution did I use previously and why did I switch?

I worked on Teradata and IBM solutions. Redshift gives performance similar to these solutions and costs a fraction of the amount.

How was the initial setup?

Your Redshift cluster can be up and running with a few clicks in less than 5 minutes. This is a big benefit of shifting to the cloud.

Which other solutions did I evaluate?

We analyzed Microsoft, Oracle, AWS RDS, and MongoDB for our requirements.

What other advice do I have?

Redshift is based on PostgreSQL and adds MPP/columnar features to make it a data warehouse product. It is very easy for developers to adopt this solution. Your existing team can easily work on Redshift with no extra cost of learning.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user572622 - PeerSpot reviewer
BI Architect & Developer (contract) at a retailer with 501-1,000 employees
Vendor
You can configure tables to live in the memory of all of the available cores.

What is most valuable?

Column store and distributed processing is optimized for read access. We grew to 3000+ users with no impact.

Column store is a data compression technique for relational data. I’m using it now in SQL Server 2016. We configured a 16-core VM for handling requests on the DB. The recommendation was to separate inbound data packets into related chunks, each 1/16th of the total size.

This way, the import process could make full use of parallelization, and it worked. We imported 20 million rows of sales facts in less than 15 seconds, and the content was queryable immediately. I had never seen that before; it was impressive. It meant that we could completely rebuild the data warehouse from scratch to "current" within minutes, assuming the data was already in S3.
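For a Redshift load, the same idea applies to COPY: AWS's loading guidance is to split the input into multiple files, ideally a multiple of the number of slices in the cluster, so that all slices load in parallel. A minimal sketch that cuts the input into 16 contiguous, near-equal parts (16 being the figure from the review) and writes each as a gzip file ready for upload to S3. The file naming and gzip compression are assumptions; a COPY of gzip files would need the GZIP option.

```python
import gzip
from pathlib import Path


def split_into_parts(lines, parts):
    """Split lines into `parts` contiguous chunks whose sizes differ by at most 1."""
    size, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])  # contiguous, so related rows stay together
        start = end
    return chunks


def write_parts(lines, parts, out_dir, stem="sales"):
    """Write each chunk to out_dir/<stem>.partNN.gz and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(split_into_parts(lines, parts)):
        path = out / f"{stem}.part{n:02d}.gz"
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.writelines(line + "\n" for line in chunk)
        paths.append(path)
    return paths
```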

Tables that would typically be 2GB in size are now about 250MB, which means more data fits in memory. You can also configure tables to live in the memory of all of the available cores, which is good for small dimension tables, or fragment them across all cores for the larger fact tables, which allows for distributed query processing. Once it was set up, it just worked. It was all specified in the PG-SQL CREATE TABLE statements.
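In Redshift DDL these two layouts correspond to DISTSTYLE ALL (a full copy of the table on every node) and DISTSTYLE KEY (rows hash-distributed across slices). A toy model of the row counts each produces, assuming a hypothetical cluster of 4 nodes with 16 slices in total; Python's built-in `hash` stands in for Redshift's internal hash function, and integer keys keep the toy deterministic.

```python
from collections import Counter


def key_distribution(keys, slices):
    """Rows per slice when a fact table is hash-distributed on `keys`."""
    return Counter(hash(k) % slices for k in keys)


def all_distribution(row_count, nodes):
    """Rows per node when a dimension table is copied in full to every node."""
    return {node: row_count for node in range(nodes)}


fact = key_distribution(range(16_000), 16)  # each slice holds a share
dim = all_distribution(500, 4)              # every node holds all 500 rows
print(sorted(fact.values())[:3], dim)
```

The trade-off is storage and load time for the replicated table in exchange for joins that never need to move dimension rows between nodes.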

There were two data centers in Sydney, which guaranteed us a distributed solution. We didn’t really notice this; it was more of a check-box item. At one point, there was an outage at AWS, but it didn’t impact our operations directly.

How has it helped my organization?

This has given us the ability to provide metrics to the large number of company staff on their performance without impacting core systems.

What needs improvement?

I’d like to see these Redshift features arrive in other products, such as SQL Server's columnstore index.

For how long have I used the solution?

I have used this solution for three years.

What do I think about the stability of the solution?

There have been no stability issues.

How are customer service and technical support?

Technical support always met my expectations.

Which solution did I use previously and why did I switch?

I was on a team that was using AWS tools for Dick Smith Electronics (now liquidated). The tools ceased to be used in February 2016.

Prior to that, we were using them fully for about 3 years. We loaded data to Redshift according to the best practices included in the online docs and through consultation with the AWS staff. The combination of S3 and Redshift for this purpose was very high in performance. Redshift was used to provide the data model to an instance of MicroStrategy for BI reporting.

We were using MicroStrategy, which generated all the SQL that our reporting services needed.

As such, I can only comment on the data engineering phase. Technically, this was so impressive that I don’t know what to add. I don’t recall feeling that it missed anything. If anything, I was not using all the available features. AWS documentation is great in this regard. You can tell they have put a lot of thought into it.

A lot of the future direction in database technology has to do with memory optimization and concurrency (VoltDB). This is more targeted towards transactional processing, and not data warehousing.

Memory-only data warehousing solves a lot of access issues without having to think too hard about the problem from the consumers' point of view. I am sure that you can already configure this.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user583371 - PeerSpot reviewer
BI Architect at a comms service provider with 5,001-10,000 employees
Vendor
Columnar storage technology is valuable.

What is most valuable?

Columnar storage technology is the most valuable feature of this solution.

How has it helped my organization?

We can meet the SLS/SLAs in our daily processes.

What needs improvement?

Improvements could be made in the following areas:

Restore table:

Right now, this feature only permits bringing a table back into the same cluster, based on a snapshot taken. I would like a similar option to move data across different clusters; currently, I have to UNLOAD from cluster A and then COPY into cluster B. I would like to use the snapshots already taken to bring the data into whichever cluster I need. Maybe the current design cannot support this, because it is based on nodes and data distribution.

Our real scenario is this: if we lose data and need to recover it in another cluster, we have to:

1) Restore the table in the current cluster under a different name

2) Unload the data to S3

3) Copy the data into the new cluster

When we are talking about billions of records, this is complex to do.
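Steps 2 and 3 can at least be scripted. A minimal sketch, assuming both clusters can reach the same S3 bucket through an IAM role and using Parquet as the interchange format; the bucket, role ARN, and table names are hypothetical. UNLOAD takes its query as a quoted string, so single quotes inside it are doubled.

```python
def unload_statement(query, s3_prefix, iam_role):
    """Run on the source cluster: write query results under s3_prefix."""
    escaped = query.replace("'", "''")  # quotes inside the UNLOAD query are doubled
    return (f"UNLOAD ('{escaped}')\nTO '{s3_prefix}'\n"
            f"IAM_ROLE '{iam_role}'\nFORMAT AS PARQUET;")


def copy_statement(table, s3_prefix, iam_role):
    """Run on the target cluster: load every file under s3_prefix into table."""
    return (f"COPY {table}\nFROM '{s3_prefix}'\n"
            f"IAM_ROLE '{iam_role}'\nFORMAT AS PARQUET;")


ROLE = "arn:aws:iam::123456789012:role/ExampleTransferRole"
print(unload_statement("SELECT * FROM orders_restored WHERE region = 'EU'",
                       "s3://example-bucket/transfer/orders/", ROLE))
print(copy_statement("orders", "s3://example-bucket/transfer/orders/", ROLE))
```

Using the same prefix in both statements is what ties them together: UNLOAD writes multiple files under it, and COPY loads all objects that match it.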

Vacuum process:

The vacuum needs to be segmented. For example, after 24 hours of execution I had to cancel the process, and 0% of the table was sorted (big table).

For big tables (billions of records), if the table is 100% unsorted, the vacuum can take more than 24 hours. If we don't have that time window, we have to work around it by moving the data out into additional tables and running the vacuum in batches on the main table.

This is because if I run the vacuum directly on the main table and stop it after 5 hours, 0 records will be sorted. I would like to run the vacuum on the main table, stop it when I need to, and still have some records vacuumed, like an incremental process.
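A minimal sketch of the batch workaround described above, as a list of SQL statements: copy the rows into a holding table, empty the main table, then reinsert one sort-key range at a time and vacuum after each batch so every step finishes in a bounded window. The table names, sort column, and range boundaries are hypothetical, and the boundaries are compared as quoted literals (suitable for dates or strings). VACUUM cannot run inside a transaction block, so each statement should be executed separately with autocommit.

```python
def batched_vacuum_plan(table, sort_column, boundaries):
    """Statements for reloading `table` in sort-key batches, vacuuming after each."""
    holding = f"{table}_holding"
    plan = [
        f"CREATE TABLE {holding} (LIKE {table});",  # same columns, dist and sort keys
        f"INSERT INTO {holding} SELECT * FROM {table};",
        f"TRUNCATE {table};",
    ]
    for low, high in zip(boundaries, boundaries[1:]):
        plan.append(
            f"INSERT INTO {table} SELECT * FROM {holding} "
            f"WHERE {sort_column} >= '{low}' AND {sort_column} < '{high}';"
        )
        plan.append(f"VACUUM SORT ONLY {table};")  # each run has one new batch to sort
    # Rows outside the boundaries (or with a NULL sort key) are NOT reloaded;
    # drop the holding table only after this check returns true.
    plan.append(
        f"SELECT (SELECT COUNT(*) FROM {table}) = "
        f"(SELECT COUNT(*) FROM {holding}) AS counts_match;"
    )
    return plan
```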

For how long have I used the solution?

I have used this solution for around three years.

What do I think about the stability of the solution?

We did encounter stability issues: when we used more than 25 nodes (ds2.xlarge), the cluster was totally unstable.

What do I think about the scalability of the solution?

I have not experienced any scalability issues.

How are customer service and technical support?

I would rate the technical support a 9/10 for normal issues.

However, for advanced issues, I would give it a 5/10, since I had to go directly to the AWS engineering support team.

Which solution did I use previously and why did I switch?

Initially, we were using the Microsoft SQL solution. We decided to move over to this product due to the DWH volume and performance.

How was the initial setup?

In my opinion, the setup was normal.

What's my experience with pricing, setup cost, and licensing?

Based on the quality of the product and its price, it is one of the best options available in the market now.

Which other solutions did I evaluate?

We also looked at the Oracle solution.

What other advice do I have?

Make sure that the space used in the DWH is at most 50% of the total space.

You must create processes to vacuum and analyze tables frequently. Also, before creating the tables, you should choose the right encoding, DISTKEY and sort keys.
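Both habits can be automated in a scheduled job. A minimal sketch: one way to read disk usage is the STV_PARTITIONS system table (visible only to superusers), whose `used` and `capacity` columns count 1 MB blocks; the 50% limit is the reviewer's rule of thumb. The table names and helper functions are hypothetical.

```python
# Sum of used and total 1 MB blocks across all disk partitions in the cluster.
DISK_USAGE_SQL = (
    "SELECT SUM(used) AS used_blocks, SUM(capacity) AS capacity_blocks "
    "FROM stv_partitions;"
)


def disk_usage_too_high(used_blocks, capacity_blocks, limit=0.5):
    """True when more than `limit` of the cluster's disk space is in use."""
    return used_blocks / capacity_blocks > limit


def maintenance_statements(tables):
    """VACUUM then ANALYZE each table, e.g. from a nightly scheduled job."""
    statements = []
    for table in tables:
        statements.append(f"VACUUM {table};")   # re-sort rows, reclaim deleted space
        statements.append(f"ANALYZE {table};")  # refresh statistics for the planner
    return statements
```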

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user576441 - PeerSpot reviewer
Senior Software Engineer [Redshift Programmer] at a tech services company with 1,001-5,000 employees
Consultant
It supports SCD1 and SCD2, and the star schema. Improvement is needed in the scope of data types and complex RDBMS functionalities.

What is most valuable?

The most valuable features of this product are:

  • Processing huge data in petabytes
  • Massively Parallel Processing (MPP)
  • Concept of data compression
  • The way it stores the data in drives especially with the distribution key
  • Supports BI tools like MicroStrategy (MSTR) and Tableau
  • Supports all the data warehouse core features such as SCD1 and SCD2, and different schemas like the star schema

How has it helped my organization?

It has helped us to understand the response and interest of the customers and the user conversion rate in this competitive world. Thus, it has helped us in the decision-making process.

What needs improvement?

In most scenarios, the data source for Redshift will be a traditional RDBMS like MySQL, PostgreSQL, SQL Server, etc. After migrating to Redshift, we find a few disconnects with respect to data types, stored procedures, and other complex functionality. There is a need for improvement in these aspects, mainly in the scope of data types and some of the complex functionality we can perform in an RDBMS.

For how long have I used the solution?

I have used this solution for more than a year.

What do I think about the stability of the solution?

I have not encountered any issues with stability. In terms of performance, Redshift is highly stable.

What do I think about the scalability of the solution?

I have not encountered any issues with scalability. We can easily scale the nodes in AWS with only a few clicks.

How are customer service and technical support?

I would give the technical support a 6 out of 10 rating.

Which solution did I use previously and why did I switch?

We have not used any other solution.

How was the initial setup?

The setup was straightforward for those who know AWS.

What's my experience with pricing, setup cost, and licensing?

The Redshift pricing policy is easy to understand.

Which other solutions did I evaluate?

We did not evaluate other options prior to selecting this solution.

What other advice do I have?

As of now, Redshift is far better than the other products in the market.

Lastly, I would like to mention that Redshift is more about scaling and stabilizing your data. One should also focus on data modeling from time to time.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
PeerSpot user
Senior Engineer, Big-Data/Data-Warehousing at a manufacturing company with 501-1,000 employees
Vendor
We create different-sized clusters and orchestrate them using the SDK.

What is most valuable?

The most valuable features to us are: speed, DML, the fact that it is cloud-based, the management console, and Boto3.

Because we are dealing with a lot of data, speed is always important. Redshift is blisteringly fast when doing "deep" copies and inserts. Conceptually, my data-transformation pipelines are a series of proprietary "waves" that leverage Redshift's DML/"deep" copy/insert strengths. Doing all this in the cloud allows us to easily test alternatives. We create different-sized Redshift clusters and orchestrate them using the SDK (Python Boto3). We go beyond the traditional DWH to "infrastructure-as-software".
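A minimal sketch of that kind of orchestration with boto3: for each candidate size, create a throwaway cluster, wait for it, run a workload, and always delete it afterwards. `client` is a boto3 Redshift client (or a stub in tests); the base request, node counts, and the `run_workload` callback (for example, something that runs one transformation "wave" and returns its timing) are hypothetical.

```python
def run_size_experiments(client, base_request, node_counts, run_workload):
    """Create, benchmark, and delete one cluster per node count; return results."""
    results = {}
    for nodes in node_counts:
        cluster_id = f"{base_request['ClusterIdentifier']}-{nodes}n"
        request = dict(base_request, ClusterIdentifier=cluster_id,
                       ClusterType="multi-node", NumberOfNodes=nodes)
        client.create_cluster(**request)
        try:
            client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_id)
            results[nodes] = run_workload(cluster_id)
        finally:
            # Experiments are disposable, so skip the final snapshot.
            client.delete_cluster(ClusterIdentifier=cluster_id,
                                  SkipFinalClusterSnapshot=True)
    return results
```

The `try`/`finally` is the important design choice: a failed experiment still tears its cluster down instead of leaving it running and billing.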

How has it helped my organization?

Redshift has helped to transform Makerbot into a data-driven company.

What needs improvement?

Integrating database security/access rights with AWS IAM would be great. I would also like to see more DML features that might aid in processing unstructured or log-file data. This would allow us to avoid having to use EMR/Hadoop.

For how long have I used the solution?

We’ve used Amazon Redshift for 3 years.

What was my experience with deployment of the solution?

We did not encounter any deployment issues.

What do I think about the stability of the solution?

We did not encounter any issues with stability.

What do I think about the scalability of the solution?

We did not encounter any issues with scalability.

How are customer service and technical support?

Customer Service:

I think the customer service is adequate.

Technical Support:

The level of technical support is good.

Which solution did I use previously and why did I switch?

We tried prior solutions, but they had limited or no scalability/agility.

How was the initial setup?

The initial setup was straightforward.

What was our ROI?

It took less than a year for the product to pay for itself.

What's my experience with pricing, setup cost, and licensing?

Regarding pricing and licensing, I advise starting small and having your developers/DBAs use table compression and partitioning from the start.

Which other solutions did I evaluate?

We have used different options over the last 20 years. We found AWS Redshift to be the leader in capability, and it provides an ecosystem of related services from AWS, many of which are free.

What other advice do I have?

My advice to others is to prototype, prototype, prototype! Everything depends on your data and what you need to do with it. No two projects are the same.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Amazon Redshift Report and get advice and tips from experienced pros sharing their opinions.
Updated: December 2023
Product Categories
Cloud Data Warehouse