it_user576444 - PeerSpot reviewer
Rails Developer at a recruiting/HR firm with 51-200 employees
Vendor
It's based on PostgreSQL, is a managed solution, and has a low price per terabyte per year.

What is most valuable?

  • It is based on PostgreSQL.
  • It’s managed, meaning AWS takes care of infrastructure, deployments, encryption, and uptime for you.
  • It’s cheap when you consider the price per terabyte per year.
  • It’s integrated into the AWS stack.

How has it helped my organization?

At my previous company that does mobile analytics as its core product, we moved all the analytics backend from MongoDB to Redshift. Where I currently work, we use it as our main data lake/data warehouse.

What needs improvement?

While it's probably the best product in its category (managed SQL-based data warehouse at scale), it has a few shortcomings.

The main issue people complain about, and I agree with them, is that it's hard to load your data into it. You first need to export your data to S3 as CSV, JSON, or Avro, and only then can you load it into Redshift. Even then, you have to make sure your data is properly formatted. (You can use the COPY options TRUNCATECOLUMNS, to load fields that are too long, and MAXERROR, to tolerate a given number of errors while loading.) In general, ETL and data cleaning are a hurdle in data engineering, and Redshift suffers from it.
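To make the load step concrete, here is a minimal sketch in Python that only assembles the text of a COPY statement with the two options mentioned above. The table name, bucket path, and IAM role are placeholders rather than anything from the review, and the statement is only built as a string, not executed.

```python
def build_copy_statement(table, s3_path, iam_role, file_format="CSV",
                         truncate_columns=True, max_errors=0):
    """Return the text of a Redshift COPY statement (nothing is executed)."""
    formats = {
        "CSV": "FORMAT AS CSV",
        "JSON": "FORMAT AS JSON 'auto'",
        "AVRO": "FORMAT AS AVRO 'auto'",
    }
    if file_format not in formats:
        raise ValueError(f"unsupported format: {file_format}")
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role}'",
        formats[file_format],
    ]
    if truncate_columns:
        # Cut over-long strings down to the column width instead of failing.
        parts.append("TRUNCATECOLUMNS")
    if max_errors > 0:
        # Tolerate up to this many bad rows before the load is aborted.
        parts.append(f"MAXERROR {max_errors}")
    return "\n".join(parts) + ";"


# Placeholder names, for illustration only.
print(build_copy_statement(
    "events",
    "s3://example-bucket/events/",
    "arn:aws:iam::123456789012:role/ExampleRole",
    file_format="JSON",
    max_errors=10,
))
```

Rows skipped under MAXERROR are not loaded, so a non-zero value trades completeness for a load that finishes.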

For how long have I used the solution?

I have used Redshift for three years.

Buyer's Guide
Amazon Redshift
April 2024
Learn what your peers think about Amazon Redshift. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,886 professionals have used our research since 2012.

What do I think about the stability of the solution?

I once had an issue because my data contained a Unicode NULL character ("\u0000") in a VARCHAR field. AWS support was very quick to respond and helpful. Other than that, I have had no issues whatsoever.
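The review does not say how the issue was resolved. One possible precaution, sketched here as an assumption, is to strip the "\u0000" character from string fields before exporting rows for loading. The function names are made up for this example.

```python
def strip_nul(value):
    """Remove U+0000 characters from strings; return other values unchanged."""
    if isinstance(value, str):
        return value.replace("\u0000", "")
    return value


def clean_row(row):
    """Apply strip_nul to every field of a row given as a dict."""
    return {key: strip_nul(val) for key, val in row.items()}


print(clean_row({"name": "ab\u0000c", "visits": 3}))
```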

What do I think about the scalability of the solution?

No scalability issues whatsoever.

How are customer service and support?

Technical support is very good.

Which solution did I use previously and why did I switch?

At my previous company, we switched from MongoDB to Redshift. The main reasons were price and performance. At my current company, we started a data warehouse (greenfield project). The choice was between Google BigQuery and AWS Redshift. The main criterion was that Redshift is PostgreSQL-based and supports CTEs and window functions (PostgreSQL features).

How was the initial setup?

The big part when using Redshift is setting up the ETLs and doing the data cleaning. It was very hard when moving from MongoDB, because I had to re-discover our data schema (that had no spec). With that said, in both cases (moving from MongoDB and starting from scratch), I had a prototype up in about a day. By that I mean that I had the most important parts of my data loaded into Redshift and I could query it.

What's my experience with pricing, setup cost, and licensing?

The pricing page is explicit. Choose what suits your needs in terms of storage and performance.

Which other solutions did I evaluate?

For setting up a data warehouse, BigQuery was a serious contender. BigQuery is simpler to set up and scale. It's also more of a black box: you worry less about what's inside and how it scales, and you get charged for what you consume (which is both a pro and a con). With Redshift, you choose the type of machine you want in advance, as with EC2 (resizing your cluster is easy).

What other advice do I have?

If you evaluate Redshift, chances are that you should evaluate BigQuery too. So take the time to weigh the pros and cons of each (plenty has been written online about that).

Take a look at the reserved instances pricing. It is very advantageous if you know you will stick with Redshift for some time.

Take the time to learn PostgreSQL (e.g., https://www.pgexercises.com/). Redshift, while based on PostgreSQL 8.0, supports a good number of advanced Postgres features.

Do not be afraid of joins. PostgreSQL performs very well in this regard.

If you need performance, have a look at the suggested optimizations in the official documentation (such as setting up the correct distkeys, sortkeys and compression schemes).
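As an illustration of what those three settings look like in a table definition, the sketch below assembles a CREATE TABLE statement with a DISTKEY, a SORTKEY, and a per-column ENCODE clause. The table, columns, and encoding choices are hypothetical. The right keys depend on your own join and filter patterns.

```python
def build_create_table(table, columns, distkey, sortkeys):
    """Build Redshift DDL text.

    columns: list of (name, sql_type, encoding) tuples.
    distkey: column used to distribute rows across slices.
    sortkeys: columns that define the on-disk sort order.
    """
    names = [name for name, _, _ in columns]
    for key in [distkey, *sortkeys]:
        if key not in names:
            raise ValueError(f"unknown column in key definition: {key}")
    col_lines = ",\n".join(
        f"    {name} {sql_type} ENCODE {encoding}"
        for name, sql_type, encoding in columns
    )
    return (
        f"CREATE TABLE {table} (\n{col_lines}\n)\n"
        f"DISTKEY ({distkey})\n"
        f"SORTKEY ({', '.join(sortkeys)});"
    )


# Hypothetical table, for illustration only.
print(build_create_table(
    "page_views",
    [
        ("user_id", "BIGINT", "az64"),
        ("viewed_at", "TIMESTAMP", "az64"),
        ("url", "VARCHAR(2048)", "zstd"),
    ],
    distkey="user_id",
    sortkeys=["viewed_at"],
))
```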

Understand that Redshift has no indexes.

Understand that Redshift is an analytical database with columnar storage, and that it does not enforce constraints.

Redshift plays very well with a PostgreSQL instance in RDS linked to it via DBLINK (see this guide: https://aws.amazon.com/blogs/big-data/join-amazon-redshift-and-amazon-rds-postgresql-with-dblink/). I've used this in production at my current company, and this is tremendously useful. You can have your raw data in Redshift and aggregate it directly into RDS. To do this, insert into RDS what you select from Redshift through the dblink.
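A rough sketch of that insert-from-select pattern follows, assuming a dblink connection to Redshift has already been configured on the RDS side as the linked guide describes. The connection name, tables, and columns are placeholders, and the code only assembles the SQL text that would be run on the RDS PostgreSQL instance.

```python
def build_dblink_insert(target_table, columns, connection_name, redshift_query):
    """Build an INSERT ... SELECT that pulls rows from Redshift through dblink.

    columns: list of (name, pg_type) tuples describing the rows returned
    by redshift_query; dblink needs this column definition list.
    """
    col_names = ", ".join(name for name, _ in columns)
    col_defs = ", ".join(f"{name} {pg_type}" for name, pg_type in columns)
    return (
        f"INSERT INTO {target_table} ({col_names})\n"
        f"SELECT {col_names}\n"
        f"FROM dblink('{connection_name}', $REDSHIFT$\n"
        f"    {redshift_query}\n"
        f"$REDSHIFT$) AS t({col_defs});"
    )


# Placeholder names, for illustration only.
print(build_dblink_insert(
    "daily_event_counts",
    [("day", "date"), ("events", "bigint")],
    "redshift_conn",
    "SELECT created_at::date AS day, COUNT(*) AS events FROM raw_events GROUP BY 1",
))
```

The dollar-quoted block keeps the Redshift query readable even when it contains single quotes.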

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user583371 - PeerSpot reviewer
BI Architect at a comms service provider with 5,001-10,000 employees
Vendor
Columnar storage technology is valuable.

What is most valuable?

Columnar storage technology is the most valuable feature of this solution.

How has it helped my organization?

We can meet the SLS/SLAs in our daily processes.

What needs improvement?

Some improvements can be brought about in:

Restore table:

I would like to use this option to move data across different clusters. Right now, the feature only permits bringing a table back into the same cluster, based on the snapshot taken, so I have to UNLOAD from cluster A and then COPY into cluster B. I would like to use the snapshots to bring the data into whichever cluster I need. Maybe the current design cannot support this, because it is based on nodes and data distribution.

But our real scenario is this: if we lose data and need to recover it in another cluster, we have to:

1) Restore the table in the current cluster under a different name

2) Unload the data to S3

3) Copy the data to the new cluster. With billions of records, this is complex to do.
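Steps 2 and 3 can be sketched as a pair of statements, one run on the source cluster and one on the target. This is only an illustration with placeholder names. It assumes the restored table from step 1 already exists and that both clusters can reach the same S3 prefix through the given IAM role.

```python
def build_cross_cluster_copy(restored_table, target_table, s3_prefix, iam_role):
    """Return (unload_sql, copy_sql) for moving a table through S3.

    unload_sql runs on the source cluster, copy_sql on the target cluster.
    Both use the default delimited text format, compressed with GZIP.
    """
    unload_sql = (
        f"UNLOAD ('SELECT * FROM {restored_table}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"GZIP;"
    )
    copy_sql = (
        f"COPY {target_table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"GZIP;"
    )
    return unload_sql, copy_sql


# Placeholder names, for illustration only.
unload_sql, copy_sql = build_cross_cluster_copy(
    "sales_restored",
    "sales",
    "s3://example-bucket/recovery/sales_",
    "arn:aws:iam::123456789012:role/ExampleRole",
)
print(unload_sql)
print(copy_sql)
```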

Vacuum process: The vacuum needs to be segmented. For example, after 24 hours of execution, I had to cancel the process and 0% was sorted (big table).


For big tables (billions of records), if the table is 100% unsorted, the vacuum can take more than 24 hours. If we don't have that much time, we have to work around it by moving the data out to additional tables and running the vacuum in batches on the main table.

The reason is that if I run the vacuum directly on the main table and stop it after 5 hours, 0 records will be sorted. I would like to run the vacuum on the main table, stop it when I need to, and still have some records vacuumed, like an incremental process.

For how long have I used the solution?

I have used this solution for around three years.

What do I think about the stability of the solution?

We did encounter stability issues, i.e., if you are using more than 25 nodes (ds2.xlarge), the cluster is totally unstable.

What do I think about the scalability of the solution?

I have not experienced any scalability issues.

How are customer service and technical support?

I would rate the technical support a 9/10 for normal issues.

However, for advanced issues, I would give it a 5/10, since I had to go directly to AWS engineering support.

Which solution did I use previously and why did I switch?

Initially, we were using the Microsoft SQL solution. We decided to move over to this product due to the DWH volume and performance.

How was the initial setup?

In my opinion, the setup was normal.

What's my experience with pricing, setup cost, and licensing?

Based on the quality of the product and its price, it is one of the best options available in the market now.

Which other solutions did I evaluate?

We also looked at the Oracle solution.

What other advice do I have?

Make sure that the space used in the DWH is at most 50% of the total space.

You must create processes to vacuum and analyze tables frequently. Also, before creating the tables, you should choose the right encoding, DISTKEY and sort keys.
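A small sketch of what such a routine could look like follows: one helper checks the 50% space guideline and another lists the VACUUM and ANALYZE statements to run. The table names and sizes are made up, and scheduling and execution are left out.

```python
def within_space_budget(used_gb, total_gb, max_fraction=0.5):
    """True if used space is at or below max_fraction of total space."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return used_gb / total_gb <= max_fraction


def maintenance_statements(tables):
    """Return VACUUM and ANALYZE statements for each table, in order."""
    statements = []
    for table in tables:
        statements.append(f"VACUUM {table};")
        statements.append(f"ANALYZE {table};")
    return statements


# Hypothetical figures and table names, for illustration only.
print(within_space_budget(used_gb=400, total_gb=1000))
for statement in maintenance_statements(["sales", "customers"]):
    print(statement)
```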

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user576441 - PeerSpot reviewer
Senior Software Engineer [Redshift Programmer] at a tech services company with 1,001-5,000 employees
Consultant
It supports SCD1 and SCD2, and the star schema. Improvement is needed in the scope of data types and complex RDBMS functionalities.

What is most valuable?

The most valuable features of this product are:

  • Processing huge data in petabytes
  • Massively Parallel Processing (MPP)
  • Concept of data compression
  • The way it stores data on drives, especially with the distribution key
  • Supports BI tools like MicroStrategy (MSTR) and Tableau
  • Supports all the data warehouse core features such as SCD1 and SCD2, and different schemas like the star schema

How has it helped my organization?

It has helped us to understand the response and interest of the customers and the user conversion rate in this competitive world. Thus, it has helped us in the decision-making process.

What needs improvement?

In most scenarios, the data source for Redshift will be a traditional RDBMS such as MySQL, PostgreSQL, or SQL Server. After migrating to Redshift, we find a few disconnects with respect to data types, stored procedures, and other complex functionality. Improvement is needed in these aspects, mainly in the scope of data types and some of the complex functionality available in an RDBMS.

For how long have I used the solution?

I have used this solution for more than a year.

What do I think about the stability of the solution?

I have not encountered any issues with stability. In terms of performance, Redshift is highly stable.

What do I think about the scalability of the solution?

I have not encountered any issues with scalability. We can easily scale the nodes in AWS with only a few clicks.

How are customer service and technical support?

I would give the technical support a 6 out of 10 rating.

Which solution did I use previously and why did I switch?

We have not used any other solution.

How was the initial setup?

The setup was straightforward for those who know AWS.

What's my experience with pricing, setup cost, and licensing?

The Redshift pricing policy is easy to understand.

Which other solutions did I evaluate?

We did not evaluate other options prior to selecting this solution.

What other advice do I have?

As of now, Redshift is far better than the other products in the market.

Lastly, I would like to mention that Redshift is more about scaling and stabilizing your data. One should also focus on data modeling from time to time.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Consultant at a tech services company with 51-200 employees
Consultant
High performance, efficient, and helpful support
Pros and Cons
  • "The most valuable features of Amazon Redshift are that it's fast and efficient. We have lots of TBs of data and it's very fast."
  • "Amazon Redshift could improve the user interface support."

What is our primary use case?

We are using Amazon Redshift services to query the data and to perform certain data science operations on that data, such as applying a machine learning algorithm or doing an analysis.

What is most valuable?

The most valuable features of Amazon Redshift are that it's fast and efficient. We have lots of TBs of data and it's very fast.

What needs improvement?

Amazon Redshift could improve the user interface support.

For how long have I used the solution?

I have been using Amazon Redshift for approximately one year.

What do I think about the stability of the solution?

Amazon Redshift is a stable solution. However, the environment configuration often changes very quickly without any notice, and this creates a lot of problems for running our code.

What do I think about the scalability of the solution?

The scalability of Amazon Redshift is good. The solution is best suited for larger-scale businesses because the price is affordable for them and they need the complexity.

How are customer service and support?

The support from Amazon Redshift is very good.

How was the initial setup?

Amazon Redshift is somewhat complex to deploy. The process could improve.

What's my experience with pricing, setup cost, and licensing?

Amazon Redshift is an expensive solution. Larger organizations can afford this solution, but smaller businesses would struggle to afford it.

What other advice do I have?

I rate Amazon Redshift an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: partner
PeerSpot user
it_user576456 - PeerSpot reviewer
Manager BI Development at a comms service provider with 1,001-5,000 employees
Vendor
The fact that it stores data using a columnar approach allows us to use columns in join conditions.

What is most valuable?

Redshift gives extremely fast responses on queries involving large tables. This is the most important feature I look for in data warehouse solutions. Often you come across use cases where it is not possible to distribute data on a certain column, yet you need this column in join conditions. Redshift stores data using a columnar approach, which is useful for data aggregation.

All this at an extremely low price makes it possible for small to medium sized organizations to use Redshift’s power to get business insights.

How has it helped my organization?

One of my clients required large amounts of data but had a low budget. Amazon Redshift was the perfect choice for my client. We joined two tables containing billions of rows each and got results back in 27 seconds with a relatively small cluster of nodes.

What needs improvement?

Amazon should bring more SQL functions that are required in data warehouse implementations. It lacks SQL functions for complex data processing. A very small example is recursive queries. However, Amazon is developing the product at a fast pace and bringing new features with every release.

For how long have I used the solution?

I’ve been using Redshift for more than two years. I created one traditional data warehouse with 3-tier architecture and one big data solution.

What do I think about the stability of the solution?

We have not really had stability problems. The product is mature and can be utilized for production systems.

What do I think about the scalability of the solution?

Since Redshift runs on the AWS cloud, scalability is not an issue. With a few clicks, cluster size can be increased or reduced. This is especially useful when you expect a large amount of data processing temporarily. For example, on Black Friday retail organizations expect large amounts of data flow and processing. Redshift can be scaled up for a few days to accommodate the surge of data and then scaled back to the normal cluster size to save OPEX.

How are customer service and technical support?

The AWS team gives special focus to customer support. This is a very big benefit of going to the cloud. You get a reply from AWS in a short time frame.

Which solution did I use previously and why did I switch?

I worked on Teradata and IBM solutions. Redshift gives performance similar to these solutions and costs a fraction of the amount.

How was the initial setup?

Your Redshift cluster can be up and running with a few clicks and in less than 5 minutes. This is a big benefit when you shift to the cloud.

Which other solutions did I evaluate?

We analyzed Microsoft, Oracle, AWS RDS and MongoDB for our requirements.

What other advice do I have?

Redshift is based on PostgreSQL and adds MPP/columnar features to make it a data warehouse product. It is very easy for developers to adopt this solution. Your existing team can easily work on Redshift with no extra cost of learning.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user705738 - PeerSpot reviewer
Senior Solutions Engineer, West at a tech vendor with 5,001-10,000 employees
Vendor
It helped my customers migrate off on-premise platforms
Pros and Cons
  • "Redshift COPY command, because much of my work involved helping customers migrate large amounts of data into Redshift."
  • "Migrating data from other data sources can be challenging when you are working with multibyte character sets."

What is most valuable?

Redshift COPY command, because much of my work involved helping customers migrate large amounts of data into Redshift.

How has it helped my organization?

It helped my customers migrate off on-premise platforms such as Teradata to Redshift, at a fraction of the cost.

What needs improvement?

There are challenges with dealing with character set mismatches. Migrating data from other data sources can be challenging when you are working with multibyte character sets.

For how long have I used the solution?

Two years.

What do I think about the stability of the solution?

No.

What do I think about the scalability of the solution?

I personally haven’t hit scalability issues, but at dinner a year ago with a few of my existing customers (all Fortune 500 companies), I was told there are scalability issues once you get to 32 nodes.

One of my previous customers told me they were migrating off Redshift because they hit the ceiling and had scalability issues. They told me the responsiveness they were getting was inferior to alternative solutions once your Redshift gets to a specific size.

How are customer service and technical support?

I never utilized AWS technical support.

Which solution did I use previously and why did I switch?

I’ve helped customers migrate off Teradata, SQL Server, Oracle Exadata, Greenplum, and ParAccel Matrix to Redshift. Some due to cost savings, others because of the EOL of the product.

How was the initial setup?

Setup of Redshift infrastructure is pretty straightforward. I’ve been told that setting up partitions can be tricky in order to ensure good performance.

What's my experience with pricing, setup cost, and licensing?

I have nothing to add here as I wasn’t involved in this part of the process. However, one of my customers went with Google BigQuery over Redshift because it was significantly cheaper for their project.

Which other solutions did I evaluate?

I only provided advice to my customers, but some looked at Azure SQL DW, Greenplum, Netezza, and Google BigQuery as possible alternatives.

What other advice do I have?

Be careful with vendor lock-in! You cannot move your Redshift environment to a different cloud provider or to an on-premise solution.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
BI Manager at jfrog
Real User
You can copy JSON to the column and have it analyzed using simple functions
Pros and Cons
  • "You can copy JSON to the column and have it analyzed using simple functions."
  • "It lacks a few features which can be very useful, such as stored procedures"

What is most valuable?

The first feature I find valuable in Redshift is JSON format support. You can copy JSON into a column and have it analyzed using simple functions. The second is the PARALLEL ON/OFF option of UNLOAD, which lets you choose whether to unload to split files or into one file.
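Both features can be sketched in a few lines. The first helper builds a query that reads one value out of a JSON string column with json_extract_path_text, and the second wraps a query in an UNLOAD with PARALLEL ON or OFF. Table, column, bucket, and role names are placeholders, and the code only assembles SQL text.

```python
def build_json_select(table, json_column, path_keys, alias):
    """Build a SELECT that pulls one value out of a JSON string column."""
    path = ", ".join(f"'{key}'" for key in path_keys)
    return (
        f"SELECT json_extract_path_text({json_column}, {path}) AS {alias} "
        f"FROM {table}"
    )


def build_unload(query, s3_prefix, iam_role, single_file=False):
    """Build an UNLOAD statement; single_file=True maps to PARALLEL OFF."""
    # Single quotes inside the quoted UNLOAD query must be doubled.
    escaped_query = query.replace("'", "''")
    parallel = "OFF" if single_file else "ON"
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"PARALLEL {parallel};"
    )


# Placeholder names, for illustration only.
query = build_json_select("raw_events", "payload", ["user", "id"], "user_id")
print(build_unload(
    query,
    "s3://example-bucket/export/user_ids_",
    "arn:aws:iam::123456789012:role/ExampleRole",
    single_file=True,
))
```

With PARALLEL OFF, Redshift writes the output serially. A very large result can still be split across more than one file because of the per-file size limit.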

How has it helped my organization?

Since we have lots of data sources and high volumes, we needed a unified and organized DB that can handle these amounts and will be our single source of truth for the organization. Therefore, Redshift is the best solution.

What needs improvement?

It lacks a few features which can be very useful, such as stored procedures. Also, one needs to run VACUUM in order to manage this DB. It would be nice not to have to worry about that and to have it managed for you.

For how long have I used the solution?

Three years.

What do I think about the stability of the solution?

Yes. Sometimes, for some reason, Redshift is down (not due to maintenance).

What do I think about the scalability of the solution?

No, because we know how to use Redshift. We have clusters on both HDD and SSD, and we keep the maximum amount of data in each so that it stays scalable.

How is customer service and technical support?

Great. They are available and very helpful.

How was the initial setup?

Initial setup is very straightforward and very easy. No outside help is needed.

What's my experience with pricing, setup cost, and licensing?

If you are willing to think about the cost of every query you run but want your nodes to be fully managed, then use BigQuery Data Analytics. If you want a fixed price and do not want to worry about every query, but are prepared to manage your nodes yourself, use Redshift.

Which other solutions did I evaluate?

I did not. We did consider using BigQuery Data Analytics, but eventually we decided to use Redshift.

What other advice do I have?

My rating would be 8.5. This is a great product, but one still needs to know how to manage clusters and nodes in order to make the DB scalable and reliable.

It has the great benefit of being built on PostgreSQL, so any data specialist who has SQL experience can handle Redshift.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user572622 - PeerSpot reviewer
BI Architect & Developer (contract) at a retailer with 501-1,000 employees
Vendor
You can configure tables to live in the memory of all of the available cores.

What is most valuable?

Column store and distributed processing is optimized for read access. We grew to 3000+ users with no impact.

Column store is a data compression technique for relational data. I’m using it now in SQL Server 2016. We configured a 16-core VM for handling requests on the DB. The recommendation was to separate inbound data packets into related chunks, which were 1/16th of the size.

This way, the import process could make full use of parallelization, and it worked. We imported 20 million rows of sales facts in less than 15 seconds, and the content was query-able immediately. I’ve never seen that before. This was impressive. This meant that we could completely rebuild the data warehouse to “current” from "scratch" within minutes, assuming that the data was in S3 already.
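The chunking step can be sketched as follows. This is a minimal illustration of splitting a batch into 16 near-equal parts so that each part can be written as its own file and loaded in parallel. Writing the files to S3 and issuing the load are left out, and the function name is made up for this example.

```python
def split_into_chunks(rows, n_chunks):
    """Split rows into n_chunks lists whose sizes differ by at most one."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(len(rows), n_chunks)
    chunks = []
    start = 0
    for index in range(n_chunks):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if index < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks


# 100 dummy rows split for a hypothetical 16-way parallel load.
chunks = split_into_chunks(list(range(100)), 16)
print([len(chunk) for chunk in chunks])
```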

Tables that would typically be 2GB in size are now about 250MB. This means more data in memory. You can also configure the tables to live in the memory of all of the available cores. This is good for small dimension tables. You can also fragment them across all cores, for the larger fact tables. This allows for distributed query processing. Once you set it up, it just worked. It was all specified in the PG-SQL table statements.
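In Redshift DDL, the two layouts described here most likely correspond to DISTSTYLE ALL (a full copy of a small dimension table on every node) and DISTSTYLE KEY (a large fact table spread by a distribution key). That mapping is an assumption, since the review does not name the options. The sketch below builds both kinds of table definition with hypothetical names.

```python
def distribution_clause(table_kind, distkey=None):
    """Return the distribution clause for a 'dimension' or 'fact' table."""
    if table_kind == "dimension":
        # Every node keeps a full copy, so joins need no redistribution.
        return "DISTSTYLE ALL"
    if table_kind == "fact":
        if not distkey:
            raise ValueError("a fact table needs a distribution key column")
        # Rows are spread across slices by the value of the key column.
        return f"DISTSTYLE KEY DISTKEY ({distkey})"
    raise ValueError(f"unknown table kind: {table_kind}")


def build_table_ddl(table, column_defs, table_kind, distkey=None):
    """Build CREATE TABLE text from column definitions such as 'id INT'."""
    columns = ",\n".join(f"    {definition}" for definition in column_defs)
    clause = distribution_clause(table_kind, distkey)
    return f"CREATE TABLE {table} (\n{columns}\n)\n{clause};"


# Hypothetical tables, for illustration only.
print(build_table_ddl("dim_store", ["store_id INT", "name VARCHAR(100)"], "dimension"))
print(build_table_ddl("fact_sales", ["store_id INT", "amount DECIMAL(12,2)"], "fact", "store_id"))
```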

There were two data centers in Sydney that were guaranteeing us a distributed solution. We really didn’t notice this. It was more of a check box situation. At one point, there was an outage at AWS, but it didn’t impact our operations directly.

How has it helped my organization?

This has given us the ability to provide metrics to the large number of company staff on their performance without impacting core systems.

What needs improvement?

I’d like to see these Redshift features arrive in other languages, such as SQL's ColumnStore index.

For how long have I used the solution?

I have used this solution for three years.

What do I think about the stability of the solution?

There have been no stability issues.

How are customer service and technical support?

Technical support always met my expectations.

Which solution did I use previously and why did I switch?

I was on a team that was using AWS tools for Dick Smith Electronics (now liquidated). The tools ceased use in February of 2016.

Prior to that, we were using them fully for about 3 years. We loaded data to Redshift according to the best practices included in the online docs and through consultation with the AWS staff. The combination of S3 and Redshift for this purpose was very high in performance. Redshift was used to provide the data model to an instance of MicroStrategy for BI reporting.

We were using MicroStrategy, which generated all the SQL that our reporting services needed.

As such, I could only comment on the data engineering phase. Technically, this was so impressive that I don’t know what to add. I don’t recall feeling that it missed anything. If anything, I was not using all the available features. AWS documentation is great in this regard. You can tell they have put a lot of thought into it.

A lot of the future direction in database technology has to do with memory optimization and concurrency (VoltDB). This is more targeted towards transactional processing, and not data warehousing.

Memory-only data warehousing solves a lot of access issues without having to think too hard about the problem from the consumers' point of view. I am sure that you can already configure this.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user