Examples of the 102,000+ reviews on PeerSpot:

Kemal Duman - PeerSpot reviewer
Team Lead, Data Engineering at a recreational facilities/services company with 201-500 employees
Real User
Top 5 Leaderboard
Jan 21, 2026
Data pipelines have run faster and support flexible batch and streaming transformations

What is our primary use case?

Spark SQL has been in our stack for less than one year, and some of our colleagues use it. It is a useful product for transformation jobs.

We generally use Spark SQL for batch processing, and we have also used it for some streaming jobs.

How has it helped my organization?

Our Spark jobs did not require much additional effort.

It is a useful product for transformation jobs.

What is most valuable?

Speed is the major benefit of using Spark SQL.

Spark SQL is interoperable with Hive. While migrating from HDFS to Iceberg, we did not need to change our Spark SQL job configurations; we only changed the location or the connection type, from Hive to Trino or S3. This interoperability proved very useful.
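A minimal sketch of why such a migration can leave the jobs themselves untouched, assuming the jobs address tables through an Iceberg catalog registered on the Spark session. The catalog name `lake`, the metastore and REST URIs, and the bucket are hypothetical; the property keys follow Iceberg's Spark catalog configuration. Only these session settings differ between backends, while the SQL string stays the same.

```python
from typing import Dict

# The job's SQL is identical whichever backend the catalog points at.
JOB_SQL = "INSERT INTO lake.sales.daily SELECT * FROM lake.sales.raw WHERE dt = '2025-01-01'"


def spark_catalog_conf(catalog: str, backend: str) -> Dict[str, str]:
    """Return Spark session settings for an Iceberg catalog named `catalog`.

    backend="hive" uses a Hive Metastore; backend="rest" uses an Iceberg REST
    catalog. URIs and the warehouse path below are hypothetical placeholders.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": "s3://example-bucket/warehouse",  # hypothetical bucket
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }
    if backend == "hive":
        conf[f"{prefix}.type"] = "hive"
        conf[f"{prefix}.uri"] = "thrift://metastore.example:9083"  # hypothetical host
    elif backend == "rest":
        conf[f"{prefix}.type"] = "rest"
        conf[f"{prefix}.uri"] = "https://catalog.example/api"  # hypothetical endpoint
    else:
        raise ValueError(f"unsupported backend: {backend}")
    return conf
```

With PySpark installed, each pair would be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`, and `JOB_SQL` would run unchanged against either catalog.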

What needs improvement?

We do not have any performance problems, but we do have resource problems: Spark SQL consumes so many resources that we migrated our streaming jobs from Spark to Apache Flink.

Resource management in Spark SQL should be better. Some extra resource consumption is normal, but memory and CPU consumption was the main reason we switched away from Spark. Because the number of streaming jobs in our company keeps increasing, we made resource management a priority.
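Why a growing number of streaming jobs turns into a resource problem can be shown with a rough memory estimate. This sketch assumes each streaming job is a long-running Spark application with static allocation in cluster deploy mode, and applies Spark's default container overhead of max(384 MiB, 10% of the heap) to the driver and each executor. The job counts and sizes are hypothetical, and CPU, off-heap and Python worker memory are ignored.

```python
MIN_OVERHEAD_MIB = 384   # Spark's minimum default memory overhead per container
OVERHEAD_FACTOR = 0.10   # Spark's default overhead factor


def container_mib(heap_mib: int) -> int:
    """Heap plus default overhead: max(384 MiB, 10% of the heap)."""
    return heap_mib + max(MIN_OVERHEAD_MIB, int(heap_mib * OVERHEAD_FACTOR))


def cluster_memory_mib(jobs: int, executors_per_job: int,
                       executor_mib: int, driver_mib: int) -> int:
    """Memory held by `jobs` always-on streaming applications (static allocation)."""
    per_job = executors_per_job * container_mib(executor_mib) + container_mib(driver_mib)
    return jobs * per_job


# Hypothetical sizing: 2 executors with 4 GiB heap and a 1 GiB driver per job.
print(cluster_memory_mib(10, 2, 4096, 1024))  # 104180 MiB for 10 jobs
```

Because a streaming application holds its containers continuously, the reserved memory grows linearly with the job count: doubling the number of jobs doubles the total.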

Resource consumption aside, I would say development with Spark SQL is better. For development purposes, it is a top product and not difficult to work with, but resources are the major problem. We switched to Flink despite the longer development time: development takes less time in Spark than in Flink.

Which solution did I use previously and why did I switch?

We first migrated some streaming tasks from Spark to NiFi. NiFi was a good alternative for us, but we later moved from NiFi to Flink because NiFi had problems ingesting data into Iceberg or S3.

Which other solutions did I evaluate?

We are not looking at other options. We generally use open-source products and support the open-source movement, so Apache products are at the top of our list.

We do not pay for Apache products; we always use the free, open-source versions.

We are currently dealing with Iceberg, Flink, Airflow, and Trino.

What other advice do I have?

Regarding the Catalyst query optimizer, we used it in the past, a long time ago, but I am not certain whether we use it now.
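As background, Catalyst is the optimizer built into Spark SQL, and it rewrites query plans as trees using rules; one of its rules is constant folding, which turns `x + (1 + 2)` into `x + 3` before execution. The toy below is a plain-Python analogue of that single rule on a hypothetical three-node expression type. It is not Catalyst's API, only an illustration of a bottom-up rule-based rewrite. In Spark itself, `EXPLAIN EXTENDED <query>` prints the optimized logical plan.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]


def fold_constants(e: Expr) -> Expr:
    """Rewrite bottom-up: any Add whose children are both literals becomes one Lit."""
    if isinstance(e, Add):
        left = fold_constants(e.left)
        right = fold_constants(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return e  # Lit and Col are already as simple as possible


print(fold_constants(Add(Col("x"), Add(Lit(1), Lit(2)))))
```

Catalyst repeats many such rules over the plan until it stops changing; this toy applies one rule in a single pass.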

I rate my experience with Spark SQL as an eight out of ten. I have been working in this field for eight years.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Last updated: Jan 21, 2026
reviewer2043696 - PeerSpot reviewer
Senior Technical Engineer at a transportation company with 5,001-10,000 employees
Real User
Top 10
Dec 26, 2025
Data pipelines have simplified complex transformations and provide faster insights for users
Pros and Cons
  • "We are using Amazon EMR to clean the data and transform the data in such a way that the end-user can get the insights faster."
  • "I feel some lack of functionality in Amazon EMR."

What is our primary use case?

I use Amazon EMR primarily for data processing.

What is most valuable?

The feature of Amazon EMR that I have found most valuable is that its functions are fully customizable.

I use Amazon EMR for data transformation, and the entire ETL process runs on it. The first step is extraction, where we pull data from various sources, including our main core databases on Oracle, MySQL, SQL Server, and Postgres. We use Amazon EMR to combine the data from all these sources, clean it, and bring it to a common level so that it is production-ready. We also simplify the data, because every source system has its own problems. We use it to clean and transform the data so that end users can get insights faster.
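The combine, clean and standardize step described above can be sketched in plain Python. This is not EMR or Spark code; on EMR the same logic would typically run as a Spark job. The column names, the per-source mappings, and the rule of keeping the earliest signup per email are hypothetical, chosen only to show records from different systems being brought to one common schema.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional

# Hypothetical per-source column names; real schemas will differ.
COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "oracle":   {"CUST_ID": "customer_id", "EMAIL_ADDR": "email", "SIGNUP_DT": "signup_date"},
    "mysql":    {"id": "customer_id", "email": "email", "created": "signup_date"},
    "postgres": {"customer_id": "customer_id", "email_address": "email", "signup": "signup_date"},
}


def standardize(source: str, row: dict) -> Optional[dict]:
    """Map one source row onto the common schema; return None if it is unusable."""
    rec = {target: row.get(src) for src, target in COLUMN_MAPS[source].items()}
    email = str(rec["email"]).strip().lower() if rec["email"] is not None else ""
    if rec["customer_id"] is None or not email or rec["signup_date"] is None:
        return None
    return {
        "customer_id": f"{source}:{rec['customer_id']}",  # keep IDs unique across systems
        "email": email,
        # Accepts date objects, "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss" strings.
        "signup_date": date.fromisoformat(str(rec["signup_date"])[:10]),
    }


def combine(batches: Dict[str, Iterable[dict]]) -> List[dict]:
    """Standardize every source, then keep one record per email (earliest signup)."""
    best: Dict[str, dict] = {}
    for source, rows in batches.items():
        for row in rows:
            rec = standardize(source, row)
            if rec is None:
                continue
            current = best.get(rec["email"])
            if current is None or rec["signup_date"] < current["signup_date"]:
                best[rec["email"]] = rec
    return sorted(best.values(), key=lambda r: r["email"])
```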

What needs improvement?

I feel some lack of functionality in Amazon EMR. I would like to see AI/ML features or other additional options in the product.

For how long have I used the solution?

I have been working with Amazon EMR for five years.

What do I think about the stability of the solution?

I would rate the stability of Amazon EMR as eight out of ten.

What do I think about the scalability of the solution?

In my organization, we have made use of Amazon EMR's customizable cluster configurations.
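For readers unfamiliar with what a customizable EMR cluster configuration looks like, here is a minimal sketch that only builds a request dictionary shaped like the keyword arguments of boto3's `EMR.Client.run_job_flow`; nothing is sent to AWS. The cluster name, release label, instance types, Spark properties and node counts are hypothetical examples. The managed scaling limits apply to core and task nodes, which is why the minimum equals the initial core count.

```python
from typing import Any, Dict


def instance_group(role: str, instance_type: str, count: int) -> Dict[str, Any]:
    """One EMR instance group (role is MASTER, CORE or TASK)."""
    return {
        "Name": f"{role.lower()}-group",
        "Market": "ON_DEMAND",
        "InstanceRole": role,
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


def cluster_request(name: str, core_nodes: int, max_nodes: int) -> Dict[str, Any]:
    """Build run_job_flow-style arguments; all sizes and labels are hypothetical."""
    if not 1 <= core_nodes <= max_nodes:
        raise ValueError("need 1 <= core_nodes <= max_nodes")
    return {
        "Name": name,
        "ReleaseLabel": "emr-7.0.0",  # example release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                instance_group("MASTER", "m5.xlarge", 1),
                instance_group("CORE", "r5.2xlarge", core_nodes),
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Per-application settings, here for Spark's spark-defaults.conf.
        "Configurations": [{
            "Classification": "spark-defaults",
            "Properties": {"spark.executor.memory": "8g",
                           "spark.dynamicAllocation.enabled": "true"},
        }],
        # Let EMR grow or shrink core/task capacity between these bounds.
        "ManagedScalingPolicy": {"ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": core_nodes,
            "MaximumCapacityUnits": max_nodes,
        }},
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**cluster_request(...))`.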

How are customer service and support?

I would rate the technical support from Amazon as ten out of ten. They are always there to help.

How would you rate customer service and support?

Positive

How was the initial setup?

The setup process for Amazon EMR can have some issues and could be simpler for basic users.

What's my experience with pricing, setup cost, and licensing?

On a scale where one is expensive and ten is inexpensive, I would rate the pricing of Amazon EMR as good.

Which other solutions did I evaluate?

I could compare Amazon EMR with a product from another vendor, but I would need more context.

What other advice do I have?

I find it easy to integrate Amazon EMR with other AWS services, such as S3 or EC2, for data processing needs. I would rate Amazon EMR eight out of ten.
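As one illustration of the S3 integration, an EMR step can run a Spark script stored in S3 and read and write S3 paths. This sketch only builds a step definition in the shape accepted by boto3's `EMR.Client.add_job_flow_steps`; it does not call AWS. The bucket paths are hypothetical, and so are the `--input`/`--output` flags, which the script itself would have to parse.

```python
from typing import Any, Dict


def spark_step(name: str, script_uri: str, input_uri: str, output_uri: str) -> Dict[str, Any]:
    """Build an EMR step that runs spark-submit on a script stored in S3."""
    for uri in (script_uri, input_uri, output_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run commands such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri,
                     "--input", input_uri, "--output", output_uri],  # hypothetical script flags
        },
    }
```

With credentials configured, the step would be submitted as `boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])`, where `cluster_id` is the ID of a running cluster.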

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner.
Last updated: Dec 26, 2025