CTO at Dokument IT d.o.o.
Real User
Aug 19, 2023
If implemented well, the solution is highly compatible and great for data analysis
Pros and Cons
  • "I find the Thrift connection valuable."
  • "I've experienced some incompatibilities when using the Delta Lake format."

What is our primary use case?

We used the solution for analytics of data and statistical reports from content management platforms.

What is most valuable?

I find the Thrift connection valuable.

What needs improvement?

I'm using DBeaver to connect Spark with external tools. I've experienced some incompatibilities when using the Delta Lake format. It is compatible when you're using Databricks on the cloud, but when I'm using Spark on-premise, there are some incompatibility issues. We expect interactive queries with Dremio to provide better results. We issue a query but see that it's a batch process in the background. The documentation is also limited, especially in the setup for Thrift servers.
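
For readers trying a similar DBeaver setup, here is a minimal sketch of the connection string involved. It relies on the Spark Thrift server speaking the HiveServer2 protocol, so JDBC clients address it with a `jdbc:hive2://` URL, with 10000 as the usual default port. The helper name `thrift_jdbc_url` and the host `spark-thrift.example.internal` are hypothetical and not taken from the review.

```python
def thrift_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2-style JDBC URL for a Spark Thrift server.

    Assumptions: the server listens on the default HiveServer2 port (10000)
    unless told otherwise, and no authentication parameters are needed.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:hive2://{host}:{port}/{database}"


# Hypothetical host name, for illustration only.
url = thrift_jdbc_url("spark-thrift.example.internal")
print(url)
```

The resulting string is what would be pasted into the JDBC URL field of a Hive or Spark driver in a client such as DBeaver. Authentication and TLS options vary by deployment and are left out here.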

For how long have I used the solution?

I have been using the solution for a year.

Buyer's Guide
Spark SQL
February 2026
Learn what your peers think about Spark SQL. Get advice and tips from experienced pros sharing their opinions. Updated: February 2026.
884,933 professionals have used our research since 2012.

What do I think about the stability of the solution?

I rate Spark SQL's stability between nine and ten out of ten because I didn't have any problems with it.

What do I think about the scalability of the solution?

Spark SQL's scalability is excellent. In production, there will only be a few users who are analysts using statistical reports. Queries have many joins, and one query, on average, has seven to 12 joins. There are no plans to increase usage because I'm working on creating markets. There are higher management staff analysts for the data platform, and we have plans to expand the business with data platforms.

How was the initial setup?

The setup process for Spark is not well-documented, but that's expected because the solution is open-source. You must work around various roadblocks, but this is usual for an open-source solution. You could hire guys from the Databricks center, and they can fix nearly anything.

When you learn all the tricks, you can deploy the solution very fast in one hour. But that applies just to the development environment. We are not in production right now. I tested it on Windows and tested it on Ubuntu, and everything works well. But you have to reinvent the wheel because documentation is incomplete.

The deployment process is based on bash scripts. I was considering making Ansible playbooks and custom roles in Ansible, but I didn't have the time, though this is the plan. I want to move from the bash scripts to Ansible because I prefer a declarative approach in software engineering. I have plans to totally automate the deployment, where one experienced engineer would be enough. The solution's final deployment would be on the Kubernetes cluster, and the infrastructure would be set up with Terraform and Ansible. Everything will be heavily optimized.

What's my experience with pricing, setup cost, and licensing?

We don't have to pay for licenses with this solution because we are working in a small market, and we rely on open-source because the budgets of projects are very small.

What other advice do I have?

I recommend Spark SQL, but I will need to see what the results will be of our evaluation of Dremio. I'm especially expecting good performance because of the reflection mechanisms, which are actually materialized views. But the open question is the refresh rate. I don't know how bad or good that is.

I rate Spark SQL a ten out of ten with the correct implementation.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer1724670 - PeerSpot reviewer
Engineering Manager/Solution architect at a computer software company with 201-500 employees
Vendor
Dec 2, 2021
Useful tool within a distributed ecosystem
Pros and Cons
  • "This solution is useful to leverage within a distributed ecosystem."
  • "This solution could be improved by adding monitoring and integration for the EMR."

What is our primary use case?

The primary use case of this solution is to function within a distributed ecosystem. Spark is part of EMR, a Hadoop distribution, and is one of the tools in the ecosystem. You are not working with Hadoop in a vacuum—you leverage Spark, Hive, HBase—because it is just a distributed ecosystem. It has no value within itself. 

This solution can be deployed both on the cloud and on Cloudera distributions. 

What is most valuable?

This solution is useful to leverage within a distributed ecosystem. 

What needs improvement?

This solution could be improved by adding monitoring and integration for the EMR. 

For how long have I used the solution?

We have been working with Spark SQL for a few years. We are an outsourcing and consulting company, so it's not for our use—we mostly work with clients. 

What do I think about the stability of the solution?

This solution is stable. 

What do I think about the scalability of the solution?

This solution is scalable. 

How was the initial setup?

The installation is straightforward because it's a cloud-based solution. 

What about the implementation team?

We implement this solution for customers ourselves. 

What's my experience with pricing, setup cost, and licensing?

There is no license or subscription for this solution. 

What other advice do I have?

I rate this solution an eight out of ten and would recommend it to others. 

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
reviewer1488372 - PeerSpot reviewer
Associate Manager at a consultancy with 501-1,000 employees
Real User
May 30, 2021
Easy to use, reliable, and useful data validation
Pros and Cons
  • "Data validation and ease of use are the most valuable features."
  • "There should be better integration with other solutions."

What is our primary use case?

I am using this solution for data validation and writing queries.

What is most valuable?

Data validation and ease of use are the most valuable features.

What needs improvement?

There should be better integration with other solutions.

For how long have I used the solution?

I have been using the solution for approximately two years.

What do I think about the stability of the solution?

The solution has been stable.

What do I think about the scalability of the solution?

I have found the solution to be scalable. We have 20 people using the solution in my organization and we plan to increase usage.

What's my experience with pricing, setup cost, and licensing?

The solution is open source and free.

What other advice do I have?

I rate Spark SQL a ten out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer1427205 - PeerSpot reviewer
Corporate Sales at a financial services firm with 10,001+ employees
Real User
Sep 30, 2020
It is stable, but its partitioning feature isn't that easy to use
Pros and Cons
  • "It is a stable solution."
  • "We use it to gather all the transaction data."
  • "Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users."

What is our primary use case?

We use it to gather all the transaction data. We have Hadoop and Spark in our system, and we use some easy process flows for transport. 

What is most valuable?

It is a stable solution. 

What needs improvement?

Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users.
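
On the wish for automatic partitioning, Spark 3.x does include Adaptive Query Execution (AQE), which can coalesce shuffle partitions and split skewed ones at runtime. There is also a `REPARTITION` hint for setting the partitioning of a single query by hand. The sketch below only assembles the corresponding Spark SQL statements as strings. The table `transactions` and column `customer_id` are hypothetical, and AQE tunes shuffle partitions during a query, not the on-disk partition layout of a table.

```python
# Spark SQL settings that enable runtime partition tuning (AQE, Spark 3.x).
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def as_set_statements(settings):
    """Render settings as Spark SQL `SET key=value;` statements."""
    return [f"SET {key}={value};" for key, value in settings.items()]


def repartition_hint_query(table, column, partitions):
    """Build a SELECT that asks Spark to repartition its output by `column`."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return f"SELECT /*+ REPARTITION({partitions}, {column}) */ * FROM {table}"


for statement in as_set_statements(AQE_SETTINGS):
    print(statement)

# Hypothetical table and column names, for illustration only.
print(repartition_hint_query("transactions", "customer_id", 8))
```

The printed statements could be run from any Spark SQL client session. Whether they help depends on the Spark version and the workload, so this is a starting point for experimentation, not a tuning recipe.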

For how long have I used the solution?

I have been using this solution for two months.

What do I think about the scalability of the solution?

Its scalability is okay. We are a big organization. 

What other advice do I have?

Being a new user, I would rate Spark SQL a four out of ten. 

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Analytics and Reporting Manager at a financial services firm with 1,001-5,000 employees
Real User
Mar 23, 2020
GUI could be improved. Useful for speedily processing big data.
Pros and Cons
  • "The speed of getting data, as our TBs are big and it's a lot of data."
  • "Anything to improve the GUI would be helpful."
  • "The initial setup is a bit complex."

What is our primary use case?

We do have some use cases, like analysis and risk-based use cases, that we've provided and prepared for companies in order to evaluate, but not many. The business units have so many things that we don't know how to help formulate into another tool and utilize as a use case. They also have so many requirements and costs.

I work for a financial institution, so every solution that they need to consider has to be on-premise.

I'm actually just evaluating the solution and upskilling with it right now.

What is most valuable?

The speed of getting data, as our TBs are big and it's a lot of data. 

What needs improvement?

Anything to improve the GUI would be helpful.

We have experienced a lot of issues, but nothing in the production environment.

For how long have I used the solution?

For a couple of months. However, we have not implemented in a production environment yet.

What do I think about the stability of the solution?

The solution has not been implemented yet. When it is implemented into the real world and production, that is when I expect to see some challenges.

How are customer service and technical support?

We have worked with the Cloudera support for this solution. They are average.

Which solution did I use previously and why did I switch?

I have more than 10 years of experience with other database tools.

How was the initial setup?

The initial setup is a bit complex.

Which other solutions did I evaluate?

We are also planning to use Informatica, since you can use Spark within Informatica through an option to tie in its Big Data Edition.

What other advice do I have?

We will have a lot of big data, which is why we need it. Otherwise, the solution is not needed. The solution really depends on the size of your data, its complexity, and the analysis that you are doing. Spark is good, but it is not mandatory.

Since I don't have experience in production with the solution, the best I can rate it now is a five (out of 10). 

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer. Implementer
PeerSpot user
Data Analytics Practice head at bse
Real User
Feb 11, 2020
An excellent solution that continues to mature but needs graphing capabilities
Pros and Cons
  • "Overall the solution is excellent."
  • "This solution is a much more scalable and adventurous solution."
  • "The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."

What is our primary use case?

We primarily use the solution as our data warehouse. We use it for data science.

What is most valuable?

Overall the solution is excellent.

The solution is continuing to evolve and mature over time.

What needs improvement?

The service is complex. This is due to the fact that it's a combination of a lot of technology.

The solution needs to include graphing capabilities. Including financial charts would help improve everything overall.

For how long have I used the solution?

I've been using the solution since 2013.

What do I think about the stability of the solution?

The solution is relatively stable. We haven't had issues with bugs or glitches.

What do I think about the scalability of the solution?

The solution is scalable. We've found it easy to expand as necessary.

Right now, we have about 42 users on the solution. They include IT and ETL staff as well as a business analyst.

How are customer service and technical support?

I've never been in touch with technical support. I can't speak to any experience our company has had with them.

Which solution did I use previously and why did I switch?

I've also worked with Apache SQL and SAP. This solution is a much more scalable and adventurous solution. It's also faster than the others. We used to use IQ, but at the time it couldn't scale well, so we switched to IBM Appliance. Then we switched to Spark. IBM was good, but it also had issues with scalability and it cost us a lot of money. 

How was the initial setup?

The initial setup is straightforward. We found it quite easy.

What about the implementation team?

We handled the implementation ourselves with our in-house team.

What's my experience with pricing, setup cost, and licensing?

The pricing of Apache is much more competitive than IBM's.

What other advice do I have?

We use both the on-premises and cloud deployment models.

We have a relationship with Cloudera and use their distribution channels. We don't have a relationship with Apache.

Spark SQL is a good product. However, users need to have the capability of implementing the correct tools and efficiencies.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
it_user986637 - PeerSpot reviewer
Project Manager - Senior Software Engineer at a tech services company with 11-50 employees
Real User
Jul 17, 2019
A good stable and scalable solution for processing big data
Pros and Cons
  • "The stability was fine. It behaved as expected."
  • "The scalability of the solution is good."
  • "In the next release, maybe the visualization of some command-line features could be added."

What is our primary use case?

The primary use is to process big data. We were connecting into and we were applying sentiment analysis via hardware.

What needs improvement?

In the next release, maybe the visualization of some command-line features could be added.

For how long have I used the solution?

I've been using the solution for two to three weeks.

What do I think about the stability of the solution?

The stability was fine. It behaved as expected.

What do I think about the scalability of the solution?

The scalability of the solution is good.

How are customer service and technical support?

Technical support has been fine.

Which solution did I use previously and why did I switch?

We previously used Apache Hadoop.

How was the initial setup?

The initial setup was fine. If somebody knows what to expect it's okay.

What other advice do I have?

We've just started using this solution. Until recently, we were using it on a research basis, just to measure the performance, the cost, and so on. Many things could be improved, but it has been okay up to now, and I'm happy with it. I would recommend the product.

I would rate this solution eight out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user