Apache Spark Valuable Features

Devindra Weerasooriya

Data Architect at Devtech

The in-memory computation feature is certainly helpful for my processing tasks.

Where structures can be held in memory rather than written to storage during the computation, I choose the in-memory option. There are limitations on what can be held in memory that still need to be addressed, but I prefer in-memory computation.

The solution is beneficial in that it provides a stable, long-held baseline understanding of the framework that does not change from day to day. That is very helpful in my prototyping work as an architect assessing Apache Spark, Great Expectations, and Vault-based solutions against those proposed by clients, such as TIBCO or Informatica.

Michael Lierheimer

Consultant, Chief Engineer, Team Lead at infoteam Software AG

The best features in Apache Spark that I appreciate are the fast database access, the data transformation, and the data exchange.

I see that very good integration with other platforms, including interfaces that can connect to other vendors and technologies, and integration of the MCP protocol of one of AA systems, would be an interesting direction for me personally and as a company to integrate the technology in our customer projects. The most important part is that everything can be connected, and the data exchange across overseas connections is fast and reliable.

Omar Khaled

Data Engineer at a tech company with 10,001+ employees

I can improve the organization's functions by reducing the time it takes to make decisions. To make the right decision, you need the right data, and a solution can provide this by hiring talent and employees who can consolidate data from different sources and organize it. Few solutions can deliver this data fast enough to be useful; Apache Spark Structured Streaming is one that can. To make the right decision, you should have both accurate and fast data.

Apache Spark itself is similar to the Python programming language. Python is a language with many libraries for mathematics and machine learning. Apache Spark is the solution, and within it, you have PySpark, which is the API for Apache Spark to write and run Python code. Within it, there are many APIs, including SQL APIs, allowing you to write SQL code within a Python function in Apache Spark. You can also use Apache Spark Structured Streaming and machine learning APIs.

Free Report: Apache Spark Reviews and More

Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: June 2026.

902,270 professionals have used our research since 2012.

Dunstan Matekenya

Data Scientist at a financial services firm with 10,001+ employees

Apache Spark is known for its ease of use. Compared to other available data processing frameworks, it is user-friendly. While many choices now exist, Spark remains easy to use, particularly with Python. You can utilize familiar programming styles similar to Pandas in Python, including object-oriented programming.

Another advantage is its portability. I can prototype and perform some initial tasks on my laptop using Spark without needing to be on Databricks or any cloud platform. I can transfer it to Databricks or other platforms, such as AWS. This flexibility allows me to improve processing even on my laptop. For instance, if I'm processing large amounts of data and find my laptop becoming slow, I can quickly switch to Spark. It handles small and large datasets efficiently, making it a versatile tool for various data processing needs.

reviewer2534727

Manager Data Analytics at an outsourcing company with 5,001-10,000 employees

I like Apache Spark's flexibility the most. Before, we had one server that would choke up. With the solution, we can easily add more nodes when needed. The machine learning models are also really helpful. We use them to predict energy theft and find infrastructure problems.

The tool's real-time processing has had a big impact. We used to get data from sensors after a month. Now we get it in less than 10 minutes, which helps us take quick action.

We use Apache Spark to map our data pipelines using MapReduce technology. We're also working on integrating tools like Hive with Apache Spark to distribute our data processing. We can also integrate other tools like Apache Kafka and Hadoop.

We faced some challenges when integrating the solution into our existing system, but good documentation helped solve them.

KamleshPant

Senior Software Architect at USEReady

Apache Spark's ability to handle both batch and streaming data seamlessly is the most valuable feature for me. It is beneficial for consuming real-time data, and its solid real-time processing capability makes it more efficient for managing data analytics.

Bharghava Raghavendra Beesa

Senior Developer at Infosys

Spark is faster and distributed. Previously, everything relied on MapReduce, which was slower. With Spark, multiple computations and transformations are held in memory for faster processing.

Real-time communication is possible, connecting with platforms like Kafka for real-time data import and compute. We implemented Spark and NiFi for integration. Spark replaced other costly products, reducing costs by thirty-eight percent.

Sachin Shukre

Sr Manager at a transportation company with 10,001+ employees

There is no other platform that can challenge its features, apart from the restrictions that come with its in-memory implementation.

Aleksandr Motuzov

Head of Data Science center of excellence at Ameriabank CJSC

The most significant advantage of Spark 3.0 is its support for the DataFrame UDF and Pandas UDF features. This allows Pandas code to run in a distributed fashion on the Spark engine, which is a crucial feature. The integration with Pandas syntax in distributed mode, along with the user-defined functions in PySpark, is particularly valuable.

SurjitChoudhury

Data engineer at Cocos pt

Spark supports real-time data processing through Spark Streaming, and it also allows for batch processing of data. If you have immediate data, like chat information, that needs to be processed in real time, Spark Streaming is used.

For data that can be evaluated later, batch processing with Apache Spark is suitable. Mostly, batch processing is utilized in our organization, but for streaming data processing, tools like Kafka are often integrated.

In-memory processing in Spark greatly enhances performance, making it a hundred times faster than the previous MapReduce methods. This improvement is achieved through optimization techniques like caching, broadcasting, and partitioning, which help in optimizing queries for faster processing.

Miodrag Milojevic

Senior Data Architect at Yettel

One of the reasons we use Spark is so we can use parallelism in data lakes. So in our case, we can get many data nodes, and the main power of Hadoop and big data solutions is the number of nodes usable for different operations. It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance. Also, Spark has an option for near real-time loading and processing. We use micro batches of Spark.
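The micro-batch idea mentioned here can be illustrated with a minimal plain-Python sketch. It does not use Spark's API: the names `micro_batches` and `process_batch` are hypothetical, and the batches are cut by record count as a simplifying assumption, whereas a real streaming engine would typically cut them by a time interval.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive small lists (micro-batches) from an incoming stream."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        # Take up to batch_size records; a short final batch is still yielded.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Hypothetical per-batch work: here, simply total the values."""
    return sum(batch)


# Seven incoming events, handled three at a time.
events = range(1, 8)
batch_totals = [process_batch(batch) for batch in micro_batches(events, 3)]
```

Each batch is processed as a small unit as soon as it is full, which is what gives the "near real-time" behaviour without handling every record individually.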

Ilya Afanasyev

Senior Software Development Engineer at Yahoo!

We use batch processing. It works well with our formats and file versions. There's a lot of functionality.

Each hour, our pipeline copies changes from MongoDB to a specific file. Previously, the pipeline copied all of the data from every table on each run, whether or not anything had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000.

The solution is scalable.

It's a stable product.

Anshuman Kishore

Director Product Development at Mycom Osi

Overall, my company likes the product since it is a good tool.

UjjwalGupta

Module Lead at Mphasis

The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily.

Vineeth Marar

Cloud solution architect at 0

What I liked about the solution was its uniqueness. We provided the customer with a solution that hadn't been offered by anyone else before.

It involved multiple components, such as Spark cluster, CMAX, a backend VM, and a Linux VM for mapping the service processes to the backend, which is running on-premises where the Kafka service was running.

It was challenging for people to understand how to send traffic through the private link between all these services. Ensuring the traffic was sent to the correct destination with the correct source header without any operation issues was complex, but we achieved it.

We had multiple instances of fault tolerance and scalability.

Atif Tariq

Cloud and Big Data Engineer | Developer at Huawei

The most valuable feature of Apache Spark is its memory processing because it processes data over RAM rather than disk, which is much more efficient and fast.

Suriya Senthilkumar

Analyst at Deloitte

The product’s most valuable features are lazy evaluation and workload distribution.
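For readers unfamiliar with lazy evaluation, the following is a minimal plain-Python sketch of the general idea, using generators rather than Spark itself. The helper names `lazy_map` and `lazy_filter` and the `calls` log are hypothetical and exist only to show when work actually happens.

```python
calls = []  # records each item at the moment a transformation really runs


def lazy_map(func, items):
    """Describe a transformation; nothing runs until the result is consumed."""
    for item in items:
        calls.append(item)
        yield func(item)


def lazy_filter(predicate, items):
    """Describe a filter step; it is also deferred until consumption."""
    for item in items:
        if predicate(item):
            yield item


# Building the pipeline does no work yet (loosely like chaining transformations).
pipeline = lazy_filter(lambda x: x % 2 == 0, lazy_map(lambda x: x * x, range(1, 6)))
work_before_action = len(calls)

# Consuming the pipeline triggers the computation (loosely like calling an action).
result = list(pipeline)
```

Deferring the work this way lets an engine see the whole chain of steps before running any of them, which is where the opportunity for optimization comes from.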

Lucas Dreyer

Data Engineer at BBD

It is highly scalable, allowing you to efficiently work with extensive datasets that might be problematic to handle using traditional tools that are memory-constrained.

reviewer1759647

Information Technology Business Analyst at an aerospace/defense firm with 10,001+ employees

We use it as an ETL tool to gather information from different systems. The product is useful for analytics.

SlavenBatnozic

CTO at Hammerknife

Apache Spark provides a very high-quality implementation of distributed data processing. I rate it 20 on a scale of one to ten.

Marco Amhof

PLC Programmer at Alzero

The solution, as a package, excels across the board. I appreciate everything, not just one or two specific features.

Lokesh Jayanna

Vice President at Goldman Sachs at a computer software company with 10,001+ employees

The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it. It is a useful feature for us.

Jagannadha Rao

Lead Data Scientist at International School of Engineering

The most valuable feature of Apache Spark is its flexibility.

Farzam Khodaei

Data Engineer at Berief Food GmbH

The data processing framework is good. The product is very useful.

reviewer2208003

Quantitative Developer at a marketing services firm with 11-50 employees

The distribution of tasks, like the seamless map-reduce functionality, is quite impressive. For the user, it appears as simple single-line data manipulations, but behind the scenes, the executor pool intelligently distributes the map and reduce functions.
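The pattern described here, a one-line call for the user with map and reduce steps spread over a pool of workers behind it, can be sketched in plain Python. This is an illustration only, not Spark's executor model: a local thread pool stands in for the executor pool, and the names `split_into_partitions`, `map_partition`, and `parallel_sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions roughly equal slices."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def map_partition(partition):
    """Map step: each worker computes a partial result for its own slice."""
    return sum(x * x for x in partition)


def parallel_sum_of_squares(data, num_workers=4):
    """Run the map step on a worker pool, then reduce the partial results."""
    partitions = split_into_partitions(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    # Reduce step: combine the per-partition results into one value.
    return reduce(lambda a, b: a + b, partials, 0)


# For the caller, the whole distributed computation is a single line.
total = parallel_sum_of_squares(list(range(1, 11)))
```

The caller never sees the partitioning or the pool, which mirrors the "simple single-line data manipulations" the reviewer describes.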

Armando Becerril

Partner / Head of Data & Analytics at Intelligence Software Consulting

Apache provides a lot of good documentation compared to other solutions.

AmitMataghare

Associate Director at a consultancy with 10,001+ employees

One of Apache Spark's most valuable features is that it supports in-memory processing, so the execution of jobs is very fast compared to traditional tools.

Peter-Paul Eijkenboom

Senior Test Automation Specialist at APG

It is useful for handling large amounts of data. It is very useful for scientific purposes.

Suresh_Srinivasan

Co-Founder at FORMCEPT Technologies

Apache Spark can do large volume interactive data analysis.

Oscar Estorach

Chief Data Strategist And Director at theworkshop.es

Overall, it's a very nice tool.

It is great for transforming data and doing micro-streaming or micro-batching.

The product offers an open-source version.

The solution has been very stable.

The scalability is good.

Apache Spark is a huge tool. It has many use cases and is very flexible. You can use it with so many other platforms.

Spark, as a tool, is easy to work with as you can work with Python, Scala, and Java.

Kürşat Kurt

Software Architect at Akbank

AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI.

Suresh_Srinivasan

Co-Founder at FORMCEPT Technologies

We use Spark to process data from different data sources.

reviewer1283880

CEO International Business at a tech services company with 1,001-5,000 employees

The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations.

Salvatore Campana

CEO & Founder at Xautomata

The most valuable feature is the grid computing.

Mahdi Sharifmousavi

Lecturer at Amirkabir University of Technology

This solution provides a clear and convenient syntax for our analytical tasks.

NitinKumar

Director of Engineering at Sigmoid

Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica.

Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark.

reviewer1185906

Manager - Data Science Competency at a tech services company with 201-500 employees

One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them.

Another feature is memory-based computing. This is unlike Hadoop, which relies on storage. As it uses in-memory data processing, Spark is very fast.

Onur Tokat

Big Data Engineer Consultant at Collective[i]

The most valuable feature is that Spark uses Scala, which has good data evaluation functions. Spark also supports good distribution on the clusters and provides optimization on the APIs.

reviewer1535340

Senior Solutions Architect at a retailer with 10,001+ employees

I like that it can handle multiple tasks in parallel. I also like the automation feature. JavaScript also helps with the parallel streaming of the library.

Rajendran Veerappan

Director at Nihil Solutions

The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly.

KamleshKhollam

Managing Consultant at a computer software company with 501-1,000 employees

The most valuable features are the storage engine, the memory engine, and the processing engine.

it_user1223676

Lead Consultant at a tech services company with 51-200 employees

The main feature that we find valuable is that it is very fast. In terms of big data, the main feature is that the data is in so many different nodes. It goes through many data nodes so whenever we use the data, it enables us to parse the data from different data nodes.

Suresh_Srinivasan

Co-Founder at FORMCEPT Technologies

We use all the features. We use it for end-to-end. All of our data analysis and execution happens through Spark.

The features we find most valuable are the:

Machine learning
Data learning
Spark Analytics.

reviewer879201

Technical Consultant at a tech services company with 1-10 employees

I have worked with Hadoop a lot in my career and you need to do a lot of things to get it to Hello World. But in Spark it is easy. You could say it's an umbrella to do everything under the one shelf. It also has Spark Streaming. I feel the streaming is its best feature because I have extracted to enter data and analysis within Spark Stream.

Mohamed Ghorbel

Director of BigData Offer at IVIDATA

It is a very fast solution. It's very easy to use. There are many APIs for many languages, such as Scala, Java, R, and Python. The greatest advantage of Spark is that we can initiate many kinds of analytics including SQL analytics, graphics analytics, etc.

reviewer1046250

Senior Consultant & Training at a tech services company with 51-200 employees

The most valuable feature of this solution is its capacity for processing large amounts of data.

This solution makes it easy to do a lot of things. It's easy to read data, process it, save it, etc.

it_user946074

Principal Architect at a financial services firm with 1,001-5,000 employees

The fast performance is the most valuable aspect of the solution.

Snrsecengin567

Snr Security Engineer at a tech vendor with 201-500 employees

The scalability has been the most valuable aspect of the solution.

it_user1059558

Portfolio Manager, Enterprise Solutions Architect at Capgemini

It supports streaming and micro-batch.

Sumanth Punyamurthula

Director - Data Management, Governance and Quality at Hilton Worldwide

Powerful language.

reviewer894894

Solutions Architect at a computer software company with 51-200 employees

Machine learning, real time streaming, and data processing are fantastic, as well as the resilient or fault tolerant feature.

it_user786777

Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees

Distributed in-memory processing. Some of the algorithms are resource-heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.

it_user746943

Big Data and Cloud Solution Consultant at a financial services firm with 10,001+ employees

DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort.

it_user746673

Sr. Software Engineer at a tech vendor with 1-10 employees

The most valuable features are fault tolerance and easy binding with other processes, like machine learning and graph analytics. The community is growing, and hence executing ML in a distributed fashion is quite good.

it_user326142

Architect at a healthcare company with 51-200 employees

ETL and streaming capabilities.

it_user372393

Big Data Consultant at a tech services company with 501-1,000 employees

The good performance. The nice graphical management console. The long list of ML algorithms.

it_user371832

Chief System Architect at a marketing services firm with 501-1,000 employees

With Spark SQL, we now have the capability to analyze very large quantities of data stored in Amazon S3 at a very low cost compared to the other solutions we checked.

We also use our own Spark cluster to aggregate data in near real time and save the results to a MySQL database.

We've started new projects using the machine learning library ML.

it_user365304

Software Consultant at a tech services company with 10,001+ employees

The most important feature of Apache Spark is that it provides large-scale data processing with negligible latency at the cost of commodity hardware. The Spark framework is a blessing over Hadoop, as the latter does not allow fast processing of data, which Spark accomplishes through in-memory data processing.

it_user374028

Core Engine Engineer at a computer software company with 51-200 employees

RDDs
DataFrames
Machine learning libraries

it_user374040

Systems Engineering Lead, Mid-Atlantic at a tech company with 10,001+ employees

Spark Streaming, which allows you to construct event-driven information systems and respond to the events in near-real time.

it_user373173

Lead Big Data Engineer at a non-profit with 51-200 employees

Spark is relatively easy to deploy, with rich features in handling big data. Spark Core, Spark SQL, Spark MLlib are used mostly in our applications.

it_user74256

Engineer at a tech vendor with 10,001+ employees

Streaming data processing

it_user371334

CEO at a tech consulting company with 51-200 employees

There are several valuable features.

Interactive data access (low latency)
Batch ETL-style processing
Schema-free data models
Algorithms

it_user371325

Data Scientist at a tech vendor with 10,001+ employees

It allows the loading and investigation of very large data sets, and has MLlib for machine learning, Spark Streaming, and both the new and old DataFrame APIs.

it_user365301

Software Developer (Product Engineering) at a computer software company with 501-1,000 employees

Spark Streaming, Spark SQL, and MLlib, in that order.

reviewer1904019

Chief Technology Officer at a tech services company with 11-50 employees

The most valuable feature of Apache Spark is its ease of use.
