Apache Spark Valuable Features
We use batch processing. It works well with our formats and file versions. There's a lot of functionality.
Each hour, our pipeline copies data from MongoDB, specifically the changes made in MongoDB, to a specific file. Previously, every run copied all of the data from all of the tables, whether or not anything had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and the configuration of the cluster, and we saved about $150,000.
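The idea of copying only changed data can be sketched in plain Python. This is a minimal illustration, not the reviewer's pipeline and not the MongoDB API: the `updated_at` field, the `changed_since` helper, and the sample records are all assumptions made for the example.

```python
from datetime import datetime, timezone


def changed_since(records, last_sync):
    """Return only the records modified after the previous sync (hypothetical helper)."""
    return [r for r in records if r["updated_at"] > last_sync]


# Hypothetical table contents; a real pipeline would read these from the database.
records = [
    {"id": 1, "updated_at": datetime(2024, 4, 1, 9, 0, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 4, 1, 10, 30, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2024, 4, 1, 11, 15, tzinfo=timezone.utc)},
]

# Timestamp of the previous hourly run (assumed to be stored between runs).
last_sync = datetime(2024, 4, 1, 10, 0, tzinfo=timezone.utc)

# Only records 2 and 3 changed after the last sync, so only they are copied.
to_copy = changed_since(records, last_sync)
changed_ids = [r["id"] for r in to_copy]
print(changed_ids)
```

Copying a filtered subset instead of every table on every run is what shrinks the amount of data each hourly job has to move.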
The solution is scalable.
It's a stable product.

Spark supports real-time data processing through Spark Streaming. It allows for batch processing of data. If you have immediate data, like chat information, that needs to be processed in real-time, Spark Streaming is used.
For data that can be evaluated later, batch processing with Apache Spark is suitable. Mostly, batch processing is utilized in our organization, but for streaming data processing, tools like Kafka are often integrated.
In-memory processing in Spark greatly enhances performance, making it up to a hundred times faster than the earlier MapReduce approach. This improvement is achieved through optimization techniques like caching, broadcasting, and partitioning, which help optimize queries for faster processing.
Sachin Shukre
Sr Manager at a transportation company with 10,001+ employees
No other platform can challenge its features, apart from the restrictions that come with its in-memory implementation.

Buyer's Guide
Apache Spark
April 2024
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
767,319 professionals have used our research since 2012.
The product’s most valuable features are lazy evaluation and workload distribution.
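Lazy evaluation means that Spark records transformations and only runs them once an action asks for a result. The same idea can be sketched with a plain Python generator. This is an analogy under stated assumptions, not Spark code: the `double` function and the `calls` list are hypothetical names used only to show when work actually happens.

```python
from itertools import islice

calls = []  # records every value that was actually processed


def double(x):
    """Hypothetical transformation that logs each time it really runs."""
    calls.append(x)
    return x * 2


numbers = range(1_000_000)

# Defining the pipeline performs no work yet; nothing has been doubled.
pipeline = (double(n) for n in numbers if n % 2 == 0)
work_before = len(calls)

# Asking for three results triggers exactly as much work as is needed.
first_three = list(islice(pipeline, 3))
work_after = len(calls)

print(work_before, first_three, work_after)
```

Only three of the million inputs are ever processed, which is the benefit of deferring execution until a result is requested.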

One of the reasons we use Spark is so we can use parallelism in data lakes. In our case, we can get many data nodes, and the main power of Hadoop and big data solutions is the number of nodes usable for different operations. It's easy to set up parallelism in Spark, run the solution with specific parameters, and get good performance. Spark also has an option for near real-time loading and processing. We use Spark's micro-batches.
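Micro-batching treats a continuous stream as a sequence of small batches that are each processed like a tiny batch job. A minimal plain-Python sketch of the idea follows. It is not Spark's API: the `micro_batches` helper is hypothetical, and it groups by a fixed record count for simplicity, whereas a streaming engine would typically cut batches by time interval.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Yield consecutive lists of up to batch_size events (hypothetical helper)."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical event stream: the integers 1 through 7.
events = range(1, 8)

# Each micro-batch is aggregated on its own, as a small batch job would be.
totals = [sum(batch) for batch in micro_batches(events, 3)]
print(totals)
```

The last batch is smaller than the others because the stream ran out, which is why results arrive in near real time rather than per event.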

It is highly scalable, allowing you to efficiently work with extensive datasets that might be problematic to handle using traditional tools that are memory-constrained.

Overall, my company likes the product since it is a good tool.
Vineeth Marar
Cloud solution architect at 0
What I liked about the solution was its uniqueness. We provided the customer with a solution that hadn't been offered by anyone else before.
It involved multiple components, such as Spark cluster, CMAX, a backend VM, and a Linux VM for mapping the service processes to the backend, which is running on-premises where the Kafka service was running.
It was challenging for people to understand how to send traffic through the private link between all these services. Ensuring the traffic was sent to the correct destination with the correct source header without any operation issues was complex, but we achieved it.
We had multiple instances of fault tolerance and scalability.

One of Apache Spark's most valuable features is that it supports in-memory processing, so jobs execute very quickly compared to traditional tools.

The most valuable feature of Apache Spark is its in-memory processing, because it processes data in RAM rather than on disk, which is much more efficient and fast.

The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it. It is a useful feature for us.

The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to easily handle large volumes of data with more power.
reviewer1759647
Information Technology Business Analyst at an aerospace/defense firm with 10,001+ employees
We use it as an ETL tool to gather information from different systems. The product is useful for analytics.

Overall, it's a very nice tool.
It is great for transforming data and doing micro-streaming or micro-batching.
The product offers an open-source version.
The solution has been very stable.
The scalability is good.
Apache Spark is a huge tool. It has many use cases and is very flexible. You can use it with so many other platforms.
Spark, as a tool, is easy to work with as you can work with Python, Scala, and Java.

Apache provides a lot of good documentation compared to other solutions.
Kürşat Kurt
Software Architect at Akbank
AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI.
We use Spark to process data from different data sources.
SlavenBatnozic
CTO at Hammerknife
Apache Spark provides a very high-quality implementation of distributed data processing. I rate it 20 on a scale of one to ten.
Marco Amhof
PLC Programmer at Alzero
The solution, as a package, excels across the board. I appreciate everything, not just one or two specific features.
The most valuable feature of Apache Spark is its flexibility.
Farzam Khodaei
Data Engineer at Berief Food GmbH
The data processing framework is good. The product is very useful.
reviewer2208003
Quantitative Developer at a marketing services firm with 11-50 employees
The distribution of tasks, like the seamless map-reduce functionality, is quite impressive. For the user, it appears as simple single-line data manipulations, but behind the scenes, the executor pool intelligently distributes the map and reduce functions.
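The pattern described here, where a one-line operation is split into map tasks over partitions and then reduced, can be sketched with Python's standard library. This is an illustration only: a thread pool stands in for Spark's executor pool, and the `count_words` function and the sample partitions are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(partition):
    """Map step: count the words within a single partition of lines."""
    return Counter(word for line in partition for word in line.split())


# Hypothetical data set, already split into three partitions.
partitions = [
    ["spark is fast", "spark is flexible"],
    ["hadoop is stable"],
    ["spark scales"],
]

# The pool runs the map step on each partition independently.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, partitions))

# Reduce step: merge the per-partition counts into one result.
totals = reduce(lambda a, b: a + b, partial_counts, Counter())
print(totals["spark"], totals["is"])
```

The caller only sees the final `totals`; how the partitions were assigned to workers stays hidden, which mirrors the reviewer's point about the distribution happening behind the scenes.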

This solution provides a clear and convenient syntax for our analytical tasks.

Apache Spark can do large volume interactive data analysis.

We use all the features. We use it for end-to-end. All of our data analysis and execution happens through Spark.
The features we find most valuable are the:
- Machine learning
- Data learning
- Spark Analytics.
reviewer1283880
CEO International Business at a tech services company with 1,001-5,000 employees
The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations.
Sumanth Punyamurthula
Director - Data Management, Governance and Quality at Hilton Worldwide
Powerful language.

The most important feature of Apache Spark is that it provides large-scale data processing with negligible latency at the cost of commodity hardware. The Spark framework is a blessing over Hadoop, as the latter does not allow fast processing of data, which Spark accomplishes through in-memory data processing.

The most valuable feature is the grid computing.

With Spark SQL, we now have the capability to analyse very large quantities of data located in Amazon S3 at very low cost compared to the other solutions we checked.
We also use our own Spark cluster to aggregate data in near real time and save the result to a MySQL database.
We've started new projects using the machine learning library ML.

The most valuable feature is that Spark uses Scala, which has good data evaluation functions. Spark also supports good distribution on the clusters and provides optimization on the APIs.
Rajendran Veerappan
Director at Nihil Solutions
The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly.
reviewer879201
Technical Consultant at a tech services company with 1-10 employees
I have worked with Hadoop a lot in my career, and you need to do a lot of things to get it to Hello World. But in Spark it is easy. You could say it's an umbrella for doing everything in one place. It also has Spark Streaming. I feel the streaming is its best feature because I have done data extraction and analysis within Spark Streaming.

Distributed in-memory processing. Some of the algorithms are resource heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.
NitinKumar
Director of Engineering at Sigmoid
Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica.
Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark.
reviewer1792824
Senior Test Automation Consultant / Architect at a tech services company with 11-50 employees
It is useful for handling large amounts of data. It is very useful for scientific purposes.
reviewer1535340
Senior Solutions Architect at a retailer with 10,001+ employees
I like that it can handle multiple tasks in parallel. I also like the automation feature. JavaScript also helps with the parallel streaming of the library.
The fast performance is the most valuable aspect of the solution.
reviewer1185906
Manager - Data Science Competency at a tech services company with 201-500 employees
One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them.
Another feature is memory-based computing. This is unlike Hadoop, which relies on storage. As it uses in-memory data processing, Spark is very fast.

The good performance. The nice graphical management console. The long list of ML algorithms.

Streaming data processing

DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort.
Spark is relatively easy to deploy, with rich features in handling big data. Spark Core, Spark SQL, Spark MLlib are used mostly in our applications.

There are several valuable features.
- Interactive data access (low latency)
- Batch ETL-style processing
- Schema-free data models
- Algorithms
reviewer1046250
Senior Consultant & Training at a tech services company with 51-200 employees
The most valuable feature of this solution is its capacity for processing large amounts of data.
This solution makes it easy to do a lot of things. It's easy to read data, process it, save it, etc.

ETL and streaming capabilities.

Spark Streaming, which allows you to construct event-driven information systems and respond to the events in near-real time.
reviewer1904019
Chief Technology Officer at a tech services company with 11-50 employees
The most valuable feature of Apache Spark is its ease of use.

It supports streaming and micro-batch.

- RDDs
- DataFrames
- Machine learning libraries
KamleshKhollam
Managing Consultant at a computer software company with 501-1,000 employees
The most valuable features are the storage engine, the memory engine, and the processing engine.
Mohamed Ghorbel
Director of BigData Offer at IVIDATA
It is a very fast solution. It's very easy to use. There are many APIs for many languages like Scala, Java, R, and Python. The greatest advantage of Spark is that we can initiate many kinds of analytics, including SQL analytics, graph analytics, etc.

Machine learning, real-time streaming, and data processing are fantastic, as well as the resilient or fault-tolerant feature.

The most valuable features are fault tolerance and easy binding with other processes like machine learning and graph analytics. The community is growing, and executing ML in a distributed fashion works quite well.

It allows the loading and investigation of very large data sets, has MLlib for machine learning, Spark Streaming, and both the new and old DataFrame API.

Spark Streaming, Spark SQL and MLlib, in that order.
Snrsecengin567
Snr Security Engineer at a tech vendor with 201-500 employees
The scalability has been the most valuable aspect of the solution.

The main feature that we find valuable is that it is very fast. In terms of big data, the main feature is that the data is in so many different nodes. It goes through many data nodes so whenever we use the data, it enables us to parse the data from different data nodes.