Apache Spark Room for Improvement
Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing.
Its UI could be better. Maintaining the history server is a little cumbersome and should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to create and run a history server separately. Such things could be made easier. Instead of being installed separately, the history server could be part of the whole setup, so that it becomes available whenever you set Spark up.
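For context, the separate setup described above usually comes down to a few properties: applications write event logs to a directory, and the history server reads them back from the same place. The following is a minimal sketch, not an official recipe. The helper `history_server_conf` and the directory `hdfs:///spark-logs` are hypothetical; only the three `spark.*` property names come from Spark itself.

```python
def history_server_conf(log_dir: str) -> str:
    """Render spark-defaults.conf lines for event logging and the history server.

    log_dir is a hypothetical location; substitute whatever shared
    storage your cluster actually uses.
    """
    settings = {
        # Applications write their event logs here.
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
        # The history server reads application logs from here.
        "spark.history.fs.logDirectory": log_dir,
    }
    # spark-defaults.conf accepts "key value" pairs, one per line.
    return "\n".join(f"{key} {value}" for key, value in settings.items())


if __name__ == "__main__":
    print(history_server_conf("hdfs:///spark-logs"))
```

The history server is still a separate process, started with `sbin/start-history-server.sh` from the Spark distribution, which is the extra step the reviewer would like to see folded into the default setup.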
Manager - Data Science Competency at a tech services company with 201-500 employees
When you are working with large, complex tasks, the garbage collection process is slow and affects performance. This is an area where they need to improve because your job may fail if it is stuck for a long time while memory garbage collection is happening. This is the main problem that we have.
If you are developing projects and now need to put them into a production scenario, you might need more than a cluster of servers, as it requires distributed computing.
It's not easy to install. You are typically dealing with a big data system.
It's not a simple, straightforward architecture.
Co-Founder at a tech vendor with 11-50 employees
Apache Spark is very difficult to use and requires a data engineer. It is not accessible to every engineer today, because they need to understand the different concepts of Spark, which are very difficult and not easy to learn.
Apache Spark could improve the connectors that it supports. There are a lot of databases in the market, for example cloud databases such as Redshift, Snowflake, and Synapse. Apache Spark should ship with connectors for these databases. Today, connecting to them requires a lot of workarounds, when there should be built-in connectors.
Senior Test Automation Consultant / Architect at a tech services company with 11-50 employees
There are some difficulties that we are working on. It is useful for scientific purposes, but for commercial use of big data, it gives some trouble.
They should improve the stability of the product. We use Spark Executors and Spark Drivers to link to our own environment, and they are not the most stable products. Its scalability is also an issue.
We are building our own queries on Spark, and it can be improved in terms of query handling.
Spark could be improved by adding support for other open-source storage layers than Delta Lake. The UI could also be enhanced to give more data on resource management.
An area for improvement is that when we start the solution and declare the maximum number of nodes, the process is shared, which is a problem in some cases. It would be useful to be able to change this parameter in real-time rather than having to stop the solution and restart with a higher number of nodes.
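A related mechanism that exists today is dynamic allocation, which lets Spark add and remove executors between a minimum and a maximum while the application runs. It does not remove the limitation described above, because the bounds themselves are still fixed when the application starts. A rough sketch of the settings involved, assuming Spark 3.0 or later for shuffle tracking; the helper names `dynamic_allocation_conf` and `to_submit_args` and the executor counts are hypothetical:

```python
def dynamic_allocation_conf(min_executors: int, max_executors: int) -> dict:
    """Build Spark properties that enable dynamic allocation within fixed bounds."""
    if not 0 <= min_executors <= max_executors:
        raise ValueError("expected 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        # The upper bound is still set at launch; raising it later means a restart.
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Spark 3.0+: allows dynamic allocation without an external shuffle service.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    }


def to_submit_args(conf: dict) -> list:
    """Turn the properties into --conf arguments for spark-submit."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example bounds only; choose values that match your cluster.
    print(" ".join(to_submit_args(dynamic_allocation_conf(2, 20))))
```

Setting a generous maximum up front is the usual workaround, since executors are only requested when there is pending work.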
Senior Solutions Architect at a retailer with 10,001+ employees
The logging for the observability platform could be better.
Chief Technology Officer at a tech services company with 11-50 employees
Apache Spark could improve the use case scenarios on its website. There is no information on how to use the solution across relational databases or with multiple databases.