2018-06-27T19:19:00Z
Miriam Tover - PeerSpot reviewer
Service Delivery Manager at PeerSpot (formerly IT Central Station)

What needs improvement with Apache Spark?

Please share with the community what you think needs improvement with Apache Spark.

What are its weaknesses? What would you like to see changed in a future version?

23 Answers
Ilya Afanasyev - PeerSpot reviewer
Senior Software Development Engineer at Yahoo!
Real User
Top 5 Leaderboard
2022-08-03T04:09:48Z
Aug 3, 2022

The primary language for developers on Spark is Scala, but Java is now supported as well. I prefer Java to Scala, so it is good that both are supported. I know there is always discussion about which language to write applications in, and some people do love Scala. However, I don't like it. The JDK version Spark currently uses is a little old, and not all features are available on it. Maybe they should update the JDK version they support.

SK
Chief Technology Officer at a tech services company with 11-50 employees
Real User
2022-07-04T15:18:53Z
Jul 4, 2022

Apache Spark could improve the use case scenarios on its website. There is no information on how to use the solution across relational databases or with multiple databases.

AmitMataghare - PeerSpot reviewer
Associate Director at PwC
Real User
Top 20
2022-04-27T08:19:23Z
Apr 27, 2022

Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors.

Salvatore Campana - PeerSpot reviewer
CEO & Founder at XAUTOMATA TECHNOLOGY GmbH
Real User
Top 5
2022-04-27T08:19:19Z
Apr 27, 2022

An area for improvement is that when we start the solution and declare the maximum number of nodes, the process is shared, which is a problem in some cases. It would be useful to be able to change this parameter in real-time rather than having to stop the solution and restart with a higher number of nodes.
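Spark's dynamic allocation settings may partly address this: executors are added and removed at runtime, but only between bounds fixed when the job is submitted, so the ceiling itself still cannot be raised without a restart. The following is a minimal sketch in plain Python that assembles the relevant `--conf` arguments for `spark-submit`. The helper name `dynamic_allocation_args` and the executor counts are illustrative assumptions; only the property names are Spark configuration keys.

```python
def dynamic_allocation_args(min_executors, max_executors):
    """Build spark-submit --conf arguments that enable dynamic allocation.

    Hypothetical helper for illustration; only the property names come from Spark.
    """
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    conf = {
        "spark.dynamicAllocation.enabled": "true",
        # Shuffle tracking (Spark 3.0+) lets dynamic allocation run
        # without an external shuffle service.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Example bounds (assumed values): scale between 2 and 20 executors.
print(" ".join(dynamic_allocation_args(2, 20)))
```

The printed arguments would be appended to a `spark-submit` command line. Whether this helps depends on the cluster manager and on whether the upper bound chosen at launch is high enough.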

Onur Tokat - PeerSpot reviewer
Big Data Engineer Consultant at Collective[i]
Consultant
Top 20
2022-02-15T16:44:00Z
Feb 15, 2022

Spark could be improved by adding support for open-source storage layers other than Delta Lake. The UI could also be enhanced to give more data on resource management.

SS
Co-Founder at a tech vendor with 11-50 employees
Real User
Top 5
2021-12-28T09:52:00Z
Dec 28, 2021

Apache Spark is very difficult to use. It requires a data engineer. It is not accessible to every engineer today because they need to understand the different concepts of Spark, which are very difficult and not easy to learn.

Oscar Estorach - PeerSpot reviewer
Chief Data-strategist and Director at theworkshop.es
Real User
Top 5 Leaderboard
2021-08-18T14:51:07Z
Aug 18, 2021

If you are developing projects and need to put them into a production scenario, you might need more than a cluster of servers, as it requires distributed computing. It's not easy to install. You are typically dealing with a big data system. It's not a simple, straightforward architecture.

GA
Senior Solutions Architect at a retailer with 10,001+ employees
Real User
2021-03-27T15:39:24Z
Mar 27, 2021

The logging for the observability platform could be better.

NitinKumar - PeerSpot reviewer
Director of Engineering at Sigmoid
Real User
Top 5 Leaderboard
2021-02-01T12:04:16Z
Feb 1, 2021

Its UI can be better. Maintaining the history server is a little cumbersome, and it should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to separately create a history server and run it. Such things can be made easier. Instead of separately installing the history server, it can be made a part of the whole setup so that whenever you set it up, it becomes available.
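For context, the separate setup described here usually comes down to three properties plus a start script. The following is a minimal sketch in plain Python that renders those lines for `spark-defaults.conf`. The helper name `history_server_conf` and the `hdfs:///spark-logs` path are assumptions for illustration; the property names are Spark configuration keys.

```python
def history_server_conf(log_dir):
    """Return spark-defaults.conf lines for event logging and the history server.

    Hypothetical helper for illustration; log_dir is whatever shared
    location the applications and the history server can both reach.
    """
    settings = [
        # Applications write their event logs here...
        ("spark.eventLog.enabled", "true"),
        ("spark.eventLog.dir", log_dir),
        # ...and the history server reads them from the same place.
        ("spark.history.fs.logDirectory", log_dir),
    ]
    return "\n".join(f"{key} {value}" for key, value in settings)


# Assumed example location; replace with a directory that exists on your cluster.
print(history_server_conf("hdfs:///spark-logs"))
```

With these lines in place, the history server is started separately with `sbin/start-history-server.sh` from the Spark distribution, which is the extra step the reviewer would like folded into the default setup.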

Kürşat Kurt - PeerSpot reviewer
Software Architect at Akbank
Real User
2020-10-28T02:27:29Z
Oct 28, 2020

Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing.

RV
Director at Nihil Solutions
Real User
2020-07-23T07:58:35Z
Jul 23, 2020

There are lots of items coming down the pipeline in the future. I don't know what features are missing. From my point of view, everything looks good. The graphical user interface (UI) could be a bit clearer. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not easy to understand why or where it went. I have to manually drill down on the data processes, which takes a lot of time. Maybe there could be a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate. More information should be shared with the user. The solution already has all the information tracked in the cluster. It just needs to be accessible or searchable.

Gopi Krishnan - PeerSpot reviewer
User at Ideas2IT Technologies
Real User
2020-06-10T05:44:05Z
Jun 10, 2020

There is still plenty of room for improvement in Apache Spark in terms of integration and speed. The Apache Spark community could use Rust or C++ implementations to improve performance.

KamleshKhollam - PeerSpot reviewer
Managing Consultant at a computer software company with 501-1,000 employees
Real User
Top 20
2020-02-02T10:42:14Z
Feb 2, 2020

I would like to see integration with data science platforms to optimize the processing capability for these tasks.

it_user1223676 - PeerSpot reviewer
Lead Consultant at a tech services company with 51-200 employees
Consultant
2020-01-29T11:22:00Z
Jan 29, 2020

We use big data manager but we cannot use it as conditional data, so whenever we try to fetch the data, it takes a bit of time. There is some latency in the system and latency in the data caching. The main issue is that we need to design it in a way that data will be available to us very quickly. It takes a long time, and the latest data should be available to us much more quickly.

SS
Co-Founder at a tech vendor with 11-50 employees
Real User
Top 5
2020-01-29T11:22:00Z
Jan 29, 2020

We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data.

SA
Technical Consultant at a tech services company with 1-10 employees
Consultant
2019-12-23T07:05:00Z
Dec 23, 2019

I think for IT people it is good. The whole idea is that Spark works pretty easily, but a lot of people, including me, struggle to set things up properly. I like contributions and if you want to connect Spark with Hadoop it's not a big thing, but for other things, such as if you want to use Sqoop with Spark, you need to do the configuration by hand. I wish there were a solution that does all these configurations, like in Windows where you have the whole solution and it does the back-end. So I think that kind of solution would help. But still, it can do everything for a data scientist. Spark's main objective is to manipulate and calculate. It is playing with the data. So it has to keep doing what it does best and let the visualization tool do what it does best. Overall, it offers everything that I can imagine right now.

Mohamed Ghorbel - PeerSpot reviewer
Director of BigData Offer at IVIDATA
Real User
2019-12-09T10:58:00Z
Dec 9, 2019

The solution needs to optimize shuffling between workers.

AD
Senior Consultant & Training at a tech services company with 51-200 employees
Consultant
2019-10-13T05:48:00Z
Oct 13, 2019

When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data. Once you are experienced, it is easier and more stable. When you are trying to do something outside of the normal requirements in a typical project, it is difficult to find somebody with experience.

LC
Snr Security Engineer at Securonix Solutions
Real User
2019-07-14T10:21:00Z
Jul 14, 2019

The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive.

it_user946074 - PeerSpot reviewer
Principal Architect at a financial services firm with 1,001-5,000 employees
Real User
2019-07-10T12:01:00Z
Jul 10, 2019

The search could be improved. Usually, we use other tools to search for specific things. We would use it the way we use other tools, to get the details, but if there were a way to search for small things, that would be better. It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster. In the next release, if they can add more analytics, that would be useful. For example, for data, built data, if there was one port where you put the high one then you can pull any other close to you, and then maybe a log for the right script.

it_user1059558 - PeerSpot reviewer
Portfolio Manager, Enterprise Solutions Architect at Capgemini
Real User
2019-04-08T13:04:00Z
Apr 8, 2019

Better data lineage support.

SP
Director - Data Management, Governance and Quality at Hilton
Real User
2019-03-17T03:12:00Z
Mar 17, 2019

It is like going back to the '80s for the complicated coding that is required to write efficient programs.

2018-06-27T19:19:00Z
Jun 27, 2018

I would suggest that it support more programming languages and also provide an internal scheduler to schedule Spark jobs with monitoring capability.
