Apache Spark without Hadoop -- Is this recommended?

Hi community, 

I'm aware that we can use Apache Spark with or without Hadoop.

However, I believe most people use Apache Spark with Hadoop, and I read an article stating that running Apache Spark without Hadoop is not suitable for production deployment and is only usable in a development environment.

Is that true? 

I'd greatly appreciate it if anyone could elaborate on this.


ITCS user
3 Answers

Top 5 Leaderboard, Real User

I don't think using Apache Spark without Hadoop has any major drawbacks or issues. I have used Apache Spark quite successfully with AWS S3 on many batch-based projects. That said, for very high-performance systems, HDFS is a better option.

The main problem with running Apache Spark on object storage like S3 has been the consistency behavior of these storage systems. You can read this post, which will help you understand the issue and how to avoid it. I hope this helps.
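As a rough illustration of the S3 setup described above, the sketch below collects the settings a Spark job might use to read from S3 through the `s3a://` connector without a Hadoop cluster. The helper names `s3a_spark_conf` and `build_session` and the bucket in the usage comment are hypothetical. It assumes PySpark and a matching `hadoop-aws` JAR are on the classpath, because Spark still relies on the Hadoop client libraries for S3 access even when no HDFS or YARN is running.

```python
def s3a_spark_conf(app_name="s3-batch-job", master="local[*]"):
    """Return Spark settings for S3 access via s3a without a Hadoop cluster.

    Credentials are assumed to come from the environment (for example
    AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY or an instance role).
    """
    return {
        "spark.app.name": app_name,
        # local[*] runs Spark in a single JVM; a standalone or Kubernetes
        # master URL could be used here instead.
        "spark.master": master,
        # Map the s3a:// scheme to the S3A filesystem from hadoop-aws.
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def build_session(conf):
    """Create a SparkSession from a dict of settings (requires PySpark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (hypothetical bucket and path):
#   spark = build_session(s3a_spark_conf())
#   df = spark.read.parquet("s3a://example-bucket/events/")
```

Keeping the settings in a plain dictionary makes it easy to review or unit-test the configuration before a session is created.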


Top 5, Real User

I mean that we can also configure Spark without Hadoop, for example by using winutils.exe. Is that recommended for deployment? Otherwise, I would like to understand the difference between a Spark environment with Hadoop and one without it.
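For context on the winutils.exe setup mentioned above: on Windows, the Hadoop client libraries bundled with Spark look for `bin\winutils.exe` under the directory named by the `HADOOP_HOME` environment variable. A minimal sketch of a pre-flight check follows; the function name `find_winutils` is hypothetical and the check is only an illustration, not part of Spark itself.

```python
import os
from pathlib import Path


def find_winutils(env=None):
    """Return the expected winutils.exe path under HADOOP_HOME, or None if unset."""
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None
    return Path(hadoop_home) / "bin" / "winutils.exe"


if __name__ == "__main__":
    path = find_winutils()
    if path is None:
        print("HADOOP_HOME is not set")
    elif not path.exists():
        print(f"winutils.exe not found at {path}")
    else:
        print(f"winutils.exe found at {path}")
```

This kind of setup is typical for a local Windows development machine rather than a cluster deployment.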

Can you elaborate on the information you've been told about how using Apache Spark without Hadoop isn't good for deployment?

This insight would help many of our users.
