


Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.
| Product | Mindshare (%) |
|---|---|
| Apache Spark | 12.9% |
| Cloudera Distribution for Hadoop | 13.8% |
| Outerthought Lily | 3.4% |
| Other | 69.9% |

| Company Size | Count |
|---|---|
| Small Business | 28 |
| Midsize Enterprise | 16 |
| Large Enterprise | 32 |
| Company Size | Count |
|---|---|
| Small Business | 16 |
| Midsize Enterprise | 9 |
| Large Enterprise | 31 |
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflowstructure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory