


Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.
| Product | Market Share (%) |
|---|---|
| Apache Spark | 13.9% |
| Cloudera Distribution for Hadoop | 15.1% |
| Outerthought Lily | 2.4% |
| Other | 68.6% |


| Company Size | Count |
|---|---|
| Small Business | 28 |
| Midsize Enterprise | 15 |
| Large Enterprise | 32 |
| Company Size | Count |
|---|---|
| Small Business | 16 |
| Midsize Enterprise | 9 |
| Large Enterprise | 31 |
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflowstructure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory