Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.
Product | Market Share (%) |
---|---|
Apache Spark | 19.3% |
Cloudera Distribution for Hadoop | 22.1% |
Outerthought Lily | 2.1% |
Other | 56.49999999999999% |
Company Size | Count |
---|---|
Small Business | 27 |
Midsize Enterprise | 15 |
Large Enterprise | 32 |
Company Size | Count |
---|---|
Small Business | 16 |
Midsize Enterprise | 9 |
Large Enterprise | 31 |
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflowstructure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory