Find out what your peers are saying about Cloudera, Apache, Amazon Web Services (AWS) and others in Hadoop.
Product | Market Share (%) |
---|---|
Cloudera Distribution for Hadoop | 21.9% |
Apache Spark | 19.0% |
Outerthought Lily | 2.3% |
Other | 56.800000000000004% |
Company Size | Count |
---|---|
Small Business | 27 |
Midsize Enterprise | 15 |
Large Enterprise | 32 |
Company Size | Count |
---|---|
Small Business | 16 |
Midsize Enterprise | 9 |
Large Enterprise | 31 |
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflowstructure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory