Apache Spark is an open-source unified analytics engine for large-scale data processing.
Apache Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.
Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines that is maintained in a fault-tolerant way.
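As a minimal sketch of that programming model (assuming the Scala API and an in-process "local[*]" master for illustration), a plain collection is parallelized into an RDD and the map/reduce over it is split into tasks implicitly; lost partitions are recomputed from lineage rather than from replicated data. The application name and data values are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative configuration: "local[*]" runs Spark in-process for this sketch.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // An RDD: a read-only, partitioned collection distributed across the cluster.
    val words = sc.parallelize(Seq("resilient", "distributed", "dataset"))

    // Transformations and the final reduce are parallelized implicitly;
    // failed partitions are recomputed from the RDD's lineage.
    val totalChars = words.map(_.length).reduce(_ + _)
    println(s"total characters: $totalChars")

    sc.stop()
  }
}
```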
Apache Spark requires a cluster manager and a distributed storage system.
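As a hedged illustration of that requirement, the master URL passed to a Spark application selects the cluster manager (standalone, YARN, Mesos, or Kubernetes), and the URI scheme of the input path selects the distributed storage system. The host names and file path below are placeholders, not a recommended setup.

```scala
import org.apache.spark.sql.SparkSession

object ClusterSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder master URL for a standalone cluster manager;
    // "yarn" or "k8s://..." would target other cluster managers instead.
    val spark = SparkSession.builder()
      .appName("cluster-sketch")
      .master("spark://spark-master:7077")
      .getOrCreate()

    // Placeholder path on a distributed store (HDFS here); S3, GCS, etc. also work.
    val lines = spark.sparkContext.textFile("hdfs://namenode:8020/data/input.txt")
    println(s"line count: ${lines.count()}")

    spark.stop()
  }
}
```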
Besides the RDD-oriented functional style of programming, Apache Spark provides two restricted forms of shared variables: broadcast variables reference read-only data that needs to be available on all nodes, while accumulators can be used to program reductions in an imperative style.
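A minimal sketch of both forms of shared variable, assuming an existing SparkContext `sc` (for example in spark-shell); the country codes and lookup table are made up for illustration.

```scala
// Assumes an existing SparkContext `sc`; the data values are illustrative.
val countryNames = sc.broadcast(Map("DE" -> "Germany", "FR" -> "France"))
val unknownCodes = sc.longAccumulator("unknownCodes")

val codes = sc.parallelize(Seq("DE", "FR", "XX"))
val resolved = codes.map { code =>
  // Broadcast variables are read-only data shipped once to every node.
  countryNames.value.get(code) match {
    case Some(name) => name
    case None =>
      // Accumulators support imperative-style reductions, here error counting.
      unknownCodes.add(1)
      "unknown"
  }
}

resolved.collect().foreach(println)
println(s"unknown country codes: ${unknownCodes.value}")
```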
Apache Spark SQL is a component on top of Apache Spark Core that introduced a data abstraction called DataFrames, which provides support for structured and semi-structured data.
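A brief sketch of the DataFrame abstraction, assuming a SparkSession named `spark`; the same query is expressed once through the DataFrame method API and once through SQL over a temporary view. The column names and rows are illustrative, and semi-structured sources such as JSON could be read with `spark.read.json(...)` instead of the in-memory sequence used here.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sql-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// A DataFrame built from an in-memory sequence of rows with named columns.
val people = Seq(("Alice", 34), ("Bob", 27)).toDF("name", "age")

// Method-style query on the DataFrame ...
people.filter($"age" > 30).show()

// ... or the equivalent SQL query over a temporary view.
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```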
Apache Spark Streaming uses Apache Spark Core's fast scheduling capability to perform streaming analytics.
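A minimal sketch of the classic DStream API, which ingests data in micro-batches and hands each batch to Spark's scheduler as an ordinary job; the socket host, port, and five-second batch interval are illustrative choices, not defaults.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative 5-second micro-batches; each batch is scheduled like a normal Spark job.
    val conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Illustrative source: lines of text arriving on a TCP socket.
    val lines  = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```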
Apache Spark can be deployed in a traditional on-premises data center as well as in the cloud.
Apache Spark has built-in support for Scala, Java, R, and Python, with third-party support for additional languages such as the .NET languages and Julia.
Apache Spark was started by Matei Zaharia at UC Berkeley's AMPLab in 2009 and open-sourced in 2010 under a BSD license.