11 Facts About Apache Spark


Apache Spark is an open-source unified analytics engine for large-scale data processing.

FactSnippet No. 1,425,525

Apache Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.

FactSnippet No. 1,425,526

Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way.

FactSnippet No. 1,425,527

Apache Spark requires a cluster manager and a distributed storage system.

FactSnippet No. 1,425,528

Besides the RDD-oriented functional style of programming, Apache Spark provides two restricted forms of shared variables: broadcast variables reference read-only data that needs to be available on all nodes, while accumulators can be used to program reductions in an imperative style.

FactSnippet No. 1,425,529


Apache Spark SQL is a component on top of Apache Spark Core that introduced a data abstraction called DataFrames, which provides support for structured and semi-structured data.

FactSnippet No. 1,425,530

Apache Spark SQL provides a domain-specific language to manipulate DataFrames in Scala, Java, or Python.

FactSnippet No. 1,425,531

Apache Spark Streaming uses Apache Spark Core's fast scheduling capability to perform streaming analytics.

FactSnippet No. 1,425,532

Apache Spark can be deployed in a traditional on-premises data center as well as in the cloud.

FactSnippet No. 1,425,533

Apache Spark has built-in support for Scala, Java, R, and Python, with third-party support for additional languages.

FactSnippet No. 1,425,534

Apache Spark was started by Matei Zaharia at UC Berkeley's AMPLab in 2009 and open-sourced in 2010 under a BSD license.

FactSnippet No. 1,425,535