A Spark cluster in Standalone mode comprises one Master and multiple Worker processes. Standalone mode can be used either on a single local machine or on a cluster. This mode does not require an external resource manager such as Mesos.
To deploy a Spark Cluster in Standalone mode, the following steps need to be executed on any one of the nodes.
1. Download the spark-0.7.x setup from:
2. Extract the Spark setup
tar -xzvf spark-0.7.x-sources.tgz
3. Spark requires Scala's bin directory to be present in the PATH variable of the Linux machine. Scala 2.9.3 for Linux can be downloaded from:
4. Extract the Scala setup
tar -xzvf scala-2.9.3.tgz
5. Export the Scala home by appending the following line to "~/.bashrc" (for CentOS) or "/etc/environment" (for Ubuntu)
6. Spark can be compiled with "sbt" or built using Maven. This guide covers the former method because of its simplicity. To compile, change directory to the extracted Spark setup and execute the following command:
7. Create a file (if not already present) called "spark-env.sh" in Spark's "conf" directory by copying "conf/spark-env.sh.template", and add the SCALA_HOME variable declaration to it as described below:
export SCALA_HOME=<path to Scala directory>
The Web UI ports for the Spark Master and Worker can optionally be specified by appending the following to "spark-env.sh"
8. To specify which nodes act as Workers, list their IP addresses in "conf/slaves", one per line. For a cluster containing two worker nodes with IPs 18.104.22.168 and 22.214.171.124, "conf/slaves" would contain:
18.104.22.168
22.214.171.124
This completes the setup process on one node.
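The configuration steps above (5, 7 and 8) can be sketched as a short shell script. This is a minimal sketch under stated assumptions, not the exact commands from the original steps: the Scala path "/opt/scala-2.9.3", the scratch-directory fallbacks and the port values are placeholders, and the variable names SPARK_MASTER_WEBUI_PORT and SPARK_WORKER_WEBUI_PORT are taken from Spark's standalone-mode documentation and should be checked against the release in use. The build command in the comment is likewise an assumption about the 0.7.x source tree.

```shell
#!/bin/sh
# Sketch only. Every path, port, and fallback below is a placeholder.

# Fall back to scratch locations so the script is safe to try anywhere.
SPARK_HOME="${SPARK_HOME:-$(mktemp -d)}"          # extracted Spark directory
SCALA_HOME="${SCALA_HOME:-/opt/scala-2.9.3}"      # hypothetical Scala path
PROFILE="${PROFILE:-$SPARK_HOME/bashrc.sample}"   # point at ~/.bashrc for real use

# Step 5: make SCALA_HOME and Scala's bin directory visible to the shell.
{
  echo "export SCALA_HOME=$SCALA_HOME"
  echo 'export PATH=$PATH:$SCALA_HOME/bin'
} >> "$PROFILE"

# Step 6 (assumed command for the 0.7.x source tree, not run here):
#   (cd "$SPARK_HOME" && sbt/sbt package)

# Step 7: start from the template if present, then append the settings.
mkdir -p "$SPARK_HOME/conf"
ENV_FILE="$SPARK_HOME/conf/spark-env.sh"
if [ ! -f "$ENV_FILE" ] && [ -f "$ENV_FILE.template" ]; then
  cp "$ENV_FILE.template" "$ENV_FILE"
fi
{
  echo "export SCALA_HOME=$SCALA_HOME"
  echo "export SPARK_MASTER_WEBUI_PORT=8080"   # assumed variable name and port
  echo "export SPARK_WORKER_WEBUI_PORT=8081"   # assumed variable name and port
} >> "$ENV_FILE"

# Step 8: list one worker address per line.
printf '%s\n' 18.104.22.168 22.214.171.124 > "$SPARK_HOME/conf/slaves"
```

Run with SPARK_HOME and SCALA_HOME exported to the real directories to write into the actual "conf" directory. With neither set, the files land in a temporary directory so the result can be inspected first.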
To set up Spark on the other nodes of the cluster, copy the Spark and Scala directories to the same locations on each of the remaining nodes.
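One way to do the copy is with scp, sketched below as a dry run that only prints the commands. The directory paths "/opt/spark-0.7.x" and "/opt/scala-2.9.3" are hypothetical placeholders, and the sketch assumes SSH access from this node to each worker. Any other copy tool would do equally well.

```shell
#!/bin/sh
# Sketch only: the directory paths are hypothetical placeholders.
WORKERS="18.104.22.168 22.214.171.124"
SPARK_DIR="${SPARK_DIR:-/opt/spark-0.7.x}"
SCALA_DIR="${SCALA_DIR:-/opt/scala-2.9.3}"

# Print (not run) one scp command per directory per node, so each
# directory lands under the same parent path as on this node.
copy_commands() {
  for ip in $WORKERS; do
    for dir in "$SPARK_DIR" "$SCALA_DIR"; do
      printf 'scp -r %s %s:%s/\n' "$dir" "$ip" "$(dirname "$dir")"
    done
  done
}

copy_commands
```

Printing the commands first makes it easy to review the target paths. Once they look right, the output can be piped to "sh" to perform the copy.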
Lastly, edit the /etc/hosts file on all the nodes to add the "IP HostName" entries of all the other nodes in the cluster.
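The "IP HostName" entries can be generated with a small script such as the one below. It is a sketch only: the host names "worker1" and "worker2" are hypothetical, and by default it writes to a scratch file rather than to /etc/hosts.

```shell
#!/bin/sh
# Sketch only: the host names are hypothetical placeholders.
NODES="18.104.22.168:worker1 22.214.171.124:worker2"

# Emit one "IP HostName" line per node.
hosts_entries() {
  for node in $NODES; do
    printf '%s %s\n' "${node%%:*}" "${node##*:}"
  done
}

# Write to a scratch file by default; point HOSTS_FILE at /etc/hosts
# (as root) to apply the entries for real.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
hosts_entries | while read -r ip name; do
  # Append only lines that are not already present verbatim.
  grep -qxF "$ip $name" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
done
```

Because existing lines are skipped, the script can be run again on a node without duplicating entries.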
Hope that helps !!