Understand Spark Execution Modes

Understanding Spark's execution modes — Local, Client, and Cluster — is a simple yet essential aspect of working with any big data system.

In this article, I’ll explain these concepts in simple, straightforward terms.

The Local Mode

This is similar to executing a program on a single JVM on someone’s laptop or desktop. It could be a program written in any language, such as Java, Scala or Python.

The difference is that the application defines and uses a Spark context object, imports the Spark libraries, and processes data from files on the local system.

This is called local mode because everything runs locally: there is no concept of a cluster node, and nothing is executed in a distributed manner. A single JVM process hosts both the driver and the executor.

Launching a spark-shell on your laptop, for example, starts Spark in local mode.
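For instance, with Spark installed locally, the following commands run everything inside a single JVM (my_app.py is a hypothetical script, used here only for illustration):

```shell
# Interactive shell in local mode, using all available cores;
# "local[2]" would use exactly two cores instead.
spark-shell --master "local[*]"

# The same idea for a batch job:
spark-submit --master "local[4]" my_app.py
```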

The Client Mode

Consider a Spark cluster with 3 executors. In client mode, the driver starts on the local machine (your laptop or desktop); that is, the driver is not part of the cluster. The executors, on the other hand, run within the cluster.

Keep in mind that, in this situation, the entire program depends on the local machine, since that is where the driver runs. If the local machine fails, the driver dies, and the entire application shuts down with it.

This mode is unsuitable for production environments, but it remains useful for debugging and testing, since the output can be printed directly to the driver's terminal (the local machine).

The Cluster Mode

Consider a Spark cluster with 3 executors. The driver and executor both run inside the cluster in this mode.

The Spark job is submitted from your local machine to a machine within the cluster, usually called an edge node.

Turning off the local computer will have no impact on the running computation. This is the strategy used in production environments.

In the two previous modes (client and cluster), the driver and executors are created and linked through a sequence of steps. This is how it works:

In client mode:

  1. spark-submit creates the driver program locally
  2. The driver communicates with the cluster manager to request the executors (in this example, 2 executors with 4 CPU cores and 20 GB of memory each)
  3. The cluster manager asks the node managers to create the executors
  4. The executors connect back to the local driver process
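The steps above correspond to a spark-submit invocation like the following sketch, assuming YARN as the cluster manager and a hypothetical my_app.py; the resource flags mirror the example numbers (2 executors, 4 CPU cores and 20 GB of memory each):

```shell
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 2 \
  --executor-cores 4 \
  --executor-memory 20g \
  my_app.py
```

With --deploy-mode client, the driver runs in the spark-submit process itself, on the machine where the command is typed.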

In cluster mode:

  1. spark-submit sends a request to the cluster manager to launch the driver program
  2. The cluster manager asks a node manager for the resources the driver requires (in this example, 4 CPU cores and 20 GB of memory)
  3. The node manager creates the driver process
  4. Once running, the driver asks the cluster manager to create the executors with the requested configuration (2 executors with 4 CPU cores and 20 GB of memory each)
  5. The cluster manager asks the node managers to create the executors and connect them to the driver

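These steps map onto a spark-submit command like the sketch below, again assuming YARN and a hypothetical my_app.py. Note the extra --driver-cores and --driver-memory flags: in cluster mode the driver itself consumes resources on a cluster node (here, 4 cores and 20 GB, matching the steps above):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-cores 4 \
  --driver-memory 20g \
  --num-executors 2 \
  --executor-cores 4 \
  --executor-memory 20g \
  my_app.py
```

Once submitted, the local spark-submit process can exit; the driver keeps running inside the cluster.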
Conclusion

We covered the differences between Spark's three execution modes. I hope you now understand how they differ and the use case for each of them.
