How to Install Apache Spark on Ubuntu 22.04


Apache Spark is a distributed computing engine for large-scale data processing and analytics. In this guide, you’ll learn how to update your system, install Java, download Apache Spark, extract the files, and configure the necessary environment variables.

You will also find out how to check that Apache Spark is installed correctly and how to uninstall it if you need to. With this tutorial, you’ll be able to use Apache Spark for all your data-intensive tasks on Ubuntu 22.04.

Installing Apache Spark on Ubuntu 22.04

This section walks through installing Apache Spark on Ubuntu 22.04.

Step 1: Update the System

Before installing Spark on Ubuntu, update your system:

sudo apt update

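This command refreshes the package index. It does not upgrade installed packages; run “sudo apt upgrade” afterwards if you also want to apply pending updates.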

Step 2: Install Java

Apache Spark runs on the Java Virtual Machine, so install Java first:

sudo apt install default-jdk

Press “Y” when asked to confirm the installation.

Once the command finishes without errors, Java is installed.
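To confirm the result yourself, you can ask Java for its version. The snippet below is a small sketch; the function name “check_java” is only illustrative and is not part of any package.

```shell
# Print the first line of "java -version", or a hint if Java is missing.
# "java -version" writes to stderr, so stderr is redirected to stdout.
check_java() {
    if command -v java >/dev/null 2>&1; then
        java -version 2>&1 | head -n 1
    else
        echo "java not found on PATH; install default-jdk first"
    fi
}

check_java
```

If the output names an OpenJDK version, Spark should be able to find a Java runtime.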

Step 3: Download Apache Spark

Then, use this command to download the Spark 3.0.3 archive (built for Hadoop 2.7) to your system:

wget https://archive.apache.org/dist/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz

The download may take some time, depending on your connection.
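Before extracting, it can be worth checking that the archive was not corrupted in transit. Apache release directories usually publish a “.sha512” file next to each archive, but its layout differs between releases, so the sketch below assumes you copy the expected digest by hand. The helper name “sha512_matches” is hypothetical.

```shell
# Compare a file's SHA-512 digest against an expected hex string.
# Whitespace and letter case in the expected value are ignored, so a
# digest pasted in upper case or split into groups still matches.
sha512_matches() {
    file="$1"
    expected=$(printf '%s' "$2" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    actual=$(sha512sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Example call (replace the placeholder with the published digest):
# sha512_matches spark-3.0.3-bin-hadoop2.7.tgz "PASTE-DIGEST-HERE" && echo "checksum OK"
```

The function returns success only when the two digests are identical, so it can be combined with “&&” to stop a script before extraction if the check fails.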

Step 4: Extract Apache Spark File

Once downloaded, extract the Apache Spark file by utilizing this command:

tar xvf spark-3.0.3-bin-hadoop2.7.tgz

Step 5: Move the Folder

After the file is extracted, move the folder to “/opt/spark”:

sudo mv spark-3.0.3-bin-hadoop2.7/ /opt/spark
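A quick sanity check after the move can catch a mistyped path. The sketch below assumes the usual layout of a Spark binary distribution (“bin”, “sbin”, and “jars” directories); “looks_like_spark_home” is an illustrative name, not a Spark tool.

```shell
# Return success if the given directory looks like an unpacked Spark distribution.
looks_like_spark_home() {
    dir="$1"
    [ -d "$dir/bin" ] && [ -d "$dir/sbin" ] && [ -d "$dir/jars" ] \
        && [ -x "$dir/bin/spark-shell" ]
}

if looks_like_spark_home /opt/spark; then
    echo "Spark files found in /opt/spark"
else
    echo "/opt/spark does not look like a Spark installation"
fi
```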

How to Configure Apache Spark on Ubuntu 22.04?

Once you have downloaded and extracted the Apache Spark file, the next task is to configure it.

Step 1: Open the Configuration File

First, use “nano” or any other text editor to open your shell profile, which is where the Spark environment variables will be stored. The file belongs to your own user, so “sudo” is not needed:

nano ~/.profile

Step 2: Set Environment Variables

Next, to set the environment variables, add the following lines to the end of the file:

export SPARK_HOME=/opt/spark

export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

export PYSPARK_PYTHON=/usr/bin/python3

Then, press “CTRL+O” followed by “Enter” to save, and “CTRL+X” to exit the editor.
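If you prefer not to edit the file by hand, the same lines can be appended from the shell. The sketch below uses a hypothetical helper, “append_once”, that adds a line only when it is not already present, so running it twice does not duplicate entries.

```shell
# Append a line to a file unless an identical line is already there.
# grep -x matches whole lines, -F treats the text literally (no regex).
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Uncomment to apply. Single quotes keep $PATH and $SPARK_HOME unexpanded,
# so they are written to the file literally and evaluated at login.
# append_once "$HOME/.profile" 'export SPARK_HOME=/opt/spark'
# append_once "$HOME/.profile" 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin'
# append_once "$HOME/.profile" 'export PYSPARK_PYTHON=/usr/bin/python3'
```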

Step 3: Load the File

The changes take effect once you load the file into the current shell; new login sessions pick them up automatically:

source ~/.profile
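To check that the variables were actually picked up, you can inspect them in the current shell. This is a minimal sketch; “path_contains” is an illustrative helper, and “/opt/spark” is assumed as the fallback location when “SPARK_HOME” is unset.

```shell
# Return success if the given directory is one of the PATH entries.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

echo "SPARK_HOME=${SPARK_HOME:-<unset>}"
if path_contains "${SPARK_HOME:-/opt/spark}/bin"; then
    echo "Spark bin directory is on PATH"
else
    echo "Spark bin directory is NOT on PATH"
fi
```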

How to Verify Apache Spark Installation on Ubuntu 22.04?

Use the command below to verify the Apache Spark installation on your system:

spark-shell --version
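The version banner only shows that the launcher scripts are reachable. To check that Spark can actually run a job, you could try the SparkPi example that ships with the binary distribution. The sketch below wraps it in an illustrative function, “spark_smoke_test”, and assumes Spark lives in “/opt/spark” when “SPARK_HOME” is unset.

```shell
# Run the bundled SparkPi example as a local smoke test.
# Returns non-zero if the run-example launcher cannot be found.
spark_smoke_test() {
    home="${SPARK_HOME:-/opt/spark}"
    if [ -x "$home/bin/run-example" ]; then
        "$home/bin/run-example" SparkPi 10
    else
        echo "run-example not found under $home" >&2
        return 1
    fi
}
```

Call “spark_smoke_test” after loading the profile; among the log output, a successful run prints a line beginning with “Pi is roughly”.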

How to Remove/Uninstall Apache Spark on Ubuntu 22.04?

Uninstalling Apache Spark involves a few steps. If you installed Apache Spark by downloading and extracting its archive as described above, use the “rm” command to delete the Spark installation directory:

sudo rm -rf /opt/spark

Also, remove the environment variables you set earlier by opening “~/.profile” again and deleting the three “export” lines you added.
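The cleanup of the profile can also be scripted. The sketch below assumes the three “export” lines were added exactly as shown in this guide; “remove_spark_exports” is a hypothetical helper, and it keeps a “.bak” copy of the file in case something else gets lost.

```shell
# Remove the Spark-related export lines from a profile file.
# Only exact, whole-line matches are dropped; a backup is kept as FILE.bak.
remove_spark_exports() {
    file="$1"
    cp "$file" "$file.bak"
    # grep exits with status 1 when no lines remain, which is not an error here.
    grep -vxF \
        -e 'export SPARK_HOME=/opt/spark' \
        -e 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' \
        -e 'export PYSPARK_PYTHON=/usr/bin/python3' \
        "$file.bak" > "$file" || true
}

# Example call:
# remove_spark_exports "$HOME/.profile"
```

Log out and back in afterwards so that the old values disappear from your session.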

That covers installing, configuring, verifying, and removing Apache Spark on Ubuntu 22.04.

Conclusion

To install Apache Spark on Ubuntu 22.04, update your system first. Then, install Java using the “sudo apt install default-jdk” command. Next, download the Apache Spark archive with “wget https://archive.apache.org/dist/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz”. Extract the downloaded file with the “tar xvf spark-3.0.3-bin-hadoop2.7.tgz” command, move the extracted folder to “/opt/spark”, and set the environment variables in “~/.profile”. Finally, verify the installation with “spark-shell --version”.
