
How to Install Spark on Ubuntu: An Instructional Guide

Published
02nd May, 2024

    Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

    In this article, we will cover the installation procedure of Apache Spark on the Ubuntu operating system.

    Prerequisites

    This guide assumes that you are using Ubuntu and that Hadoop 2.7 is installed on your system.

    1. Java 8 must be installed on your machine.
    2. Hadoop must be installed on your machine.

    System requirements


    • Ubuntu OS Installed.
    • Minimum of 8 GB RAM.
    • At least 20 GB of free space.

    Installation Procedure

    1. Making the system ready

    Before installing Spark, make sure that Java 8 is installed on your Ubuntu machine. If it is not, follow the steps below to install it.

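    1. Install Java 8 using the command below.

    Note: oracle-java8-installer was distributed through the third-party WebUpd8 PPA (ppa:webupd8team/java), which has been discontinued, so this command may fail on current Ubuntu releases. In that case, OpenJDK 8 (sudo apt-get install openjdk-8-jdk) is the usual substitute; on 64-bit x86 systems it installs into /usr/lib/jvm/java-8-openjdk-amd64, so use that path wherever this guide refers to /usr/lib/jvm/java-8-oracle.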

    sudo apt-get install oracle-java8-installer

    The above command creates a java-8-oracle directory under /usr/lib/jvm/ on your machine, as shown below:

    [Screenshot: listing of /usr/lib/jvm/ showing the java-8-oracle directory]

    Next, configure the JAVA_HOME path in the .bashrc file, which is executed whenever you open a new terminal.

    2. Configure JAVA_HOME and PATH in the .bashrc file and save it. To edit the file, open it with the command below.

    vi ~/.bashrc

    Press i to switch to insert mode, then enter the following lines at the bottom of the file. There must be no space after the = sign; otherwise JAVA_HOME is left empty.

    export JAVA_HOME=/usr/lib/jvm/java-8-oracle
    export PATH=$PATH:$JAVA_HOME/bin

    Below is the screenshot of that.

    [Screenshot: the JAVA_HOME and PATH lines added to .bashrc]

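    Then press Esc, type :wq! and press Enter to save the changes and exit. Run source ~/.bashrc (or open a new terminal) so that the new variables take effect.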
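    If you prefer not to edit .bashrc interactively, the same two lines can be appended from the shell. The sketch below is one way to do it, assuming bash and the java-8-oracle path used above (adjust JAVA_HOME if your JDK lives elsewhere). add_line_once is a hypothetical helper name introduced here; it appends a line only if that exact line is not already in the file, so running it twice does not create duplicates.

```shell
# Append a line to a file only if that exact line is not already present.
# add_line_once is a hypothetical helper, not part of Ubuntu or Spark.
add_line_once() {
  local line="$1" file="$2"
  touch "$file"   # create the file if it does not exist yet
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

RC_FILE="${RC_FILE:-$HOME/.bashrc}"
# Assumption: Java 8 is installed under /usr/lib/jvm/java-8-oracle; adjust for your JDK.
# Single quotes keep $PATH and $JAVA_HOME literal, so they are expanded
# each time .bashrc runs rather than right now.
add_line_once 'export JAVA_HOME=/usr/lib/jvm/java-8-oracle' "$RC_FILE"
add_line_once 'export PATH=$PATH:$JAVA_HOME/bin' "$RC_FILE"
```

    Afterwards, run source ~/.bashrc so that the current terminal picks up the change.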

    3. Now check that Java is installed properly by printing its version. The following command should display the Java version.

    java -version

    The output should look like this:

    [Screenshot: java -version output]
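    To check the version from a script instead of reading it by eye, the major version number can be parsed from the java -version output. This is a minimal sketch, assuming the usual first line of that output (for example java version "1.8.0_201" or openjdk version "11.0.2"); java_major_version is a hypothetical helper name. Java 8 and earlier report themselves as 1.x, so the leading 1. is stripped in that case.

```shell
# Print the major Java version found in `java -version` output.
# java_major_version is a hypothetical helper, not a standard command.
java_major_version() {
  local ver
  # Extract the quoted version string, e.g. "1.8.0_201" or "11.0.2".
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;   # legacy scheme: 1.8.0_x -> 8
    *)   echo "${ver%%[.+-]*}" ;;               # newer scheme: 11.0.2 -> 11
  esac
}

# java -version writes to stderr, hence 2>&1.
if command -v java >/dev/null 2>&1; then
  major=$(java_major_version "$(java -version 2>&1)")
  echo "Detected Java major version: $major"
  [ "$major" = "8" ] || echo "Warning: this guide assumes Java 8" >&2
else
  echo "java not found on PATH; install Java 8 first" >&2
fi
```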

    2. Installing Spark on the System

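    Go to the official Apache Spark download page below and choose a release; this guide uses Spark 2.4.0. For the package type, choose ‘Pre-built for Apache Hadoop’ (the 2.4.0 package used here is built for Hadoop 2.7). If you pick a different release, adjust the file and directory names in the following steps accordingly.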

    https://spark.apache.org/downloads.html

    The page looks like this:

    [Screenshot: Apache Spark downloads page]

    Alternatively, you can download the exact package used in this guide (Spark 2.4.0, pre-built for Hadoop 2.7) directly from the Apache archive:

    https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

    3. Creating the Spark directory

    Create a directory called spark under /usr/ using the command below.

    sudo mkdir /usr/spark

    Because /usr is owned by root, sudo asks for your password; enter it to create the directory. Then check that the spark directory was created in /usr using the command below (ll is Ubuntu's default alias for ls -alF).

    ll /usr/

    The listing should include the spark directory:

    [Screenshot: ll /usr/ output showing the spark directory]

    Go to the /usr/spark directory using the command below.

    cd /usr/spark

    4. Downloading Spark

    Download Spark 2.4.0 into the spark directory using the command below (sudo is needed because /usr/spark is owned by root):

    sudo wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

    If you run ll or ls, you should now see spark-2.4.0-bin-hadoop2.7.tgz in the spark directory.
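    The download can also be scripted so that the version numbers appear in one place only, which helps if you choose a release other than 2.4.0. This is a sketch under two assumptions: the package follows the spark-VERSION-bin-hadoopVERSION naming used above, and it is fetched from the Apache archive. SPARK_VERSION, HADOOP_VERSION, SPARK_PKG, SPARK_URL and download_spark are illustrative names chosen here, not settings that Spark itself reads.

```shell
# Illustrative variables; change the versions to match the package you chose.
SPARK_VERSION="2.4.0"
HADOOP_VERSION="2.7"
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"

# Download the archive into a directory (default /usr/spark).
# -nc (--no-clobber) skips the download if the file is already there;
# -P sets the target directory. sudo is needed because /usr/spark is owned by root.
download_spark() {
  local dest="${1:-/usr/spark}"
  sudo wget -nc -P "$dest" "$SPARK_URL"
}
```

    Calling download_spark then saves the archive (here spark-2.4.0-bin-hadoop2.7.tgz) into /usr/spark.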

    5. Extracting the Spark archive

    Then extract spark-2.4.0-bin-hadoop2.7.tgz using the command below.

    sudo tar xvzf spark-2.4.0-bin-hadoop2.7.tgz

    This extracts the archive into a directory named spark-2.4.0-bin-hadoop2.7.

    Check that the extraction succeeded using the ll command. The output should look like this:

    [Screenshot: ll output showing the spark-2.4.0-bin-hadoop2.7 directory]
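    If you are not sure what the extracted directory will be called (for example, after choosing a different release), you can read it from the archive without extracting anything, because tar -t only lists the contents. This is a small sketch; tarball_top_dir is a hypothetical helper name, and it assumes the archive has a single top-level directory, as the Spark binary packages do.

```shell
# Print the top-level directory stored in a .tgz archive without extracting it.
# tarball_top_dir is a hypothetical helper, not a standard command.
tarball_top_dir() {
  # -t lists entries, -z handles gzip, -f names the archive;
  # awk keeps the part of the first entry before the first '/'.
  tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

# Example (run inside /usr/spark):
#   tarball_top_dir spark-2.4.0-bin-hadoop2.7.tgz
```

    The printed name is the directory to use in SPARK_HOME in the next step.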

    6. Configuration

    Configure the SPARK_HOME path in the .bashrc file by following the below steps.

    Go to the home directory using the below command

    cd ~

    Open the .bashrc file using the below command

    vi .bashrc

    Now configure SPARK_HOME and PATH.

    Press i to switch to insert mode, then add SPARK_HOME and PATH at the bottom of the file as shown below:

    export SPARK_HOME=/usr/spark/spark-2.4.0-bin-hadoop2.7
    export PATH=$PATH:$SPARK_HOME/bin

    It looks like this:

    [Screenshot: the SPARK_HOME and PATH lines added to .bashrc]

    Then save and exit: press Esc, type :wq! and press Enter. Run source ~/.bashrc (or open a new terminal) so that the changes take effect.
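    Before starting the shell, you can confirm that SPARK_HOME points at the extracted distribution. This is a minimal sketch, assuming the layout of the pre-built package, where the launcher scripts live in $SPARK_HOME/bin; check_spark_home is a hypothetical helper name.

```shell
# Succeed (and print OK) only if the given directory, or $SPARK_HOME when no
# argument is passed, contains an executable bin/spark-shell.
# check_spark_home is a hypothetical helper, not part of Spark.
check_spark_home() {
  local dir="${1:-${SPARK_HOME:-}}"
  if [ -n "$dir" ] && [ -x "$dir/bin/spark-shell" ]; then
    echo "OK: $dir"
  else
    echo "SPARK_HOME is unset or has no executable bin/spark-shell: '$dir'" >&2
    return 1
  fi
}
```

    After running source ~/.bashrc, calling check_spark_home with no argument checks the exported SPARK_HOME.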

    Test Installation

    Now verify that Spark is installed correctly on your Ubuntu machine. Type the following command and press Enter.

    spark-shell 

    The command should start the Spark shell and display a screen like this:

    [Screenshot: spark-shell startup screen]

    Spark is now installed on your Ubuntu system. To finish, let’s create an RDD and a DataFrame.

    a. There are three ways to create an RDD; here we use one of them.

    Define a collection and parallelize it with sc.parallelize, which creates an RDD. Paste the following lines into the Spark shell one at a time.

    val nums = Array(1,2,3,5,6)
    val rdd = sc.parallelize(nums)

    This creates an RDD named rdd.

    b. Now create a DataFrame from the RDD with the following code.

    import spark.implicits._
    val df = rdd.toDF("num")

    This creates a DataFrame with a single column named num.

    To display the data in the DataFrame, use the command below.

    df.show()

    Below is a screenshot of the code above and its output.

    [Screenshot: spark-shell session creating the RDD and DataFrame, with df.show() output]

    How to Uninstall Spark from Ubuntu System

    Follow the steps below to uninstall Spark from Ubuntu.

    1. Remove SPARK_HOME from the .bashrc file.

    To remove the SPARK_HOME variable from .bashrc, follow the steps below.

    2. Go to your home directory using the command below.

    cd ~

    3. Open the .bashrc file using the command below.

    vi .bashrc

    4. Press i to switch to insert mode, find the SPARK_HOME=/usr/spark/spark-2.4.0-bin-hadoop2.7 line and delete it, along with the PATH line that adds $SPARK_HOME/bin. Then save the file:

    Press Esc, type :wq! and press Enter.

    5. Next, delete the downloaded archive and the extracted Spark files. Since Spark was installed under /usr/spark, use the command below.

    sudo rm -r /usr/spark

    The above command deletes the spark directory, including the downloaded archive and the extracted files, from the system.
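    Step 4 can also be done without opening vi. The sketch below, assuming GNU sed (the default on Ubuntu), deletes every line of .bashrc that mentions SPARK_HOME, which covers both the SPARK_HOME line and the PATH line built from it, after saving a backup copy with a .bak suffix. remove_spark_env is a hypothetical helper name; review the file afterwards if it contains other lines that mention SPARK_HOME.

```shell
# Remove every line containing SPARK_HOME from an rc file (default ~/.bashrc).
# remove_spark_env is a hypothetical helper, not part of Ubuntu or Spark.
remove_spark_env() {
  local file="${1:-$HOME/.bashrc}"
  cp "$file" "$file.bak"          # keep a backup before editing in place
  sed -i '/SPARK_HOME/d' "$file"  # GNU sed: -i edits the file in place
}
```

    Run remove_spark_env with no argument to clean ~/.bashrc, then open a new terminal.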

    6. Open a new terminal, type spark-shell and press Enter. You should now get an error saying that the spark-shell command is not found.

    This confirms that Spark has been successfully uninstalled from the Ubuntu system.


    Conclusion 

    Apache Spark is a framework for analyzing big data on cluster computing systems. It has gained a lot of popularity because it is simple to use and processes data much faster than Hadoop MapReduce.

    To analyze huge amounts of data more quickly, Apache Spark can split a job across multiple machines in a cluster. This open-source engine supports several programming languages, including Python, R, Scala, and Java.

    This blog has covered the steps for installing Spark on Ubuntu. Follow the guide above to set up and get started with Spark easily. If you face any issues during the installation, let us know in the comments below.

    Frequently Asked Questions (FAQs)

    1. How to install Spark 2.4 on Ubuntu?

    The steps to install Spark on Ubuntu are as follows:   

    • Check to see if all of your system packages are current.  
    • Next, install Java as described in this article.
    • Then install Apache Spark on Ubuntu 18.04 LTS (detailed steps are provided above).
    • Finally, access Apache Spark.  
    2. How do I launch Spark in Ubuntu?

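    To start a standalone Spark master and a worker, use the following commands. These scripts are located in $SPARK_HOME/sbin, which is not on the PATH configured above, so run them by their full path. In Spark 3.1 and later the worker script is start-worker.sh; in Spark 2.4 it is called start-slave.sh and takes the same master URL argument.

    $ $SPARK_HOME/sbin/start-master.sh

    $ $SPARK_HOME/sbin/start-worker.sh spark://localhost:7077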

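    Once the services are running, open a browser and go to the master's web UI. The page also shows the exact spark:// master URL, which you should pass to the worker script if it cannot connect to localhost.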

    http://localhost:8080/  

    OR   

    http://127.0.0.1:8080  

    You can also run the spark-shell command to check that the Spark shell is working properly.

    $ spark-shell  

    3. How do I get Spark in the terminal?

    From the Apache Spark installation directory, type bin/spark-shell and press Enter to start the Spark shell and work with Spark in Scala.

    If Spark's bin directory is on your PATH (as configured in the steps above), you can simply type spark-shell in any terminal.

    Profile

    Dr. Manish Kumar Jain

    International Corporate Trainer

    Dr. Manish Kumar Jain is an accomplished author, international corporate trainer, and technical consultant with 20+ years of industry experience. He specializes in cutting-edge technologies such as ChatGPT, OpenAI, generative AI, prompt engineering, Industry 4.0, web 3.0, blockchain, RPA, IoT, ML, data science, big data, AI, cloud computing, Hadoop, and deep learning. With expertise in fintech, IIoT, and blockchain, he possesses in-depth knowledge of diverse sectors including finance, aerospace, retail, logistics, energy, banking, telecom, healthcare, manufacturing, education, and oil and gas. Holding a PhD in deep learning and image processing, Dr. Jain's extensive certifications and professional achievements demonstrate his commitment to delivering exceptional training and consultancy services globally while staying at the forefront of technology.
