Apache Spark Tutorial

Apache Spark Installation

Introduction

Now that we have seen how most of the concepts and internals of Apache Spark work, we will take a look at how to install Apache Spark on our local machines (desktops/laptops).

Apache Spark is easy to install on Unix/Linux/Mac operating systems. It can be installed on a standalone machine, and the steps are largely the same across operating systems. Let us walk through the steps to install Apache Spark on a Mac, since I am currently on a Mac laptop.

Verifying Java Installation: The first step is to verify the Java installation. Since Apache Spark is developed in Scala, which runs on the JVM, we definitely need Java installed before proceeding with anything else. To verify the Java installation on a Mac, perform the step below.

[Screenshot: verifying the Java installation]
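The verification shown above is just the standard version check in a terminal (the exact version string you see will depend on your installation):

```shell
# Check whether Java is on the PATH and which version is installed
java -version
```

If Java is installed, this prints the runtime version; if not, the command will not be found and you should install Java as described below.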

If you can't find a Java installation, please go ahead and install either Oracle Java or OpenJDK, version 8 or above.

Verifying Scala Installation: The next step is to verify the Scala installation on your machine, using the following step:

[Screenshot: verifying the Scala installation]

If you don’t have Scala installed on your machine, you need to install it before proceeding with the Spark installation.

Downloading Scala: Scala can be downloaded from the following link: 

https://www.scala-lang.org/download/ 

Please install the latest version. The version in the screenshot above may differ from what you see when you click the link; that does not matter. Download the Scala binaries for your operating system.

Installing Scala: After the binaries are downloaded, please install Scala from the downloaded binary. On macOS, Scala can also be installed using Homebrew:

  • brew update 
  • brew install scala

After the installation is complete, please verify it by running the “scala -version” command again to confirm the installation completed properly.

Downloading Apache Spark: Now we are ready to install Apache Spark. It can be downloaded from the Apache Spark website: https://spark.apache.org/downloads.html

Please select the latest stable release of Spark; a build for the corresponding Hadoop version can also be chosen. The Hadoop version is important if you have an HDFS setup installed locally. Note, however, that HDFS is not required on our local machine to get started with Spark.

Installing Spark: After the download is complete, please install Spark from the binary. On macOS, Spark can also be installed using Homebrew:

> brew install apache-spark 

Verifying the Spark Installation: After all the above steps are done, please verify the Spark installation as below:

[Screenshot: verifying the Spark installation]
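The check shown above simply launches the Spark shell from a terminal (assuming the Spark binaries are on your PATH; a Homebrew install sets this up for you):

```shell
# Launch the interactive Scala shell for Spark; the startup banner shows the version
spark-shell

# Or print the version information without starting a shell
spark-shell --version
```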

The version installed on my Mac is 2.4.3, which is the latest version of Spark at the time of writing this tutorial. This is all it takes to install Spark and start getting your hands dirty with it. The shell above is a Scala interactive shell that can be used for running Spark commands interactively. The shell can also be used to write small programs in Scala and run examples of Spark code.
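For example, once spark-shell is up, you can try a small computation directly at the prompt (sc, the SparkContext, is created for you by the shell):

```scala
// Distribute the numbers 1..5 as an RDD across local worker threads
val rdd = sc.parallelize(1 to 5)

// Sum them with a reduce action: 1 + 2 + 3 + 4 + 5 = 15
rdd.reduce(_ + _)
```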

If you want to work in Python and use PySpark, you can install Python and then install PySpark using:

> pip install pyspark 

You can verify the PySpark installation as below. If your Python and PySpark installations are correct, you will see something like the following:

[Screenshot: PySpark shell output]
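The verification shown above amounts to launching the PySpark shell, or checking the installed package version directly:

```shell
# Start the PySpark interactive shell; the startup banner shows the Spark version
pyspark

# Or check the installed package version without starting a shell
python -c "import pyspark; print(pyspark.__version__)"
```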

You can use the Scala shell or the PySpark shell to start learning Spark in the language of your choice. For Java there is no Spark shell available, so you need to work in IntelliJ or Eclipse with the Scala compiler added to your IDE. Both Eclipse and IntelliJ have very good support for Scala. For Python programming, you can use PyCharm or any IDE of your choice.

While programming for Spark in an IDE, you might need to download the Spark artifacts. They are hosted in Maven Central. You can add the Maven dependency as below:

groupId: org.apache.spark 
artifactId: spark-core_2.11 
version: 2.4.3 
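In a Maven pom.xml, the coordinates above translate to a dependency entry like this (use the Scala suffix and Spark version that match your installation):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.4.3</version>
</dependency>
```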

That is all there is to the Spark installation.

Conclusion

This module showed us how to install Spark. I used a Mac installation, but the process is very similar for other operating systems such as Windows, Linux, and Unix.
