
Apache Spark Programming with RDD


Introduction

In this section, we will look at a concrete example of RDD transformation and action functions and see their output by executing them on the Spark shell.

We have seen above the functions we can use with RDDs. These fall into two categories: transformations, which produce another RDD, and actions, which produce anything other than an RDD and either send the result to the driver or write it to disk or other stable storage.

Implementing RDD Transformations and Actions with an example:

Let us look at a concrete example of executing an RDD transformation and action on real data. Many examples in Scala, Python, and Java ship with the Apache Spark installation and can be executed on the Spark shell. The examples are available in the Spark GitHub repository at: https://github.com/apache/spark/tree/master/examples/src/main/scala/org/apache/spark/examples

- Procedure for executing the examples: All of these examples can be executed by submitting the examples jar provided with the Spark installation. We can also execute them interactively on the Spark shell. Let us execute a simple Word Count example on the Spark shell to understand the process in detail.

- Open Spark-Shell: The first step is to open the spark-shell on a machine where Spark is installed. Please execute the following command on the command line:

> spark-shell

This should open the Spark shell as below:

[Screenshot: Spark shell welcome banner and scala> prompt]

- Create an RDD: The next step is to create an RDD by reading the text file whose words we are going to count.

I have a file called “Spark.txt”. You can use any .txt file similarly; just note its location. The first step is to create an RDD by reading the file as below:

[Screenshot: creating the RDD from Spark.txt on the Spark shell]
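As a minimal sketch of this step, assuming Spark.txt sits in the hypothetical location /Users/home/Downloads (sc is the SparkContext that the Spark shell creates automatically):

scala> val textFile = sc.textFile("/Users/home/Downloads/Spark.txt")   // RDD[String], one element per line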

- Execute the Word Count transformations: The next step is to execute the word count transformations in stages:

- Each line in the file is split into words using the flatMap RDD transformation. flatMap applies a function that returns a sequence for each element, and flattens the resulting sequences into a single collection.

[Screenshot: flatMap transformation on the Spark shell]
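A sketch of this step, assuming the textFile RDD from above and words separated by single spaces:

scala> val words = textFile.flatMap(line => line.split(" "))   // one element per word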

- Each word is then turned into a key-value pair using the map transformation, which assigns the value 1 to each word key.

[Screenshot: map transformation creating (word, 1) pairs]
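A sketch of this step, assuming the words RDD from the previous step:

scala> val pairs = words.map(word => (word, 1))   // RDD[(String, Int)]: each word paired with 1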

- In the last step, the values of matching keys are added together using the reduceByKey function to get the final count of each word.

[Screenshot: reduceByKey transformation and collect output]
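Putting this step together, a sketch assuming the pairs RDD from above (collect() is the action that brings the result back to the driver):

scala> val counts = pairs.reduceByKey(_ + _)   // add up the 1s for each distinct word
scala> counts.collect()                        // action: returns Array[(String, Int)] to the driver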

Please note that I have executed the .collect() step only for demonstration purposes, to show the intermediate results and aid understanding. It is not required in actual programs.

- Current RDD: In our example above, we create a different RDD at each step. If we want to know more about the current RDD, we can execute the following command:

> counts.toDebugString

This prints the RDD's full dependency lineage for debugging purposes.

[Screenshot: counts.toDebugString lineage output]
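The exact lines depend on the Spark version, RDD ids, and partition counts, but for our counts RDD the printed lineage looks roughly like this (illustrative only):

(2) ShuffledRDD[4] at reduceByKey at <console>:27 []
 +-(2) MapPartitionsRDD[3] at map at <console>:26 []
    |  MapPartitionsRDD[2] at flatMap at <console>:25 []
    |  /Users/home/Downloads/Spark.txt MapPartitionsRDD[1] at textFile at <console>:24 []
    |  /Users/home/Downloads/Spark.txt HadoopRDD[0] at textFile at <console>:24 []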

- Caching the Transformations: If we look at the three-step execution of our word count example in detail, we note that each time I executed .collect(), the execution started again from reading the file. Every time an action was called, Spark re-computed all the steps in the lineage, which is not what we want. We can avoid this by persisting or caching the RDD using the persist or cache methods. This caches the RDD in memory once an action is called, so subsequent iterative steps do not re-compute the same steps but use the cache instead and perform better.

[Screenshots: caching the RDD and re-running the action]
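A minimal sketch of this step: cache counts, then call the action. For RDDs, cache() is shorthand for persist() with the default MEMORY_ONLY storage level:

scala> counts.cache()     // mark the RDD for caching; nothing is computed yet
scala> counts.collect()   // first action computes the lineage and populates the cache
scala> counts.collect()   // later actions reuse the cached partitions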

- Applying the Action: As we already know, Spark transformations are executed only when an action is called. The action triggers the actual computation of the whole dependency chain and produces the result.

We can execute the following code to save our output.

[Screenshot: saving the output with an action]
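As a sketch, saving counts to the directory that is checked in the next step (saveAsTextFile fails if the directory already exists):

scala> counts.saveAsTextFile("/Users/home/Downloads/output")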

- Checking the Output: We can check the output of our program by opening another terminal and running the command:

> ls -l /Users/home/Downloads/output/

[Screenshot: listing of the output directory]
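The directory should contain one part file per partition of counts plus an empty _SUCCESS marker; with the two partitions in this run, roughly:

_SUCCESS
part-00000
part-00001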

Output: The output can be seen by reading the two part files created as the output of our program.
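Assuming the standard Hadoop-style part-file names listed above, a command along these lines prints the counts:

> cat /Users/home/Downloads/output/part-00000 /Users/home/Downloads/output/part-00001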

[Screenshot: word count output from the part files]

This is not the full output; my screen could not capture the entire terminal window because the results span multiple screens due to the larger input file.

The output also contains a _SUCCESS file, which shows that the program completed successfully. This convention comes from the analogous MapReduce concept in Hadoop.

Conclusion

We saw above how to work with transformations and actions on the Spark shell. Working with other transformations and actions is very similar.
