
Python - XML

Updated on Sep 3, 2025


The Extensible Markup Language (XML) is a markup language much like HTML. It is portable and useful for handling small to medium amounts of data without using any SQL database.

Python's standard library contains the xml package. This package has the following modules that define XML processing APIs:

xml.etree.ElementTree: a simple and lightweight XML processor API
xml.dom: the DOM API definition
xml.sax: SAX2 base classes and convenience functions

ElementTree module

XML is a tree-like hierarchical data format. This module has two classes for this purpose: 'ElementTree' treats the whole XML document as a tree, and 'Element' represents a single node in this tree. Reading and writing operations on XML files are done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

To create an XML file using ElementTree:

The tree is a hierarchical structure of elements starting with a root followed by other elements. Each element is created by using the Element() constructor of this module.

import xml.etree.ElementTree as xml
e=xml.Element('name')

Each element is characterized by a tag and an attrib attribute, which is a dict object. For an element created without attributes, such as the tree's root element here, attrib is an empty dictionary.

>>> root=xml.Element('students')
>>> root.tag
'students'
>>> root.attrib
{}
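As a small illustration of the attrib dictionary, the sketch below creates an element with attributes. The attribute names 'id' and 'section' are made up for this example, and the module is aliased as ET here (a common convention) rather than xml.

```python
import xml.etree.ElementTree as ET

# Attributes may be passed as a dict, as keyword arguments, or both.
# 'id' and 'section' are hypothetical attribute names for illustration.
student = ET.Element('student', {'id': '1'}, section='A')

tag = student.tag
attributes = student.attrib
print(tag)
print(attributes)
```

The attribute values are plain strings; numbers should be converted with str() before being stored.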

You may now set up one or more child elements to be added under the root element. Each child may have one or more sub-elements. Add them using the SubElement() function and define each sub-element's text attribute.

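student = {'name': 'aaa', 'age': 21}  # sample record used to fill the element
child = xml.Element("student")
nm = xml.SubElement(child, "name")
nm.text = student.get('name')
age = xml.SubElement(child, "age")
age.text = str(student.get('age'))  # text must be a string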

Each child is added to root by the append() method:

root.append(child)

After adding the required number of child elements, construct a tree object with the ElementTree() constructor:

tree = xml.ElementTree(root)

The entire tree structure is written to a file opened in binary mode by the tree object's write() method:

with open('mytest.xml', "wb") as f:
    tree.write(f)
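The write() method also accepts optional encoding and xml_declaration arguments. The sketch below is only an illustration: it writes to an in-memory io.BytesIO buffer instead of a file on disk so that the result can be inspected directly, and the module is aliased as ET.

```python
import io
import xml.etree.ElementTree as ET

# Build a very small tree: <students><student><name>aaa</name></student></students>
root = ET.Element('students')
child = ET.SubElement(root, 'student')
ET.SubElement(child, 'name').text = 'aaa'

tree = ET.ElementTree(root)

# An in-memory binary buffer stands in for a file opened with "wb".
buffer = io.BytesIO()
tree.write(buffer, encoding='utf-8', xml_declaration=True)

data = buffer.getvalue()
print(data.decode('utf-8'))
```

Passing a file name instead of a file object to write() also works, in which case the file is opened and closed for you.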

In the following example, a tree is constructed out of a list of dictionary items. Each dictionary holds key-value pairs describing a student. The tree so constructed is written to 'myfile.xml'.

import xml.etree.ElementTree as xml
students = [{'name': 'aaa', 'age': 21, 'marks': 50}, {'name': 'bbb', 'age': 22, 'marks': 60}]
root = xml.Element("students")
for student in students:
    child = xml.Element("student")
    root.append(child)
    nm = xml.SubElement(child, "name")
    nm.text = student.get('name')
    age = xml.SubElement(child, "age")
    age.text = str(student.get('age'))
    mark = xml.SubElement(child, "marks")
    mark.text = str(student.get('marks'))
tree = xml.ElementTree(root)
with open('myfile.xml', "wb") as fh:
    tree.write(fh)

The file 'myfile.xml' is stored in the current working directory, with the following content:

<students><student><name>aaa</name><age>21</age><marks>50</marks></student><student><name>bbb</name><age>22</age><marks>60</marks></student></students>
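The output above is written on a single line. If a more readable layout is wanted, one option is the indent() function of the same module; note that it assumes Python 3.9 or later. The sketch below works on a shortened copy of the document held in a string, with the module aliased as ET.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the document, parsed from a string for illustration.
root = ET.fromstring("<students><student><name>aaa</name></student></students>")

# indent() inserts whitespace in place; it requires Python 3.9 or later.
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Since indent() modifies the tree by adding whitespace to text and tail, it is best applied just before writing the file.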

To parse an XML file:

Let us now read back the 'myfile.xml' created in the above example. For this purpose, the following functions in the ElementTree module will be used:

ElementTree(): When given the file argument, this constructor reads the hierarchical structure of elements from the file into a tree object.

tree = xml.ElementTree(file='myfile.xml')

getroot(): This method returns the root element of the tree.

root = tree.getroot()

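getchildren(): This method returned the list of sub-elements one level below an element. It was deprecated and then removed in Python 3.9; iterate over the element directly, or use list(element), instead.

children = list(root)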

In the following example, elements and sub-elements of 'myfile.xml' are parsed into a list of dictionary items.

import xml.etree.ElementTree as xml
tree = xml.ElementTree(file='myfile.xml')
root = tree.getroot()
students = []
for child in root:  # each <student> element under the root
    student = {}
    for pair in child:  # each <name>, <age> and <marks> sub-element
        student[pair.tag] = pair.text
    students.append(student)
print(students)

Output:

[{'name': 'aaa', 'age': '21', 'marks': '50'}, {'name': 'bbb', 'age': '22', 'marks': '60'}]
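Note that every value read back is a string. A minimal sketch of an alternative, assuming the same document structure, uses findall() and findtext() to pick sub-elements by tag and converts the numeric fields explicitly. The document is parsed from a string with fromstring() here so that the example does not depend on a file, and the module is aliased as ET.

```python
import xml.etree.ElementTree as ET

document = (
    "<students>"
    "<student><name>aaa</name><age>21</age><marks>50</marks></student>"
    "<student><name>bbb</name><age>22</age><marks>60</marks></student>"
    "</students>"
)

root = ET.fromstring(document)

students = []
for node in root.findall('student'):  # direct <student> children only
    students.append({
        'name': node.findtext('name'),
        'age': int(node.findtext('age')),      # convert text to int
        'marks': int(node.findtext('marks')),
    })

print(students)
```

This restores the original data types of the list that was used to build the file.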

To modify an XML file:

We shall use the iter() method of Element. It creates a tree iterator for the given tag with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order.

Let us build an iterator for all 'marks' sub-elements and increase the number in the text of each marks tag by 10.

import xml.etree.ElementTree as xml
tree = xml.ElementTree(file='myfile.xml')
root = tree.getroot()
for x in root.iter('marks'):
    mark = int(x.text)
    mark = mark + 10
    x.text = str(mark)
with open("myfile.xml", "wb") as fh:
    tree.write(fh)

Our 'myfile.xml' will now be modified accordingly.

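We can also use set() to add or update an attribute of an element. Note that this sets an attribute on the tag, not its text, and that the key must be a string:

x.set('marks', str(mark))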
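To make the difference between text and attributes concrete, here is a minimal sketch that updates both on a one-student document held in a string. The attribute name 'updated' is hypothetical and chosen only for illustration; the module is aliased as ET.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<students><student><marks>50</marks></student></students>")

for node in root.iter('marks'):
    node.text = str(int(node.text) + 10)  # changes the element's text
    node.set('updated', 'yes')            # adds an attribute (hypothetical name)

result = ET.tostring(root, encoding='unicode')
print(result)
# <students><student><marks updated="yes">60</marks></student></students>
```

An attribute set this way can be read back with get(), for example node.get('updated').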

The DOM API

The Document Object Model is a cross-language API recommended by the World Wide Web Consortium (W3C) for accessing and modifying XML documents. It is extremely useful for random-access applications. The easiest way to load an XML document is with the xml.dom.minidom module.

xml.dom.minidom is a minimal implementation of the Document Object Model interface, with an API similar to that in other languages. It is simpler than the full DOM and also significantly smaller.

The xml.dom.pulldom module provides a “pull parser” that generates DOM-accessible fragments of the document.

The minidom module provides a parse() function that creates a DOM tree from an XML file.

xml.dom.minidom.parse(filename)

The getElementsByTagName() method gives access to individual elements in the DOM tree.
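A minimal sketch of these two steps follows. It uses minidom.parseString() on an in-memory document rather than parse() on a file, so that it runs without 'myfile.xml' being present; the document content is an assumption modeled on the earlier example.

```python
from xml.dom import minidom

document = (
    "<students>"
    "<student><name>aaa</name><marks>50</marks></student>"
    "<student><name>bbb</name><marks>60</marks></student>"
    "</students>"
)

# parseString() builds the DOM tree from a string; parse() does the same for a file.
dom = minidom.parseString(document)

# Each <name> element has a single text node child holding its content.
names = [node.firstChild.data for node in dom.getElementsByTagName('name')]
print(names)
```

getElementsByTagName() searches the whole subtree, so it finds matching elements at any depth below the node it is called on.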

SAX API

SAX is a standard interface for event-driven XML parsing. You need to subclass xml.sax.ContentHandler to create your own handler. The ContentHandler handles the tags and attributes of the XML. Methods for handling parsing events are defined in the ContentHandler class.

The ContentHandler class provides startElement() and endElement() methods which get called when an element starts and ends respectively.

The make_parser() function creates a SAX XMLReader object.

parser = xml.sax.make_parser()

Then set the content handler to an instance of a user-defined class derived from xml.sax.ContentHandler (MovieHandler below stands for such a class).

Handler = MovieHandler()
parser.setContentHandler( Handler )

Now you can use the above parser object to parse an XML file.

parser.parse('myfile.xml')

The following function creates a SAX parser and uses it to parse a document.

xml.sax.parse( xmlfile, contenthandler[, errorhandler])

The parseString() function creates a SAX parser and parses the specified XML string.

xml.sax.parseString(xmlstring,contenthandler[,errorhandler])
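The handler class itself is not shown above, so here is a minimal sketch of what one might look like. StudentHandler and its attributes are hypothetical names introduced for this illustration; it collects each student's sub-elements into a dictionary, using the startElement(), characters() and endElement() callbacks of ContentHandler.

```python
import xml.sax


class StudentHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects <student> records into a list."""

    def __init__(self):
        super().__init__()
        self.students = []
        self._current = None   # dict for the <student> being read, if any
        self._text = []        # pieces of character data for the open tag

    def startElement(self, name, attrs):
        if name == 'student':
            self._current = {}
        self._text = []

    def characters(self, content):
        # May be called several times for one element, so accumulate.
        self._text.append(content)

    def endElement(self, name):
        if name == 'student':
            self.students.append(self._current)
            self._current = None
        elif self._current is not None:
            self._current[name] = ''.join(self._text)
        self._text = []


document = (
    b"<students>"
    b"<student><name>aaa</name><age>21</age><marks>50</marks></student>"
    b"<student><name>bbb</name><age>22</age><marks>60</marks></student>"
    b"</students>"
)

handler = StudentHandler()
xml.sax.parseString(document, handler)
print(handler.students)
```

The same handler instance could be passed to parser.setContentHandler() or xml.sax.parse() to process a file instead of a string.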

Comparison of DOM vs SAX

SAX

you register callbacks for events and let the parser proceed through the document
useful for large documents or if there are memory limitations
parses the file as it reads it from disk
entire file is never stored in the memory

DOM

entire file is read into the memory and stored in a hierarchical (tree-based) form to represent all the features of an XML document
SAX is not as fast as DOM when working with large files
DOM can consume a lot of resources if used on many small files

SAX is read-only, while DOM allows changes to the XML file.
