Python Tutorial

By KnowledgeHut .

The Extensible Markup Language (XML) is a markup language much like HTML. It is a portable and. It is useful for handling small to medium amounts of data without using any SQL database.Python's standard library contains xml package. This package has following modules that define XML processing APIs.xml.etree.ElementTree: a simple and lightweight XML processor APIxml.dom: the DOM API definitionxml.sax: SAX2 base classes and convenience functionsElementTree moduleXML is a tree like hierarchical data format. This module has two classes for this purpose - 'ElementTree' treats the whole XML document as a tree, and 'Element' represents a single node in this tree. Reading and writing operations on XML files are done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.To create XML file using ElementTree:The tree is a hierarchical structure of elements starting with root followed by other elements. Each element is created by using Element() function of this module.import xml.etree.ElementTree as xml e=xml.Element('name')Each element is characterized by a tag and attrib attribute which is a dict object. For tree's starting element, attrib is an empty dictionary>>> root=xml.Element('students') >>> root.tag 'students' >>> root.attrib {}You may now set up one or more child elements to be added under root element. Each child may have one or more subelements. Add them using Subelement() function and define it's text attribute.child=xml.Element("student") nm = xml.SubElement(child, "name") nm.text = student.get('name') age = xml.SubElement(child, "age") age.text = str(student.get('age'))Each child is added to root by append() function as:root.append(child)After adding required number of child elements, construct a tree object by elementTree() function:tree = xml.ElementTree(root)The entire tree structure is written to a binary file by tree object's write() function:f=open('mytest.xml', "wb") tree.write(f)In following example tree is constructed out of list of dictionary items. Each dictionary item holds key-value pairs describing a student data structure. The tree so constructed is written to 'myfile.xml'import xml.etree.ElementTree as xml students=[{'name':'aaa','age':21,'marks':50},{'name':'bbb','age':22,'marks':60}] root = xml.Element("students") for student in students: child=xml.Element("student") root.append(child) nm = xml.SubElement(child, "name") nm.text = student.get('name') age = xml.SubElement(child, "age") age.text = str(student.get('age')) mark=xml.SubElement(child, "marks") mark.text=str(student.get('marks')) tree = xml.ElementTree(root) with open('myfile.xml', "wb") as fh: tree.write(fh)The 'myfile.xml' is stored in current working directory.<students><student><name>aaa</name><age>21</age><marks>50</marks></student><student><name>bbb</name><age>22</age><marks>60</marks></student></students>To parse XML file:Let us now read back the 'myfile.xml' created in above example. For this purpose following functions in ElementTree module will be used:ElementTree(): This function is overloaded to read the hierarchical structure of elements to a tree objects.tree = xml.ElementTree(file='myfile.xml')getroot(): This function returns root element of the treeroot = tree.getroot()getchildren(): This function returns the list of sub-elements one level below of an element.children = root.getchildren()In following example, elements and sub-elements of the 'myfile.xml' are parsed into a list of dictionary items.import xml.etree.ElementTree as xml tree = xml.ElementTree(file='myfile.xml') root = tree.getroot() students=[] children = root.getchildren() for child in children: student={} pairs = child.getchildren() for pair in pairs: student[pair.tag]=pair.text students.append(student) print (students)Output:[{'name': 'aaa', 'age': '21', 'marks': '50'}, {'name': 'bbb', 'age': '22', 'marks': '60'}]To modify XML fileWe shall use iter() function of Element. It creates a tree iterator for given tag with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order.Let us build iterator for all 'marks' subelements and increment text of each marks tag by 10.import xml.etree.ElementTree as xml tree = xml.ElementTree(file='myfile.xml') root = tree.getroot() for x in root.iter('marks'): mark=int (x.text) mark=mark+10 x.text=str(mark) with open("myfile.xml", "wb") as fh: tree.write(fh)Our 'myfile.xml' will now be modified accordingly.We can also use set() to update value of a certain key.x.set(marks, str(mark))The DOM APIThe Document Object Model is a cross-language API recommended by World Wide Web Consortium (W3C) for accessing and modifying the XML documents. It is extremely useful for random-access applications. The easiest way to load an XML document xml.dom module.Minimal implementation of the Document Object Model interface is done by xml.dom.minidom with an API that is available in other languages. It is simpler than the full DOM and also significantly smaller.The xml.dom.pulldom module provides a “pull parser” that generates DOM-accessible fragments of the document.The Document Object Model is a cross-language API recommended by World Wide Web Consortium (W3C) for accessing and modifying the XML documents. It is extremely useful for random-access applications. The easiest way to load an XML document is to use xml.dom module.Minimal implementation of the Document Object Model interface is done by xml.dom.minidom with an API that is available in other languages. It is simpler than the full DOM and also significantly smaller.The xml.dom.pulldom module provides a “pull parser” that generates DOM-accessible fragments of the document.The minidom object provides a parser method that creates a DOM tree from the XML file.xml.dom.minidom.parse(filename)The getElementsByTagName() function lets access to individual elements in the DOM tree.SAX APISAX is a standard interface for event-driven XML parsing. You need to subclass xml.sax.ContentHandler and obtain ContentHandler. The ContentHandler handles tags and attributes of XML.Methods for handling parsing events are defined in ContentHandler class.The ContentHandler class provides startElement() and endElement() methods which get called when an element starts and ends respectively.make_parser() function creates a a SAX XMLReader object.parser = xml.sax.make_parser()Then set the contenthandler to user-defined class subclassed from SAX.ContentHandler.Handler = MovieHandler() parser.setContentHandler( Handler )Now you can use above parser object to parse an XML file.parser.parse('myfile.xml')Following method creates a SAX parser and uses it to parse a document.xml.sax.parse( xmlfile, contenthandler[, errorhandler])The parseString() method to create a SAX parser and to parse the specified XML string.xml.sax.parseString(xmlstring,contenthandler[,errorhandler])Comparison of DOM vs SAXSAXyou register callbacks for events and let the parser proceed through the document.useful for large documents or of there are have memory limitationsparses the file as it reads it from diskentire file is never stored in the memory.DOMentire file is read into the memory and stored in a hierarchical (tree-based) form to represent all the features of an XML document.SAX not as fast as DOM, with large files.DOM can kill resources if used on many small files.SAX is read-only, while DOM allows changes to the XML file.

1. Python - Introduction

2. Python - Getting Started

3. Python - Basic Syntax

4. Python - Variables

5. Python - Numbers

6. Python - Strings

7. Python - List And Tuple

8. Python - Dictionary

9. Python - Sets

10. Python - Operators

11. Python - Conditional Statements

12. Python - Loops

13. Python - Built-In Modules

14. Python - User Defined Functions

15. Python - Functional Programming

16. Python - Custom Modules

17. Python - Packages

18. Python - Exceptions

19. Python - FileIO

20. Python - CSV

21. Python - Database Connectivity

22. Python – Tkinter

23. Python - Object Oriented Programming

24. Python - Decorators

25. Python - Inheritance

26. Python - Magic Methods

27. Python - Regex

28. Python - CGI

29. Python - Send Email

30. Python - Object Serialization

31. Python - Multithreading

32. Python - XML

33. Python - Socket Module

34. Python - Data Classes

Python - XML

The Extensible Markup Language (XML) is a markup language much like HTML. It is a portable and. It is useful for handling small to medium amounts of data without using any SQL database.

Python's standard library contains xml package. This package has following modules that define XML processing APIs.

xml.etree.ElementTree: a simple and lightweight XML processor API
xml.dom: the DOM API definition
xml.sax: SAX2 base classes and convenience functions

ElementTree module

XML is a tree like hierarchical data format. This module has two classes for this purpose - 'ElementTree' treats the whole XML document as a tree, and 'Element' represents a single node in this tree. Reading and writing operations on XML files are done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

To create XML file using ElementTree:

The tree is a hierarchical structure of elements starting with root followed by other elements. Each element is created by using Element() function of this module.

import xml.etree.ElementTree as xml
e=xml.Element('name')

Each element is characterized by a tag and attrib attribute which is a dict object. For tree's starting element, attrib is an empty dictionary

>>> root=xml.Element('students')
>>> root.tag
'students'
>>> root.attrib
{}

You may now set up one or more child elements to be added under root element. Each child may have one or more subelements. Add them using Subelement() function and define it's text attribute.

child=xml.Element("student")
nm = xml.SubElement(child, "name")
nm.text = student.get('name')
age = xml.SubElement(child, "age")
age.text = str(student.get('age'))

Each child is added to root by append() function as:

root.append(child)

After adding required number of child elements, construct a tree object by elementTree() function:

tree = xml.ElementTree(root)

The entire tree structure is written to a binary file by tree object's write() function:

f=open('mytest.xml', "wb")
tree.write(f)

In following example tree is constructed out of list of dictionary items. Each dictionary item holds key-value pairs describing a student data structure. The tree so constructed is written to 'myfile.xml'

import xml.etree.ElementTree as xml
students=[{'name':'aaa','age':21,'marks':50},{'name':'bbb','age':22,'marks':60}]
root = xml.Element("students")
for student in students:
       child=xml.Element("student")
       root.append(child)
       nm = xml.SubElement(child, "name")
       nm.text = student.get('name')
       age = xml.SubElement(child, "age")
       age.text = str(student.get('age'))
       mark=xml.SubElement(child, "marks")
       mark.text=str(student.get('marks'))
tree = xml.ElementTree(root)
with open('myfile.xml', "wb") as fh:
       tree.write(fh)

The 'myfile.xml' is stored in current working directory.

<students><student><name>aaa</name><age>21</age><marks>50</marks></student><student><name>bbb</name><age>22</age><marks>60</marks></student></students>

To parse XML file:

Let us now read back the 'myfile.xml' created in above example. For this purpose following functions in ElementTree module will be used:

ElementTree(): This function is overloaded to read the hierarchical structure of elements to a tree objects.

tree = xml.ElementTree(file='myfile.xml')

getroot(): This function returns root element of the tree

root = tree.getroot()

getchildren(): This function returns the list of sub-elements one level below of an element.

children = root.getchildren()

In following example, elements and sub-elements of the 'myfile.xml' are parsed into a list of dictionary items.

import xml.etree.ElementTree as xml
tree = xml.ElementTree(file='myfile.xml')
root = tree.getroot()
students=[]
children = root.getchildren()
for child in children:
       student={}
       pairs = child.getchildren()
       for pair in pairs:
           student[pair.tag]=pair.text
       students.append(student)
print (students)

Output:

[{'name': 'aaa', 'age': '21', 'marks': '50'}, {'name': 'bbb', 'age': '22', 'marks': '60'}]

To modify XML file

We shall use iter() function of Element. It creates a tree iterator for given tag with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order.

Let us build iterator for all 'marks' subelements and increment text of each marks tag by 10.

import xml.etree.ElementTree as xml
tree = xml.ElementTree(file='myfile.xml')
root = tree.getroot()
for x in root.iter('marks'):
       mark=int (x.text)
       mark=mark+10
       x.text=str(mark)
with open("myfile.xml", "wb") as fh:
       tree.write(fh)

Our 'myfile.xml' will now be modified accordingly.

We can also use set() to update value of a certain key.

x.set(marks, str(mark))

The DOM API

The Document Object Model is a cross-language API recommended by World Wide Web Consortium (W3C) for accessing and modifying the XML documents. It is extremely useful for random-access applications. The easiest way to load an XML document xml.dom module.

Minimal implementation of the Document Object Model interface is done by xml.dom.minidom with an API that is available in other languages. It is simpler than the full DOM and also significantly smaller.

The xml.dom.pulldom module provides a “pull parser” that generates DOM-accessible fragments of the document.

The minidom object provides a parser method that creates a DOM tree from the XML file.

xml.dom.minidom.parse(filename)

The getElementsByTagName() function lets access to individual elements in the DOM tree.

SAX API

SAX is a standard interface for event-driven XML parsing. You need to subclass xml.sax.ContentHandler and obtain ContentHandler. The ContentHandler handles tags and attributes of XML.Methods for handling parsing events are defined in ContentHandler class.

The ContentHandler class provides startElement() and endElement() methods which get called when an element starts and ends respectively.

make_parser() function creates a a SAX XMLReader object.

parser = xml.sax.make_parser()

Then set the contenthandler to user-defined class subclassed from SAX.ContentHandler.

Handler = MovieHandler()
parser.setContentHandler( Handler )

Now you can use above parser object to parse an XML file.

parser.parse('myfile.xml')

Following method creates a SAX parser and uses it to parse a document.

xml.sax.parse( xmlfile, contenthandler[, errorhandler])

The parseString() method to create a SAX parser and to parse the specified XML string.

xml.sax.parseString(xmlstring,contenthandler[,errorhandler])

Comparison of DOM vs SAX

SAX

you register callbacks for events and let the parser proceed through the document.
useful for large documents or of there are have memory limitations
parses the file as it reads it from disk
entire file is never stored in the memory.

DOM

entire file is read into the memory and stored in a hierarchical (tree-based) form to represent all the features of an XML document.
SAX not as fast as DOM, with large files.
DOM can kill resources if used on many small files.

SAX is read-only, while DOM allows changes to the XML file.

31-A Python - Multithreading

33-A Python - Socket Module

Your email address will not be published. Required fields are marked *

Comments

Eula

This is my first time here. I am truly impressed to read all this in one place.

Jaypee Dela Cruz

Thank you for your wonderful codes and website, you helped me a lot especially in this socket module. Thank you again!

lucky verma

Thank you for taking the time to share your knowledge about using python to find the path! Your insight and guidance is greatly appreciated.

Pre Engineered Metal Building

Usually I by no means touch upon blogs however your article is so convincing that I by no means prevent myself to mention it here.

Pre Engineered Metal Building

Usually, I never touch upon blogs; however, your article is so convincing that I could not prevent myself from mentioning how nice it is written.

View More Comments

Search

Python Tutorial

By KnowledgeHut .

Python Tutorial

Python - XML

ElementTree module

To create XML file using ElementTree:

To parse XML file:

To modify XML file

The DOM API

SAX API

Leave a Reply

Comments

Eula

Jaypee Dela Cruz

lucky verma

Pre Engineered Metal Building

Pre Engineered Metal Building

Suggested Tutorials

Swift Tutorial

By KnowledgeHut .

R Programming Tutorial

By KnowledgeHut .

C# Tutorial

By KnowledgeHut .