Basic XML Parsing With Python and LXML
Recently I’ve been developing an API using python and Django for work, which uses XML responses to speak to clients. One of my goals for the client was to be able to easily parse the XML responses that the server sends, so that I could appropriately handle errors.
Fortunately, python has many tools for building and parsing XML. During my research, I tested several options, but found that the well supported library LXML was a perfect match for what I needed. Unfortunately, I had a hard time figuring this out, as examples to just parse XML content was lacking in the official tutorial, and there were no good resources online with code samples.
So let’s take a quick peek at a sample XML document, then we’ll analyze some simple LXML code to see how it works. Of course, before you can run any of these code samples, you’ll need to download and install LXML (there are packages available on most Linux systems already).
Here’s are two sample XML responses that our server may send to the clients:
<?xml version="1.0" encoding="utf-8"?> <response version="1.0"> <code>200</code> <id>50</id> </response>
<?xml version="1.0" encoding="utf-8"?> <response version="1.0"> <code>400</code> <errors> <error>This API call requires a HTTP POST.</error> </errors> </response>
The first XML document shows a successful response. The
code tag contains an
200 OK code, which means the operation succeeded, and the
contains an identification number which the client needs to store for later
Now, let’s quickly analyze the situation. We have an XML tag which will be
available whether the operation succeeds or fails, which is the
So, for the client to be able to distinguish between a success or failure, it
needs to first figure out what
code was returned.
Let’s write some
python-lxml code to quickly output the value of the
from lxml import etree response = 'some xml response here' try: doc = etree.XML(response.strip()) code = doc.findtext('code') print code except etree.XMLSyntaxError: print 'XML parsing error.'
In our code, we first build a XML document from our XML response string, then
findtext() method (provided by LXML) to retrieve the value of the
code tag. Run this example, play around with it, and you’ll see that you can
pull up any value you want from the XML document.
In my client code, the program flow looks something like:
from sys import exit from lxml import etree response = 'some xml response here' try: doc = etree.XML(response.strip()) code = doc.findtext('code') print code except etree.XMLSyntaxError: print 'XML parsing error.' exit(1) if code == '200': # store the id somewhere id = doc.findtext('id') else: # handle errors, do more stuff print doc.findtext('error') exit(1) exit(0)
This lets me perform error checking on the XML response to make sure that everything went smoothly. If something goes wrong during the API call, then I will know about it, and can perform other actions to fix any problems or report errors.
If you’d like to do more advanced XML parsing with LXML, read the official tutorial. It is very verbose, but contains excellent examples which clearly demonstrate how to both parse and generate XML.
So the next time you’re looking for a quick way to process XML, check out LXML.
PS: If you read this far, you might want to follow me on Mastodon or GitHub and subscribe via RSS or email below (I'll email you new articles when I publish them).