The Sysadmin Notebook  

Perl XML

Processing XML data with Perl

XML data is stored in text files, and Perl excels at text processing, so the two are a natural fit. A quick CPAN search for XML returns a few thousand modules; this page covers a handful of them.

XML::Simple

XML::Simple can be used to translate between XML data and Perl data structures. Given the following XML data file:

<commands>
  <command>
    <name>ls</name>
    <description>list directory contents</description>
  </command>
  <command>
    <name>ps</name>
    <description>report a snapshot of current process</description>
  </command>
  <command>
    <name>df</name>
    <description>report file system disk space usage</description>
  </command>
</commands>

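the following script reads the file into a hash and prints each entry. By default XMLin folds repeated elements that carry a name into a hash keyed on that name, which is why the loop can iterate over the command names:

use strict;
use warnings;
use XML::Simple;

my $file = 'commands.xml';
my $xmlObj = XML::Simple->new();

my $document = $xmlObj->XMLin($file);

# Sort the keys: Perl does not guarantee hash order.
foreach my $key (sort keys %{ $document->{command} }) {
	print '(' . $key . ')';
	print $document->{command}->{$key}->{'description'} . "\n";
}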

The print statements produce the following:

(df)report file system disk space usage
(ls)list directory contents
(ps)report a snapshot of current process
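
Folding into a hash keyed on name is convenient, but it loses the document order of the command elements. The following is a minimal sketch, assuming the same data is supplied as an inline string rather than read from commands.xml. It uses the KeyAttr and ForceArray options of XMLin to keep the commands as an array in document order:

```perl
use strict;
use warnings;
use XML::Simple;

# Same data as commands.xml, inlined here so the sketch is self-contained.
my $xml = <<'XML';
<commands>
  <command>
    <name>ls</name>
    <description>list directory contents</description>
  </command>
  <command>
    <name>ps</name>
    <description>report a snapshot of current process</description>
  </command>
  <command>
    <name>df</name>
    <description>report file system disk space usage</description>
  </command>
</commands>
XML

# KeyAttr => []             : do not fold the commands into a hash keyed on name
# ForceArray => ['command'] : always return an array ref, even for a single command
my $document = XML::Simple->new->XMLin($xml, KeyAttr => [], ForceArray => ['command']);

# Each array entry is a hash ref with 'name' and 'description' keys.
my @names = map { $_->{name} } @{ $document->{command} };
print join(',', @names), "\n";    # document order: ls,ps,df
```

ForceArray is worth setting whenever an element may occur once or many times; without it, a file containing a single command yields a hash reference instead of an array reference.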

The process can be reversed, turning a Perl hash into an XML data file. By default each hash key becomes an element name, and a hash reference supplies that element's attribute-value pairs. Repeating elements have their value set to an anonymous array of hash references:

use XML::Simple;

my %commands = (command => [
	{name => 'df', 
		description => 'report file system disk space usage'},
	{name => 'ls',  description => 'list directory contents'},
	{name => 'ps', 
		description => 'report a snapshot of current processes'},]
	);
my $xmlObj = XML::Simple->new(RootName => 'commands');

print $xmlObj->XMLout(\%commands);

This produces:

<commands>
  <command name="df" description="report file system disk space usage" />
  <command name="ls" description="list directory contents" />
  <command name="ps" description="report a snapshot of current processes" />
</commands>

To have the values written as nested elements instead of attributes, pass the 'noattr' option to XMLout. A string value for the 'xmldecl' option adds an XML declaration to the output:

use XML::Simple;

my %commands = (command => [
	{name => 'df', 
		description => 'report file system disk space usage'},
	{name => 'ls',  description => 'list directory contents'},
	{name => 'ps', 
		description => 'report a snapshot of current processes'},]
	);
my $xmlObj = XML::Simple->new(RootName => 'commands');

print $xmlObj->XMLout(\%commands, noattr => 1, xmldecl => '<?xml version="1.0"?>');

This produces the following output:

<?xml version="1.0"?>
<commands>
  <command>
    <name>df</name>
    <description>report file system disk space usage</description>
  </command>
  <command>
    <name>ls</name>
    <description>list directory contents</description>
  </command>
  <command>
    <name>ps</name>
    <description>report a snapshot of current processes</description>
  </command>
</commands>
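
A quick way to check that the generated XML carries the same data is to read it back in. The sketch below is illustrative only: it uses a shortened two-entry version of the hash, passes the options per call rather than to the constructor, and compares the descriptions after a round trip through XMLout and XMLin:

```perl
use strict;
use warnings;
use XML::Simple;

# Shortened version of the %commands hash used above.
my %commands = (command => [
    {name => 'df', description => 'report file system disk space usage'},
    {name => 'ls', description => 'list directory contents'},
]);

my $xmlObj = XML::Simple->new();

# Write nested elements rather than attributes.
my $xml = $xmlObj->XMLout(\%commands, RootName => 'commands', NoAttr => 1);

# Read the XML back; KeyAttr => [] and ForceArray keep the array-of-hashes shape.
my $copy = $xmlObj->XMLin($xml, KeyAttr => [], ForceArray => ['command']);

for my $i (0 .. $#{ $commands{command} }) {
    my ($in, $out) = ($commands{command}[$i], $copy->{command}[$i]);
    my $same = $in->{description} eq $out->{description};
    print "$out->{name}: ", ($same ? "ok" : "differs"), "\n";
}
```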

XML::Writer

XML::Writer gives us another technique for writing XML data files:

use XML::Writer;

my %commands = (df => {description => 'report file system disk space usage'},
	ls => {description => 'list directory contents'},
	ps => {description => 'report a snapshot of current processes'},
	);
my $xmlObj = XML::Writer->new(DATA_MODE => 1, DATA_INDENT => 1);

$xmlObj->xmlDecl("UTF-8");
$xmlObj->startTag('commands');

foreach my $command (sort keys %commands) {
	$xmlObj->startTag("command");
	$xmlObj->startTag("name");
	$xmlObj->characters($command);
	$xmlObj->endTag('name');
	$xmlObj->startTag("description");
	$xmlObj->characters($commands{$command}->{description});
	$xmlObj->endTag("description");
	$xmlObj->endTag("command");
}
$xmlObj->endTag('commands');

$xmlObj->end();

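The 'DATA_MODE' property puts each element on its own line, and 'DATA_INDENT' sets the number of spaces used to indent each nesting level. Note the 'UTF-8' argument to the 'xmlDecl' method, which sets the encoding named in the XML declaration. This produces the following output: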

<?xml version="1.0" encoding="UTF-8"?>

<commands>
 <command>
  <name>df</name>
  <description>report file system disk space usage</description>
 </command>
 <command>
  <name>ls</name>
  <description>list directory contents</description>
 </command>
 <command>
  <name>ps</name>
  <description>report a snapshot of current processes</description>
 </command>
</commands>
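
The startTag/characters/endTag sequence for a leaf element can be shortened with the dataElement method, and the output can be captured in a string instead of being printed. The sketch below assumes an XML::Writer version that accepts a scalar reference for OUTPUT, and uses a simplified two-entry hash for illustration:

```perl
use strict;
use warnings;
use XML::Writer;

# Simplified data: command name => description.
my %commands = (
    df => 'report file system disk space usage',
    ls => 'list directory contents',
);

# Capture the generated XML in $xml rather than writing to standard output.
my $xml = '';
my $writer = XML::Writer->new(OUTPUT => \$xml, DATA_MODE => 1, DATA_INDENT => 1);

$writer->startTag('commands');
for my $name (sort keys %commands) {
    $writer->startTag('command');
    # dataElement writes the start tag, the escaped text and the end tag in one call.
    $writer->dataElement(name        => $name);
    $writer->dataElement(description => $commands{$name});
    $writer->endTag('command');
}
$writer->endTag('commands');
$writer->end();

print $xml;
```

Capturing the output in a string makes it easy to post-process the XML or write it to a file in one step.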

XML::Parser

XML::Parser is a good starting point for processing XML with Perl. Use the new() method to create an XML::Parser instance, then call the setHandlers() method to register the subroutines that handle start tags, end tags, character data, processing instructions, and so on. In the skeleton below, start, end, cdata and pih are subroutines you define yourself:

use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new();
$parser->setHandlers(	Start	=> \&start,
			End	=> \&end,
			Char	=> \&cdata,
			Proc	=> \&pih,
		);
$parser->parsefile('filename.xml');

XML::Parser can be used to check whether your XML is well-formed. The following one-liner prints nothing if the file is well-formed and dies with an error message otherwise:

perl -MXML::Parser -e 'XML::Parser->new->parsefile("books.xml")'
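
The same check can be wrapped in a small script that takes several file names. This is a sketch only; the subroutine name xml_error and the message format are made up for illustration. It relies on parsefile dying when the input is not well-formed, and traps that with eval:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns undef if $file is well-formed,
# otherwise the parser's error message.
sub xml_error {
    my ($file) = @_;
    my $ok = eval { XML::Parser->new->parsefile($file); 1 };
    return $ok ? undef : ($@ || 'unknown error');
}

# Check every file named on the command line.
for my $file (@ARGV) {
    my $error = xml_error($file);
    if (defined $error) {
        $error =~ s/\s+\z//;    # trim the trailing newline from the message
        print "$file: NOT well-formed: $error\n";
    }
    else {
        print "$file: ok\n";
    }
}
```

Saved under a name of your choice, it could be run as `perl checkxml.pl *.xml`.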

You could place this in a script to check multiple files in your XML folders. But XML::Parser can do a whole lot more. By using the setHandlers method, you can transform XML data to other formats. Let's look at an example that can convert XML to XHTML. The following XML file contains some information on three books about Perl:

<?xml version="1.0" ?>
<books info="Some Interesting Perl Books">
  <book>
    <title>Perl Cookbook</title>
    <author>Tom Christiansen, Nathan Torkington</author>
    <publisher>O'Reilly</publisher>
    <rating>10</rating>
  </book>
  <book>
    <title>Perl Best Practices</title>
    <author>Damian Conway</author>
    <publisher>O'Reilly</publisher>
    <rating><?perl int(rand(10)) ?></rating>
  </book>
  <book>
    <title>Programming Perl</title>
    <author>Larry Wall, Tom Christiansen, Jon Orwant</author>
    <publisher>O'Reilly</publisher>
    <rating>9</rating>
  </book>
</books>

We can use XML::Parser to convert the XML file to HTML:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Name of the element currently being processed; set by the handlers.
# Declared before parsing starts so it is initialised when they run.
my $currentTag = "";

my $parser = XML::Parser->new();
$parser->setHandlers(	Start 	=> \&start,
			End	=> \&end,
			Char	=> \&char,
			Proc	=> \&proc,
		);
print getXHTMLHeader();
$parser->parsefile('books.xml');

# Start tag: open the page, a table row, or a table cell.
sub start {
	my ($parser, $name, %attr) = @_;
	$currentTag = lc($name);
	if ($currentTag eq 'books') {
		print "<head><title>". $attr{'info'} . "</title></head>";
		print "<body><h2>" . $attr{info} . "</h2>";
		print '<table summary="' . $attr{info} . '"><tr><th>Title</th><th>Author</th><th>Publisher</th><th>Rating</th></tr>';
	}
	elsif ($currentTag eq 'book') {
		print "<tr>";
	}
	else {
		print "<td>";
	}
}
# End tag: close whatever the matching start tag opened.
# End handlers receive only the element name, not its attributes.
sub end {
	my ($parser, $name) = @_;
	$currentTag = lc($name);
	if ($currentTag eq 'books') {
		print "</table></body></html>";
	}
	elsif ($currentTag eq 'book') {
		print "</tr>";
	}
	else {
		print "</td>";
	}
}
# Character data: copy it straight to the output.
sub char {
	my ($parser, $data) = @_;
	print $data;
}
# Processing instruction: evaluate <?perl ... ?> and print the result.
# Warning: this executes code taken from the XML file, so only use it
# on input you trust.
sub proc {
	my ($parser, $target, $data) = @_;
	if (lc($target) eq 'perl') {
		$data = eval($data);
		print $data;
	}
}
# Returns the XML declaration, DOCTYPE and opening html tag.
sub getXHTMLHeader {
	my $header = '<?xml version="1.0" encoding="UTF-8" ?>
	<!DOCTYPE html
	  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
	  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">';
	return $header;
}

Running this script prints the resulting XHTML page to standard output. Of course you don't have to produce XHTML: XML::Parser can be combined with other Perl modules to produce whatever output you need.
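
Handlers need not print anything at all. As a minimal sketch, assuming a cut-down version of the books data inlined as a string, the handlers below collect the book titles into a Perl array for later use. The handlers are passed to new() through the Handlers option instead of a separate setHandlers() call:

```perl
use strict;
use warnings;
use XML::Parser;

# Cut-down version of books.xml, inlined so the sketch is self-contained.
my $xml = <<'XML';
<books info="Some Interesting Perl Books">
  <book><title>Perl Cookbook</title><rating>10</rating></book>
  <book><title>Programming Perl</title><rating>9</rating></book>
</books>
XML

my @titles;       # collected title text
my $text = '';    # character data seen since the last start tag

my $parser = XML::Parser->new(Handlers => {
    # Reset the buffer whenever a new element starts.
    Start => sub { $text = '' },
    # The Char handler may be called several times for one run of text,
    # so append rather than assign.
    Char  => sub { my ($expat, $data) = @_; $text .= $data },
    # When a title element ends, the buffer holds its full text.
    End   => sub {
        my ($expat, $name) = @_;
        push @titles, $text if $name eq 'title';
    },
});
$parser->parse($xml);

print "$_\n" for @titles;
```

This prints "Perl Cookbook" and "Programming Perl", one per line.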

XML::XSLT

Using the same XML file (books.xml):

<?xml version="1.0" ?>
<books info="Some Interesting Perl Books">
  <book>
    <title>Perl Cookbook</title>
    <author>Tom Christiansen, Nathan Torkington</author>
    <publisher>O'Reilly</publisher>
    <rating>10</rating>
  </book>
  <book>
    <title>Perl Best Practices</title>
    <author>Damian Conway</author>
    <publisher>O'Reilly</publisher>
    <rating><?perl int(rand(10)) ?></rating>
  </book>
  <book>
    <title>Programming Perl</title>
    <author>Larry Wall, Tom Christiansen, Jon Orwant</author>
    <publisher>O'Reilly</publisher>
    <rating>9</rating>
  </book>
</books>

And an XSLT stylesheet, saved as books.xslt:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
<xsl:template match="/books">
  <html>
  <head>
    <title><xsl:value-of select="@info" /></title>
  </head>
  <body>
    <h3><xsl:value-of select="@info" /></h3>
    <table summary="yet to be implemented">
	<tr><th>Title</th><th>Author</th><th>Publisher</th><th>Rating</th></tr>
	<xsl:apply-templates select="book" />
    </table>
  </body>
</html>
</xsl:template>

<xsl:template match="book">
<tr>
  <td><xsl:value-of select="title" /></td>
  <td><xsl:value-of select="author" /></td>
  <td><xsl:value-of select="publisher" /></td>
  <td><xsl:value-of select="rating" /></td>
</tr>
</xsl:template>

</xsl:stylesheet>

We can use the following Perl script:

#!/usr/bin/perl
use strict;
use warnings;

use XML::XSLT;

my $xslfile = 'books.xslt';
my $xmlfile = 'books.xml';

my $xslt = eval {XML::XSLT->new($xslfile)};
if ($@) {
	die("Sorry could not create XSLT instance:\n", $@);
}

print $xslt->serve(Source=> $xmlfile, http_headers => 0, 
	xml_declaration => 0), "\n";

to produce the HTML. Alternatively, we can use the transform method to apply the stylesheet and the toString method to retrieve the result:

#!/usr/bin/perl
use strict;
use warnings;

use XML::XSLT;

my $xslfile = 'books.xslt';
my $xmlfile = 'books.xml';

my $xslt = eval {XML::XSLT->new($xslfile)};
if ($@) {
	die("Sorry could not create XSLT instance:\n", $@);
}
$xslt->transform($xmlfile);
print $xslt->toString;

Unsurprisingly, this produces the same HTML.