Tuesday, April 12, 2011

Episode #142: XML in Shell

Hal gets email from an old friend

We got another email from Ryan Tuthill. We'd been counting the days until his next missive, and suddenly, like the promise of Spring, there it was in our inboxes.

Ryan had been doing a little web surfing and had come across this article for creating a rotating slide show of images on your desktop in Ubuntu. Now I have to admit that my first thought upon reading that article was, "Holy carp! Why didn't they just set up a cron job to change the image on a regular basis instead of mucking around with all that XML nonsense?" But Ryan's question was perhaps more interesting: "Why are they entering all that XML by hand when there ought to be a command line which could generate it for you?"

So let's look at the problem Ryan has set for us. Aside from a header and a footer which we can generate by hand, we need to create a series of XML blocks like this:

<static>
<duration>1795.0</duration>
<file>/some/path/to/img1.png</file>
</static>

<transition>
<duration>5.0</duration>
<from>/some/path/to/img1.png</from>
<to>/some/path/to/img2.png</to>
</transition>

The values that are changing are the image file names in each block.

So it seems we need a list of files and a loop. I think I'll take a cue from our last Episode and just generate my list of files like this:

$ find ~/Documents/My\ Pictures -type f | egrep '\.(gif|jpg|png)$'
/home/hal/Documents/My Pictures/aus-pics/2007-12-06/2007 12 06 Australia 001.jpg
/home/hal/Documents/My Pictures/aus-pics/2007-12-06/2007 12 06 Australia 012.jpg
/home/hal/Documents/My Pictures/aus-pics/2007-12-06/2007 12 06 Australia 006.jpg
...

Now I could do a command substitution here with "$(...)", but I'm worried about the size of my input, so I think I'll just pipe things into a nice little while loop:

$ find ~/Documents/My\ Pictures -type f | egrep '\.(gif|jpg|png)$' |
  while IFS= read -r file; do
    echo "<transition><duration>5.0</duration><from>$prev</from><to>$file</to></transition>";
    echo "<static><duration>1795.0</duration><file>$file</file></static>";
    prev=$file;
  done

<transition><duration>5.0</duration><from></from><to>...001.jpg</to></transition>
<static><duration>1795.0</duration><file>...001.jpg</file></static>
<transition><duration>5.0</duration><from>...001.jpg</from><to>...012.jpg</to></transition>
<static><duration>1795.0</duration><file>...012.jpg</file></static>
<transition><duration>5.0</duration><from>...012.jpg</from><to>...006.jpg</to></transition>
<static><duration>1795.0</duration><file>...006.jpg</file></static>
...

Hey, the XML doesn't have to be beautiful, it just has to be correct.
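A side note on the pipe: input size is not the only reason to avoid "$(...)" here. With an unquoted command substitution the shell splits the output on whitespace, so every file name with a space in it falls apart, while "while read" hands you one line at a time. Here's a small sketch of the difference. The function names and the two sample file names are made up for the demonstration, and list_names just stands in for the find pipeline.

```shell
# Stand-in for the find | egrep pipeline: two names, one containing spaces.
list_names() {
    printf '%s\n' 'My Pictures/a 1.jpg' 'b.jpg'
}

# Command substitution: the output is split on whitespace,
# so the first name turns into three separate words.
count_with_for() {
    n=0
    for f in $(list_names); do
        n=$((n + 1))
    done
    echo "$n"
}

# Pipe into while read: one iteration per input line, spaces intact.
count_with_while() {
    list_names | {
        n=0
        while IFS= read -r f; do
            n=$((n + 1))
        done
        echo "$n"
    }
}
```

On a typical bash or POSIX sh, count_with_for prints 4 and count_with_while prints 2 for the same two names.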

What my loop does here is track two variables: $file which is the next picture we'll be doing a <transition> block to, and $prev which is the last picture I did a <static> block for. But I cheat a little bit on the first time through the loop. I know I'm not going to have $prev set at this point, so my first <transition> block is going to be erroneous. But I go ahead and dump it out anyway, and follow it up with a (correct) <static> block for the first file name. After I've "primed the pump" so to speak, the <transition> and <static> blocks for the rest of the input will be just fine.
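One thing this loop does not do is escape the file names. A name containing &, < or > would leave you with XML that isn't correct after all. If your picture collection has names like that, a filter along these lines could sit between the egrep and the while loop. It's a minimal sketch: xml_escape is a hypothetical helper name, and it only handles the three characters commonly escaped in element text.

```shell
# Hypothetical helper: escape &, < and > on each input line.
# The ampersand must be handled first, or the entities added by the
# later substitutions would themselves get escaped.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
```

In the pipeline that would read "find ... | egrep ... | xml_escape | while read file; do ...", with the rest of the loop unchanged.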

So the first line of output is junk. But we can get rid of the junk line easily with a little post-processing:

$ find ~/Documents/My\ Pictures -type f | egrep '\.(gif|jpg|png)$' |
  while IFS= read -r file; do
    echo "<transition><duration>5.0</duration><from>$prev</from><to>$file</to></transition>";
    echo "<static><duration>1795.0</duration><file>$file</file></static>";
    prev=$file;
  done | tail -n +2 >slideshow.xml

$ head slideshow.xml
<static><duration>1795.0</duration><file>...001.jpg</file></static>
<transition><duration>5.0</duration><from>...001.jpg</from><to>...012.jpg</to></transition>
<static><duration>1795.0</duration><file>...012.jpg</file></static>
<transition><duration>5.0</duration><from>...012.jpg</from><to>...006.jpg</to></transition>
<static><duration>1795.0</duration><file>...006.jpg</file></static>
...

A little "tail -n +2" after the while loop to scrape off the first line and we're golden. And then all we have to do is redirect our output into a file. You can edit this file by hand and add the necessary additional XML tags per the original article.
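For comparison, here's roughly what the loop looks like if you refuse to make the mistake and test $prev on every pass instead. This is only a sketch, and gen_blocks is a made-up function name; it reads file names on standard input just like the loop above.

```shell
# Hypothetical alternative: only emit a <transition> once a previous
# file exists, so there is no junk first line to strip afterwards.
gen_blocks() {
    prev=
    while IFS= read -r file; do
        if [ -n "$prev" ]; then
            echo "<transition><duration>5.0</duration><from>$prev</from><to>$file</to></transition>"
        fi
        echo "<static><duration>1795.0</duration><file>$file</file></static>"
        prev=$file
    done
}
```

Feed it the same find and egrep pipeline and you get the same blocks with no tail needed, at the cost of an extra test inside the loop.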

So the trick here is to be willing to make a "mistake" at the beginning of your input in order to simplify your loop. You can always clean up after yourself later. And speaking of cleaning up after your own mistakes, here's Tim who's had plenty of practice in that area...

Tim never cleans up after himself

Hal's XML "handling" is just text with extra >'s and <'s thrown in. A few months ago I asked about doing some XML parsing, and his archaic shell couldn't handle it. So I'll play along and create some XML his way.

In brief, we take the same approach as Hal and start off by finding all of the images we want:

PS C:\Users\tm\Pictures> Get-ChildItem -Recurse -Include *.jpg,*.png,*.gif

C:\Users\tm\Pictures\aus-pics\2007-12-06\2007 12 06 Australia 001.jpg
C:\Users\tm\Pictures\aus-pics\2007-12-06\2007 12 06 Australia 002.jpg
C:\Users\tm\Pictures\aus-pics\2007-12-06\2007 12 06 Australia 003.jpg
...


Here is the same command, but using a Get-ChildItem alias and shortened parameter names:

PS C:\Users\tm\Pictures> ls -r -i *.jpg,*.png,*.gif


This gives us all our file objects. From here the quickest way to create our output is to do exactly what Hal did: generate the output, mistake and all, and then clean up.

PS C:\Users\tm\pictures> ls -r -i *.jpg,*.png,*.gif | % {
echo "<transition><duration>5.0</duration><from>$prev</from><to>$_</to></transition>";
echo "<static><duration>1795.0</duration><file>$_</file></static>"; $prev = $_
} | select -Skip 1


We use Echo (alias of Write-Output) to output each line. The current pipeline object ($_) represents our iterator as we go through our ForEach-Object (alias %) loop. We then use Select-Object (alias select) with the Skip parameter to skip (duh) the first "mistake" line.

Outputting text is fine since we are creating the XML from scratch, but Hal would be crying if we actually needed to manipulate or parse XML. In PowerShell we can treat XML as just another object and access its various elements and properties:

PS C:\> $xml = [xml](Get-Content file.xml)
PS C:\> $xml.objects.static
duration  file
--------  ----
1795.0    2007 12 06 Australia 001.jpg
1795.0    2007 12 06 Australia 002.jpg
1795.0    2007 12 06 Australia 003.jpg


Oh, if only Hal and his shell from the 1900's could keep up.