Tuesday, September 28, 2010

Episode #114: Pod People

Tim pumps up the jam:

I enjoy listening to podcasts, and I listen to a lot of them. On Windows I use Juice to download my podcasts. It downloads all the files to a directory of my choosing and creates a subdirectory for each podcast.

Since I have so many mp3s to listen to, I speed them up and normalize the volume. To make use these outside tools easier to use I move all of the mp3s to one directory. This where I break out some fu.

I first used cmd.exe for this task since PowerShell wasn't out yet.

C:\> for /f "tokens=*" %i in ('dir C:\Podcasts /b /s ^| find ".mp3" ^|
find /v "!ToFix"') do @move /Y "%i" "C:\Podcasts"\!ToFix\"
We use our friend the For loop to get each file. The tokens options will prevent any delimiters from being used so our variable will contain the full path and won't be broken by spaces. Inside the For loop we get a directory listing and use a recursive (/s) and bare format (/b) so we get the full path and no headers. The first Find command filters for files containing ".mp3", and the second Find filters out any mp3 file that contains "!ToFix" in the path since it is already in the correct location.

At this point the variable %i contains the full path to the file. We then use the Move command to move the file to our directory. I use the /Y option with the Move command to force overwriting of files with the same name. I only do this because Juice sometimes will download an old episode and I don't want to manually confirm each overwrite.

Now, how do we do the same thing in PowerShell?

PS C:\> Get-ChildItem -Recurse \podcasts -Include *.mp3 -Exclude !ToFix |
Move-Item -Destination C:\temp\podcasts\!ToFix -Force
Shortened by using aliases and shortened parameter names:

PS C:\> ls -r \podcasts -In *.mp3 -Ex "!ToFix" |
move -Dest C:\temp\podcasts\!ToFix -Force
We use Get-ChildItem with the recurse option to get all the objects in the podcast directory. The Include option filters for files that contain ".mp3" and the Exclude option removes the !ToFix directory. The results are piped into Move-Item where the destination directory is specified. The Force option is used to force overwriting.

Now I'm ready to nerd out with my podcasts. Hal is a nerd too, but I wonder if he listens to any podcasts.

Hal gets jiggy with it:

No I don't listen to podcasts. And you damn kids stay off my lawn!

Here's the first solution I rattled off for this week's challenge:

find podcasts -name \*.mp3 | grep -v /ToFix/ | xargs -I{} cp {} podcasts/ToFix
This one's pretty straightforward: find all files named *.mp3, filter out pathnames containing the "ToFix" target directory, and copy all other files into the target dir via "xargs ... cp".

We have to use "-I{}" here because the cp command will be terminated by the target directory path. The path names we're feeding in on the standard input are the middle arguments to the command, and we use the "{}" to place them appropriately in the command line. One bonus to using "-I" is that it forces xargs to treat each line of input as a single argument and ignore spaces in file names, so we don't need to worry about escaping or quoting our arguments or messing around with "find ... -print0".

But since we just covered "find ... -prune" in Episode 112, you may be wondering why I used "grep -v" above when I could just filter out the "ToFix" directory within the find command itself:

find podcasts \( -name \*.mp3 \) -o \( -name ToFix -prune -false \) | 
xargs -I{} cp {} podcasts/ToFix
Comparing the two commands, the question sort of answers itself, doesn't it? Personally, I think the version with grep is a whole lot clearer. It's certainly easier to type. So while prunes are good for grumpy old men like me, that doesn't mean I have to use them all the time.

Are you kids still on my lawn?

Hal follows up

Woo-wee! We sure got a lot of feedback after this Episode!

Davide Brini wrote in to point out that newer versions of find have a "-path" option that can be used to simplify things considerably:

find podcasts -name \*.mp3 ! -path podcasts/ToFix/\* | xargs -I{} cp {} podcasts/ToFix
Davide also points out that if you live in the Linux world or at least in a world where you have the GNU fileutils installed everywhere, you can do things like this:

find podcasts -name \*.mp3 ! -path podcasts/ToFix/\* -exec cp -t podcasts/ToFix {} +
First, note the use of "cp -t ...", which allows me to specify the target directory as an initial argument. That means the remaining arguments are just the files that need to be copied, so I don't need to muck around with the tedious "-I{}" construct in xargs.

But speaking of xargs, the GNU version of find has a "-exec ... +" construct that basically does the same thing as xargs does. Rather than executing the specified command once for each file found, the "-exec ... +" option will aggregate multiple found files to reduce the number of executions.

In a separate email, Ole Tange wrote in to point out that my solutions fail miserably if any of the files we're copying contain quotes in the file names. Ole suggests using either the "-print0" option to GNU find or GNU Parallel, which properly handles input lines containing quotes:

find podcasts -name \*.mp3 | grep -v /ToFix/ | parallel -X cp {} podcasts/ToFix
Davide's final solution using "-exec ... +" also properly deals with the "file names containing quotes" issue.

So the bottom line is that if you live in a GNU software environment, then these problems can all be solved quite easily. But I wondered how I'd solve all of these issues without the shiny GNU tools. Here's the most robust solution I could come up with:

find podcasts -name \*.mp3 | grep -v ^podcasts/ToFix/ |
while read file; do n=`basename "$file"`; cp "$file" "podcasts/ToFix/$n"; done

Not pretty, but it works.