Tuesday, May 3, 2011

Episode #145: A Date to Copy

Tim checks the mail (again):

One of our readers writes in asking how to generate "a list of locations of doc files which are created or modified after month and year 01 /2011." He then wants to copy all those files to another directory. He specifically stated he wanted it to work on XP and without PowerShell.

Well, let me get on my soapbox for a second. PowerShell v1 and v2 are available for XP. I can't recommend enough that people install PowerShell. It will make your Windows administration and command line experience easier and much less painful. Case in point, it took me a decent amount of time to write the cmd.exe portion of this episode, which includes all the trial and error. However, the PowerShell portion is so straightforward I wrote that portion off the top of my head. So if you are on a Windows box without PowerShell, go install it before continuing. Ok, enough ranting, back to our regularly scheduled programming...

One nice thing is that we are only looking for files based on year, so that makes our time parsing much easier in cmd.exe. If we had to parse based on month and year it would be considerably harder with this antiquated shell.

We can use a number of the techniques taken from Episode #29. Specifically, the For loop with the ~f (full file path). We can use ~t with our variable to get the modified timestamp, but there isn't a modifier to give us the creation time, so we'll have to take a slightly different approach.

First, we start off with a For loop that will find all the .doc files on the system.

C:\> for /r %i in (*.doc) do @echo %~fi

C:\Doc\file1.doc
C:\Doc\file2.doc
...


We have all the .doc files, so now we take a giant leap to our final command:

C:\> for /r %i in (*.doc) do (@dir /tw "%~fi" | find "/2011" > NUL ||
@dir /tc "%~fi" | find "/2011" > NUL) && copy "%~fi" C:\DocsFrom2011


We use two Dir commands with different time (/t) options. The first, /tw, displays the Last Write Time. The second, /tc, displays the Creation Time. Each of these Dir commands is piped into a Find command that looks for the year portion of the date "/2011". All that is wrapped in a bit of magic logic that is used to determine if the file should be copied or not. The magic explained...

We use a Logical OR (||) and a Logical AND (&&) to do the magic. The cmd.exe shell uses short circuit logical operators, but what does that mean? In short, if we wanted to evaluate A && B, the second expression (B) is only evaluated when the result is not fully determined by A. For example, given A && B, when A is false there is no reason to evaluate B since the Logical AND will then be false regardless of the value of B. Enough of the Math/Logic lesson, back to our command...

The logic in our command reduces to: (Last Write Time Matches OR Creation Time Matches) AND copy. Given our short circuit logical operators, if neither the Creation Time nor the Last Write Time matches, we don't do the copy. But if either matches, we copy.
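For anyone who wants to poke at the grouping without a Windows box handy, here is a minimal sketch of the same (A OR B) AND C shape in a Unix shell, where || and && short circuit in the same way. The two functions are hypothetical stand-ins for the date checks, not anything from the command above.

```shell
# Hypothetical stand-ins for the two date checks.
write_matches()  { return 1; }   # pretend the Last Write Time is not from 2011
create_matches() { return 0; }   # pretend the Creation Time is from 2011

result="skipped"
# The assignment only runs if at least one of the checks succeeds.
{ write_matches || create_matches; } && result="copied"
echo "$result"   # prints: copied
```

Change create_matches to return 1 as well and the assignment never runs, which is the "neither matches, no copy" case.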

Now on to the much easier PowerShell.

PowerShell

Not only is the PowerShell version much easier, it is also much more robust, as it can actually compare dates. That means we could just as easily look for files created after any arbitrary date, not just January 1.

PS C:\> Get-ChildItem -Recurse -Include *.doc | Where-Object {
$_.LastWriteTime -ge "1/1/2011" -or $_.CreationTime -ge "1/1/2011" } |
Copy-Item -Destination C:\DocsFrom2011


The command can be shortened using built-in aliases and shortened parameter names:

PS C:\> ls -r -i *.doc | ? { $_.LastWriteTime -ge "1/1/2011" -or $_.CreationTime -ge "1/1/2011" } |
cp -d C:\DocsFrom2011


This command does a recursive directory listing and pipes the results into a Where-Object filter. The filter only passes objects where the relevant timestamps are greater than, or equal to, Jan 1, 2011. All the objects that make it through the filter are sent to Copy-Item to be copied to our target folder.

The PowerShell version is included, even though this portion wasn't requested by our reader. I guess that makes me only half wanted. Since Hal's shell isn't supported on XP I guess he isn't wanted either, but here he is anyway.

Hal checks in

Not wanted? Don't go projecting your insecurities onto me now. Just repeat this daily affirmation, "CMD.EXE is good enough, and gosh darn it people just love XP!" Besides, you can always install Cygwin to help you over the rough spots.

This one really is similar to Episode #29-- we just have to do something with the file names once we pick them out. For that I think I'll bust out the cpio magic as I did back in Episode #115:

# touch -t 201101010000 /tmp/marker
# cd /some/source/dir
# find . -depth \( -name \*.doc -o -name \*.docx \) -newer /tmp/marker |
cpio -pd /path/to/target/dir

5854 blocks

First I use touch to create a file with the earliest date for files which we're interested in. I use this with "find ... -newer" to find all files that have been modified since this timestamp. Unfortunately traditional Unix file systems still don't track creation time on files (unless you've upgraded to EXT4), so last modified time is all we have to go on.
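If you want to convince yourself that the marker file trick does what it says, a rough sketch in a throwaway directory looks like this. The directory layout and file names are made up for illustration, and the timestamps are forced with touch so the result doesn't depend on when you run it.

```shell
# Build a scratch tree with one old file and one new file.
demo=$(mktemp -d)
mkdir -p "$demo/src/sub"
touch -t 201006150000 "$demo/src/old.doc"       # last modified June 2010
touch -t 201103010000 "$demo/src/sub/new.doc"   # last modified March 2011
touch -t 201101010000 "$demo/marker"            # the cutoff: Jan 1, 2011

# Only files modified after the marker should be listed.
found=$(cd "$demo/src" && find . -name '*.doc' -newer "$demo/marker")
echo "$found"   # prints: ./sub/new.doc
```

Note that -newer is a strict comparison, so a file stamped at exactly the marker's time would not be picked up.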

Our loyal reader specified that they were only interested in doc files, but I decided to make my life harder by looking for both the old style *.doc files and the newer *.docx format. Notice that the Windows guy in our little CommandLineKungFu partnership didn't think to look for both file extensions.
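The escaped parentheses around the two -name tests matter more than they might look. Here is a small sketch, again with made-up file names in a scratch directory, comparing the grouped expression with an ungrouped one. Without the grouping, find's implicit "and" binds tighter than -o, so the -newer test only applies to the *.docx half.

```shell
demo=$(mktemp -d)
touch -t 201006150000 "$demo/old.doc" "$demo/old.docx"   # before the cutoff
touch -t 201103010000 "$demo/new.doc" "$demo/new.docx"   # after the cutoff
touch -t 201101010000 "$demo/marker"

# Grouped: the date test applies to both extensions.
grouped=$(cd "$demo" && find . \( -name '*.doc' -o -name '*.docx' \) -newer marker | sort)

# Ungrouped: parsed as  -name '*.doc'  OR  ( -name '*.docx' AND -newer marker ).
ungrouped=$(cd "$demo" && find . -name '*.doc' -o -name '*.docx' -newer marker | sort)

echo "grouped:";   echo "$grouped"     # the two new files only
echo "ungrouped:"; echo "$ungrouped"   # also includes ./old.doc
```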

Notice that I'm also using the "-depth" option with find to work around any possible issues with read-only directory permissions when cpio is creating the parallel directory structure in the target directory. See the discussion in Episode #115 for more details.
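A quick way to see what "-depth" changes, without involving cpio at all, is to look at the order in which find prints a small tree. This is only a sketch in a scratch directory, and the names a, b and f.doc are made up.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/a/b/f.doc"

# With -depth, a directory's contents are listed before the directory itself.
order=$(cd "$demo" && find a -depth)
echo "$order"
# prints:
# a/b/f.doc
# a/b
# a
```

Because the directories come out after their contents, whatever reads the list (cpio in our case) sees the files before it sees the directory entry that carries the restrictive permissions.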

So there you go, a solution to a new problem created by putting two earlier solutions together. And that's pretty much the Unix command-line religion anyway. And that's why Unix is cool. And, gosh darn it, people like it!