Tuesday, July 12, 2011

Episode #153: I'll Have What She's Having

Tim gets off easy

This week friend of the blog Jeff "I am the Hammer" Haemer writes in looking for a solution to this problem:

Given two directories with the same file and directory names, but with different file contents and perms/ownerships, copy the perms and ownerships from the "master" directory to the new dir while preserving the content that's different in the new directory (i.e., copy the ownerships and perms without overwriting any files).

Jeff, the *nix ninja, set me up for a super easy week. Upon receiving this email I quickly checked out robocopy's help.

C:\> robocopy /?

...
/COPY:copyflag[s] :: what to COPY for files (default is /COPY:DAT).
(copyflags : D=Data, A=Attributes, T=Timestamps).
(S=Security=NTFS ACLs, O=Owner info, U=aUditing info).
...


Bingo! Looks like we can use the S, O and A flags to copy the security info, ownership, and attributes while leaving the data alone. But let's double check.

To test it I created two directories, dir1 and dir2, that looked like this.

C:\> tree /f
Folder PATH listing
Volume serial number is DEAD-BEEF

C:.
|
+---dir1
|       aaaa.txt
|       bbbb.txt
|       cccc.txt
|
+---dir2
        aaaa.txt
        bbbb.txt


The content of each file is different. Also, I added permissions for "Anonymous Logon" to aaaa.txt and I completely changed the permissions of bbbb.txt. The file cccc.txt was left with the default permissions.

C:\dir1> cacls *
C:\dir1\aaaa.txt NT AUTHORITY\ANONYMOUS LOGON:R
BUILTIN\Administrators:(ID)F
NT AUTHORITY\SYSTEM:(ID)F
BUILTIN\Users:(ID)R
NT AUTHORITY\Authenticated Users:(ID)C

C:\dir1\bbbb.txt NT AUTHORITY\SYSTEM:F
FNSCORP\tm:F

C:\dir1\cccc.txt BUILTIN\Administrators:(ID)F
NT AUTHORITY\SYSTEM:(ID)F
BUILTIN\Users:(ID)R
NT AUTHORITY\Authenticated Users:(ID)C


All the files in dir2 look like this:

C:\dir2> cacls *
C:\dir2\????.txt NT AUTHORITY\ANONYMOUS LOGON:R
BUILTIN\Administrators:(ID)F
NT AUTHORITY\SYSTEM:(ID)F
BUILTIN\Users:(ID)R
NT AUTHORITY\Authenticated Users:(ID)C


Now we run our command.

PS C:\> robocopy c:\dir1 c:\dir2 /COPY:ASO


...and check the permissions:

C:\dir2> cacls *
C:\dir2\aaaa.txt NT AUTHORITY\ANONYMOUS LOGON:R
BUILTIN\Administrators:(ID)F
NT AUTHORITY\SYSTEM:(ID)F
BUILTIN\Users:(ID)R
NT AUTHORITY\Authenticated Users:(ID)C

C:\dir2\bbbb.txt NT AUTHORITY\SYSTEM:F
FNSCORP\tm:F


Permissions match, the file content hasn't changed, and the additional file wasn't copied. That was super easy. Even better, it works in PowerShell and CMD. Unfortunately, it doesn't work in *nix land, so Hal has got some work ahead of him. Hal, get to it.

Hal gets screwed

I'm considering revoking Jeff's "friend of the blog" status for setting me up for failure on this one. Unfortunately, rsync/tar/cpio/etc don't have an option like robocopy does for copying permissions and ownerships but not content. So we're left with cobbling together our own command line madness.

But it turns out that getting the file permissions in a usable format is a difficult thing to do in a portable fashion. It's no problem on Linux or BSD, where we have the stat command. Here's the Linux solution:

# cd /your/source/dir
# find * -print0 | xargs -0 stat -c '%a %u %g %n' |
    while read perms user group file; do
        chown $user:$group "/path/to/target/dir/$file";
        chmod $perms "/path/to/target/dir/$file";
    done

Essentially I'm using "find * -print0 | xargs -0 stat -c '%a %u %g %n'" as a fill-in for the "ls" command to get my file info in the form that I need it. For each file the stat format I'm specifying with "-c" will give me the permissions in octal, the numeric UID and GID, and the file name. From there it's just a matter of using this data appropriately inside the while loop to do the chown and chmod.

Notice that I'm being careful to use "-print0" and "xargs -0" so that we handle files with spaces properly. The "read" statement at the top of the while loop will schlurp up everything after the GID as the file name, so that takes care of the spaces in file names problem there. However, inside the loop we need to be careful with our quoting so things work out OK.
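For what it's worth, here is one way the loop above could be hardened. Treat it as a rough sketch under stated assumptions rather than a drop-in replacement: it assumes GNU stat, skips symlinks (chmod would follow them), and quietly ignores anything that doesn't exist on the target side. The function name copy_perms and the swap from the xargs pipeline to "find -exec sh -c" are illustrative choices, not part of the original one-liner.

```shell
# copy_perms SRC DST (hypothetical helper): apply the owner, group, and mode
# of everything under SRC to the same relative path under DST.
# File contents are never touched. Assumes GNU stat (Linux).
copy_perms() (
    dst=$(cd "$2" && pwd) || exit 1     # absolute path, since find runs inside SRC
    cd "$1" || exit 1
    # Skip symlinks: chmod would follow them and change the link target instead.
    find . ! -type l -exec sh -c '
        dst=$1; shift
        for rel do
            [ -e "$dst/$rel" ] || continue      # only touch paths that exist in DST
            meta=$(stat -c "%a %u %g" "$rel") || continue
            perms=${meta%% *}                   # e.g. 644
            ids=${meta#* }                      # e.g. "1000 1000"
            chown "${ids% *}:${ids#* }" "$dst/$rel"   # chown first; it can clear setuid
            chmod "$perms" "$dst/$rel"
        done
    ' sh "$dst" {} +
)
```

You would call it as "copy_perms /your/source/dir /path/to/target/dir", most likely as root so the chown is allowed to hand files to other users. Because the file names never pass through "read", names containing newlines or backslashes should survive too.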

The BSD version of our command is nearly identical except for the stat command. On BSD the correct stat command to plug into xargs is "stat -f '%Mp%Lp %u %g %N'". The permissions bits on BSD are returned with "%p", but unfortunately "%p" includes the file type as part of the octal sequence, so you get unhelpful output like "100644" for regular files, "40755" for directories, etc. The "%Mp%Lp" sequence means to output just the suid/sgid/sticky bits ("%Mp", the middle permissions bits) and the normal r/w/x info ("%Lp", the lower permissions bits). By the way, notice that the BSD stat command also uses "-f" for the format option and "%N" for the file name, while Linux uses "-c" and "%n" respectively.
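If you need one command line that runs on both families, a small wrapper can pick the right stat syntax at run time. This is only a sketch: the name file_meta is made up for illustration, and the probe simply assumes that a stat which accepts "-c" is the GNU one and anything else understands the BSD "-f" format described above.

```shell
# file_meta PATH (hypothetical helper): print "octal-perms uid gid" for PATH
# using whichever stat syntax this system accepts.
file_meta() {
    if stat -c '%a' / >/dev/null 2>&1; then
        stat -c '%a %u %g' "$1"         # GNU coreutils stat (Linux)
    else
        stat -f '%Mp%Lp %u %g' "$1"     # BSD stat
    fi
}
```

The BSD branch prints a leading digit for the suid/sgid/sticky bits (for example "0644" where Linux prints "644"), but chmod accepts either form, so the output can feed the same chown/chmod loop.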

But what about other Unix flavors that don't have a built-in stat command? It would be against the rules of the blog to even suggest you go download and install the GNU coreutils package. So the only thing I can think of doing is using "ls -ln" instead of stat to get the information about each file. The problem is parsing the permissions bits and converting them into octal notation to use with chmod. Remember that such a conversion routine would need to deal with things like "rwxrw-r--", "r-sr-s--x", and "rwxrwxrwt" (to say nothing of crazy corner cases like "S" and "l"). That's almost certainly going to turn into a script. In fact, you're probably better off just coding the whole thing in Perl or Python in the first place so you can just call stat() on the files directly.
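To give a feel for what that conversion routine involves, here is a minimal sketch in plain POSIX shell. The function name mode_to_octal is hypothetical, it expects exactly the nine permission characters (strip the leading file type character from the "ls" output first), and it handles "s", "S", "t", and "T" but makes no attempt at "l".

```shell
# mode_to_octal STRING (hypothetical helper): convert nine "ls -l" permission
# characters, e.g. "rwxr-sr-t", into four-digit octal notation for chmod.
mode_to_octal() (
    [ "${#1}" -eq 9 ] || exit 1
    mode=$1 special=0 octal=
    for weight in 4 2 1; do             # setuid=4, setgid=2, sticky=1
        rest=${mode#???}
        triplet=${mode%"$rest"}         # next three characters
        mode=$rest
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in
            ??x)    digit=$((digit + 1)) ;;
            ??[st]) digit=$((digit + 1)); special=$((special + weight)) ;;  # special bit plus execute
            ??[ST]) special=$((special + weight)) ;;                        # special bit, no execute
        esac
        octal=$octal$digit
    done
    echo "$special$octal"
)
```

With this, "mode_to_octal rwxrw-r--" prints 0764, "mode_to_octal r-sr-s--x" prints 6551, and "mode_to_octal rwxrwxrwt" prints 1777. Something like "ls -ln file | awk '{print substr($1, 2, 9)}'" would supply the input string, though you would still have to parse the UID, GID, and file name out of the same line.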

So we've got a decent solution for Linux and BSD, but a trip to Scriptistan on all other platforms. That's a bummer. I think I'll make myself feel better by watching one of my favorite movie scenes (as a bit of movie trivia, that's Director Rob Reiner's mom delivering the punch line at the end of the scene).

Tuesday, July 5, 2011

Episode #152: Follow the Bouncing Link

Hal's hot on the trail

I'm a fan of the Debian "Alternatives" system. It's an elegant way of configuring which text editor, mail server, installation of Java, etc., should be the default version used on the system. The only downside is that it can sometimes be non-obvious where the actual executable resides because of all of the symlinks involved. And there are plenty of other situations where the actual file you're looking for might be on the other end of a long chain of links.

In the case of Debian's Alternatives, the update-alternatives command-line interface lets you query to find out information about the final executable path name at the end of the chain of links:

$ update-alternatives --list java
/usr/lib/jvm/java-6-openjdk/jre/bin/java

Instead of "--list", you can also use "--display" to get even more detailed information:

$ update-alternatives --display java
java - auto mode
link currently points to /usr/lib/jvm/java-6-openjdk/jre/bin/java
/usr/lib/jvm/java-6-openjdk/jre/bin/java - priority 1061
slave java.1.gz: /usr/lib/jvm/java-6-openjdk/jre/man/man1/java.1.gz
Current `best' version is /usr/lib/jvm/java-6-openjdk/jre/bin/java.

But what about cases where you have a chain of symlinks that aren't part of the alternatives system? "Friend of the Blog" Jeff Haemer sent along this nice little snippet of Fu:

$ readlink -f $(which java)
/usr/lib/jvm/java-6-openjdk/jre/bin/java

"which java" returns the cached executable path from our search path. Normally I'd prefer using the "type" command instead of "which" because type will even tell you if the command you're executing is actually an alias. But in this case, "which" is better because it simply returns the executable path while "type" adds some extra text:

$ type java
java is /usr/bin/java
$ which java
/usr/bin/java

We take the executable path returned by "which" and use it as an argument to "readlink -f". "readlink" will tell you what file a given link points to, and "-f" will follow an entire chain of symlinks and tell you the final path that the last link in the chain points to.

There's only one small problem: "readlink" is part of the GNU coreutils package and may not be available on all flavors of Unix. I can do a simple version of "readlink" that only dereferences a single layer of symlinks with the following Fu:

$ ls -l $(which java) | awk '/->/ {print $NF}'
/etc/alternatives/java

The "ls" output about our symlink is going to have a "->" symbol followed by the path that the link points to, which will also be the last thing on the line. So I use awk to match the "->" and then print out the last field, aka "$NF".

To emulate "readlink -f", I'm going to need a loop:

$ exec=$(which java)
$ while [[ -L $exec ]]; do exec=$(ls -l $exec | awk '/->/ {print $NF}'); done
$ echo $exec
/usr/lib/jvm/java-6-openjdk/jre/bin/java

First I set the variable "$exec" to be the path name returned by "which java". Then as long as my "$exec" path is a symlink ("[[ -L $exec ]]") I use my "readlink" stand-in to set the new value of "$exec" to be the path that the link points to. Eventually the last link in the chain will point to an actual file and the loop will terminate. At that point, I just output the last value of "$exec".
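One thing the loop above glosses over is that a link target can be a relative path, which has to be interpreted relative to the directory holding the link, and that a circular chain would spin forever. A sketch of a function that deals with both might look like the following. The name resolve_link and the limit of 40 hops are arbitrary choices for illustration, and the "ls | awk" trick still assumes there is no whitespace in the link target.

```shell
# resolve_link PATH (hypothetical helper): follow a chain of symlinks using
# only ls and awk, then print the path at the end of the chain.
resolve_link() (
    path=$1 hops=0
    while [ -L "$path" ]; do
        hops=$((hops + 1))
        [ "$hops" -le 40 ] || { echo "too many links: $1" >&2; exit 1; }
        target=$(ls -l "$path" | awk '/->/ {print $NF}')
        case $target in
            /*) path=$target ;;                     # absolute target
            *)  path=$(dirname "$path")/$target ;;  # relative to the link's own directory
        esac
    done
    echo "$path"
)
```

Calling resolve_link "$(which java)" would take the place of the loop. Unlike "readlink -f", this does not clean up any "../" components left in the result; it only follows the links.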

With a little more Fu, we can actually see each step of the process:

$ exec=$(which java)
$ while [[ -L $exec ]]; do
    link=$(ls -l $exec | awk '/->/ {print $NF}');
    echo $exec points to $link;
    exec=$link;
done

/usr/bin/java points to /etc/alternatives/java
/etc/alternatives/java points to /usr/lib/jvm/java-6-openjdk/jre/bin/java

The loop is essentially the same. I've just added an extra variable so I can print out both the link name in "$exec" and the thing it points to in "$link". The extra output may be useful if you're ever trying to debug what's going wrong with a long chain of symlinks.

Now I know that Windows doesn't have anything like the Alternatives system in Debian, but maybe Tim has some command-line magic up his sleeve for decoding all of those nasty shortcut files that users like to set up?

Tim is trailing

We don't have all that fancy mumbo-jumbo on the Windows side. We pretty much just have Links (a.k.a. Shortcuts). These Link files have the extension "lnk". When viewing them via the GUI you won't see the ".lnk" extension, but via the command line you do.

PS C:\Users\tim\Desktop> ls *.lnk 

Directory: C:\Users\tim\Desktop

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 7/4/2011 07:04 PM 893 TweetDeck.lnk


If you use Get-Content (alias gc, cat, type) to view the file, it is mostly unprintable characters, but the target path is in there. We don't have a command similar to the *nix "strings" command, so we can't extract it that way. However, we can use some weird stuff in the built-in tools to extract the path from the file.

PS C:\> $s = New-Object -ComObject WScript.Shell
PS C:\> $s.CreateShortcut('C:\Users\tim\Desktop\TweetDeck.lnk').TargetPath
C:\Program Files\TweetDeck\TweetDeck.exe


Yep, it's really that ugly.

We start off by creating a new Windows Shell Object. Once we create the object we use the CreateShortcut method to open an existing shortcut. At first glance you might think we are creating a shortcut, but we aren't. The method's name is a bit of a misnomer as it can be used to create or open an existing shortcut.

Once the shortcut is open, we output the value of the TargetPath property. One additional point of weirdness: the path given to CreateShortcut must be the full path. If you give it a relative path, including just the filename, it returns nothing since it can't find the shortcut.

I wouldn't exactly call it magic, but it does work.