Tuesday, November 16, 2010

Episode #121: Naughty Characters

Hal has friends in low places:

This week's Episode comes to us courtesy of one of our loyal readers who had a bit of a misadventure with vi. The intended keyboard sequence was ":w^C<Enter>", aka "save the file, oh wait never mind". Unfortunately, there was a bit of a fumble on the ^C and the command that actually got entered was ":w^X<Enter>", aka "save the file as '^X'". Whoops! My friend Jim always says that "experience is what you get when you don't get what you want." Our loyal reader was about to get a whole bunch of experience.

Even listing a file called ^X can be problematic. On Linux and BSD, non-printable characters are represented as a "?" in the output of ls. But on older, proprietary Unix systems like Solaris these characters will be output as-is, leading to weird output like this:

$ ls -l
total 2
-rw-r--r-- 1 hal staff 7 Nov 12 17:28

Wow, that's spectacularly unhelpful.

The GNU version of ls has the -b command switch that will display non-printable characters in octal:

$ ls -lb
total 4
-rw-r--r-- 1 hal hal 7 2010-11-12 14:18 \030

On other Unix systems, where ls lacks this option, this trick works well:

$ ls -l | cat -v
total 2
-rw-r--r-- 1 hpomer staff 7 Nov 12 17:28 ^X

"cat -v" causes the control characters to be displayed with the "^X" notation.

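If you want to try this without fumbling in vi first, here's a small sketch that manufactures the same file in a scratch directory and lists it. The scratch directory from mktemp and the variable names are my own choices for illustration, not part of the original misadventure.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a file whose name is the single byte Ctrl-X (octal 030).
touch "$(printf '\030')"

# cat -v rewrites control characters in caret notation.
listing=$(ls | cat -v)
echo "$listing"

# od -c dumps the raw bytes of the listing, handy when even cat -v is unclear.
ls | od -c
```

The first command should print ^X, and the od output lets you read off the octal value you'll need for the removal tricks.
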
Great, we can see the characters now, but how do we remove the file? This works:

$ rm $(echo -e \\030)
$ ls -l
total 0

Here we're using "echo -e" to output the literal control sequence using the octal value. We then use the output of the echo command as the argument to rm. Voila! No more file.
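
One caveat: "echo -e" is not understood by every shell (dash, for example, prints the "-e" literally). A sketch of the same idea using printf, which handles the octal escape itself, might look like the following. The scratch directory and variable names are assumptions for the sake of a self-contained example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "$(printf '\030')"     # recreate the problem file (name is Ctrl-X)

# printf interprets the octal escape itself, so no "echo -e" is needed.
# The double quotes stop the shell from word-splitting the result, and
# "--" guards against names that happen to start with a dash.
rm -- "$(printf '\030')"

# Count what is left in the directory (tr strips padding some wc versions add).
remaining=$(ls -A | wc -l | tr -d ' ')
echo "files left: $remaining"
```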

Our loyal reader sent in an alternate solution, which is the "classic" way of solving this problem:

$ ls -lbi
total 0
918831 -rw-r--r-- 1 hal hal 0 2010-11-12 14:36 \030
$ find . -inum 918831 -exec rm {} \;
$ ls -l
total 0

The trick is to use "ls -i" to dump out the inode number associated with the file. Then we can use "find . -inum ... -exec rm {} \;" to "find" the file and remove it. Actually, the solution we received was to use "... -exec mv {} temp \;" instead of rm. That way you can easily review the contents of the file before deciding to remove it. That's probably safer.
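
Here's a rough sketch of the "move it first" variant, with the inode number pulled out by awk instead of being copied by hand. The directory layout, the file contents, and the name rescued.txt are all made up for this example. I move the file to the parent directory so that find can't trip over the renamed file while it is still walking the tree.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/junk"
cd "$workdir/junk" || exit 1
printf 'oops\n' > "$(printf '\030')"    # problem file with some content

# "ls -i" prints "<inode> <name>"; this directory holds one file,
# so the first field of the first line is the inode we want.
inode=$(ls -i | awk '{print $1; exit}')

# Move rather than delete, so the contents can be reviewed first.
find . -inum "$inode" -exec mv {} ../rescued.txt \;

cat ../rescued.txt
```

Keep in mind that inode numbers are only unique within a single file system, so it's wise to run find from the directory that actually holds the file rather than from somewhere high up the tree.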

Besides files containing non-printable characters, there are other file names that can ruin your day. For example, having a file whose name starts with a dash can be a problem:

$ ls
-i
$ rm -i
rm: missing operand
Try `rm --help' for more information.

Whoops! The rm command is interpreting the file name as a command-line switch!

There are actually several ways of removing these kinds of files. The "find . -inum ..." trick works here, of course. Another approach is:

$ rm -- -i

For most Unix commands these days, "--" tells the command to stop processing options and treat everything else on the command line as an operand, such as a file name. But there's actually a more terse solution that doesn't require the command to support "--":

$ touch ./-i
$ rm ./-i

"./-i" means "the file called -i in the current directory", and the advantage to specifying the file name this way is that the leading "./" means that the command no longer sees the file name as a command-line switch with a leading dash.

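Both approaches can be tried side by side in a scratch directory. This is just a sketch: the second file name, -rf, is my own addition to show that the tricks aren't specific to "-i".

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch ./-i ./-rf     # two file names that look like rm options

rm -- -i             # "--" ends option parsing, so -i is taken as a file name
rm ./-rf             # "./" means the argument no longer starts with a dash

# Count what is left in the directory (tr strips padding some wc versions add).
left=$(ls -A | wc -l | tr -d ' ')
echo "files left: $left"
```
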
So there you go: a walk on the wild side with some weird Unix file names. I wonder if Tim has any problem files he has to deal with on the Windows side?

Tim works with those who shall not be named:

Oh silly Hal, surely you know the tremendous problems I have...I mean with files.

The problem in Windows isn't so much with characters as it is with certain names. Windows doesn't have wild characters, but it does have some names that shall not be spoken. The names include: CON, PRN, AUX, NUL, COM1 through COM9, and LPT1 through LPT9. These names date back to the DOS days and represent devices: the console, printer, auxiliary device, null bucket, serial ports, and parallel ports. Since Windows recognizes these as devices, you can't easily create files or directories with the same names. Here is what happens if you try:

C:\> mkdir con
The directory name is invalid.

And if you try to redirect output to one of these files you will see no file created.

C:\> echo "stuff" > con
C:\> dir con*
Volume in drive C has no label.
Volume Serial Number is ED15-DEAD

Directory of C:\

File Not Found

See, no file.

To create a file or directory with one of these special names, we have to prefix the path with \\?\. This prefix tells the Windows API to disable string parsing. The "\\.\" prefix is similar and will access the Win32 device namespace instead of the Win32 file namespace. This is how access to physical disks and volumes is accomplished directly, without going through the file system. (reference).

In layman's terms, use one of these options to create a directory.

C:\> mkdir \\.\c:\con
C:\> mkdir \\?\c:\con

Same goes for files:

C:\> echo "some text" > \\.\c:\con
C:\> echo "some text" > \\?\c:\con

Note, you have to use the full file path to create the file. So if you want to create a file in the system32 directory you need to do this:

C:\Windows\System32> echo "some text" > \\.\c:\Windows\System32\con

Just because you can create the file doesn't mean it will work well. Some of the APIs don't support the prefix, so don't be surprised if an app crashes when it tries to access one of these files.

As for PowerShell, well, I can't see a way to create a file or directory. It always returns an error such as this:

PS C:\> mkdir -Path \\.\c:\con

New-Item : The given path's format is not supported.
At line:38 char:24
+ $scriptCmd = {& <<<< $wrappedCmd -Type Directory @PSBoundParameters }
+ CategoryInfo : InvalidOperation: (\\.\c:\con:String) [New-Item], NotSupportedException
+ FullyQualifiedErrorId : ItemExistsNotSupportedError,Microsoft.PowerShell.Commands.NewItemCommand

If you can figure out how to do it in PowerShell (without using .NET), let me know.