Tuesday, March 29, 2011

Episode #140: Folder Foulups

Tim checks the mailbag this time

Tom He-Who-Shall-Not-Be-Named "Doe" writes in:

We have a network share and while users do not have rights to create folders at the root of the drive, they have the ability to accidentally start dragging a copy of a root folder to some sub-location in a directory where they have modify rights, thus creating incomplete (and unnecessary) copies of root directories. I never get word of this (surprise!).

Can you put together a script that I can run periodically which will dynamically inventory the top-level directories (currently 28) on this network share and look for sub-folders with matching names *** all the way down *** underneath those top-level dirs? Yes, I realize this could run for hours, but I will run it against a DR box that has DFSR on it, so I won't impact performance for the users.


He also included a sample directory structure:

DIR1
-----> FolderA
-----> FolderB
-----> FolderC
DIR2
----->FolderD
DIR3
----->DIR1
----------> FolderA
----------> FolderB
----------> FolderC
----->FolderE


Per our conversation with Tom I'm-on-the-run-from-the-mob Doe, all we need to do is match a subdirectory name with the name of one of the root directories. Here is the rather simple command to do what we need:

PS C:\MyShare> $a = ls | % {$_.Name}; ls -r * | 
? { $_.PSIsContainer -and $a -contains $_.Name } | select fullname


FullName
--------
C:\MyShare\DIR3\DIR1


There are two parts to our command. First, we load a variable ($a) with the names of each of our root directories. We have to use the ForEach-Object cmdlet (alias %) to extract each name as a plain string.

The second portion is our search. We start off with a recursive directory listing and send the objects into a Where-Object (alias ?) filter. Our filter ensures that we are only looking at directory objects (Containers). It also checks to see if the Name of the current object is contained in our collection of directory names. If both criteria are satisfied the results are passed down the pipeline. Finally, we output just the full path (FullName) of our object.

There is one tiny little nugget of information that I didn't mention, but if you looked really closely you might have wondered why I added the asterisk to ls -r. Why? Well, let's see what happens when we don't use it:

PS C:\MyShare> $a = ls | % {$_.Name}; ls -r | 
? { $_.PSIsContainer -and $a -contains $_.Name } | select fullname


FullName
--------
C:\MyShare\DIR1
C:\MyShare\DIR2
C:\MyShare\DIR3
C:\MyShare\DIR3\DIR1


For some reason the command "ls -r" includes the directory listing for the current directory but "ls -r *" does not. It is a cool little trick, but I have no idea why it works. Here is a better demonstration using the Compare-Object cmdlet:

PS C:\MyShare> Compare-Object (ls -r) (ls -r *)

InputObject SideIndicator
----------- -------------
DIR1 <=
DIR2 <=
DIR3 <=


Weird, but it works.

CMD.EXE

While we can't take the same approach with CMD.EXE, we can accomplish this task with the following command:

C:\MyShare> for /F "tokens=*" %a in ('dir /B /S /A:D') do
@for /F "tokens=*" %r in ('dir /B /A:D') do @echo %a| findstr "\\%r$"

C:\MyShare\DIR1
C:\MyShare\DIR2
C:\MyShare\DIR3
C:\MyShare\DIR3\DIR1


Our command wraps a For loop around a bare format (/B), recursive (/S) directory listing that only looks for directories (/A:D). The variable %a contains the full path of our directory. We then use another For loop to get the names (not full paths) of the directories in our root folder. It should be noted that the bare format (/B) output differs depending on whether the recursive option (/S) is used. With /S each line is the full path; without /S each line is just the name.

We then use Echo to output %a (full directory path), and use Findstr to search for %r (a root folder name). The regular expression ensures the name is a full match, and that the match is at the end of the string. To do this we make sure that our name is between a backslash and the end of line ($). The backslash is the regular expression escape character so it must be escaped with another backslash.

Unfortunately, this command is going to be slooooooow. For each new directory we traverse, we will have to do a directory listing of our root directory. This means we will probably do thousands of extra directory listings that we don't need to do with PowerShell.

We can filter out the top-level directories themselves by adding an extra Findstr command to the end that drops any path sitting directly under the share root. This additional filter looks like this:

 ... | findstr /V "C:\\MyShare\\[^\\]*$"
C:\MyShare\DIR3\DIR1


The /V option prints only the lines that do not contain a match. Our search string matches the first portion of our path ("C:\MyShare\"), then any number of characters that aren't a backslash, and finally the end of the line ($). This means that C:\MyShare\Blah will be filtered out, but C:\MyShare\Blah\Blah will be displayed.
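For readers who want to poke at these two regular expressions outside of CMD.EXE, here is a rough sketch using grep in a Unix shell. It assumes that grep's basic regular expressions treat the backslash escape, the [^...] class, and the $ anchor the same way Findstr does for these particular patterns. The paths helper is made up for the demo and simply prints the four lines from the example output above.

```shell
# Hypothetical helper: the four paths from the example output above
paths() {
    printf '%s\n' 'C:\MyShare\DIR1' 'C:\MyShare\DIR2' \
                  'C:\MyShare\DIR3' 'C:\MyShare\DIR3\DIR1'
}

# Match: the name must sit between a backslash and the end of the line
paths | grep '\\DIR1$'

# Inverse match (like findstr /V): drop anything directly under the share root
paths | grep -v 'C:\\MyShare\\[^\\]*$'
```

The first grep keeps both paths that end in \DIR1, and the second leaves only the path that sits more than one level below the share root.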

It may not be fast, but at least it's [mostly] easy. Now, let's see if Hal is fast and easy.

Hal wonders if Tim is projecting a bit

Personally I prefer to think of it as "efficient and morally flexible", but have it your own way. As far as the challenge goes, Unix makes this both fast and, well... easy is in the eye of the beholder I guess.

There is a simple find command for solving this challenge:

$ find */* -type d \( -name DIR1 -o -name DIR2 -o -name DIR3 \)
DIR3/DIR1

Notice that I'm doing "find */* ..." so that I start my search at the level below the top. Otherwise I'd get matches on the top-level directories themselves, and that wouldn't be helpful.
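If you want to see the difference for yourself, here is a small sketch that rebuilds the sample tree from Tom's email in a scratch directory and runs the search both ways. The scratch location comes from mktemp and is just an assumption for the demo; nothing here touches a real share.

```shell
# Recreate the sample tree in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/DIR1/FolderA" "$demo/DIR1/FolderB" "$demo/DIR1/FolderC" \
         "$demo/DIR2/FolderD" \
         "$demo/DIR3/DIR1/FolderA" "$demo/DIR3/DIR1/FolderB" \
         "$demo/DIR3/DIR1/FolderC" "$demo/DIR3/FolderE"
cd "$demo" || exit 1

# Starting at the top: the genuine top-level directories match as well
find . -type d \( -name DIR1 -o -name DIR2 -o -name DIR3 \) | sort

# Starting one level down: only the stray copy is reported
find */* -type d \( -name DIR1 -o -name DIR2 -o -name DIR3 \)
```

The first search lists ./DIR1, ./DIR2, and ./DIR3 alongside ./DIR3/DIR1, while the second reports only DIR3/DIR1.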

This command works great as long as there are only three directories and their names don't change. But Mystery Tom's challenge specifies that there are actually 28 directories, which is way more than I'd want to type by hand. So it would be cool if there was a way to create the list of "-name DIR1 -o -name DIR2 ..." automatically.

It's actually pretty straightforward to do this:

$ ls | tr \\n ' ' | sed 's/ $//; s/ / -o -name /g'
DIR1 -o -name DIR2 -o -name DIR3

I start with the list of top-level directories, which the ls command will output with newlines after each name. So I use tr to transform the newlines to spaces, which leaves me with a trailing space in place of the last newline. But since I'm feeding the whole thing into sed anyway, I first have sed remove the trailing space and then replace all of the remaining spaces with " -o -name ".
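To watch the two stages separately, here is a sketch that swaps ls for a stand-in function, so it runs anywhere. The names function is hypothetical and just prints the three directory names from the example, one per line.

```shell
# Hypothetical stand-in for "ls" in the sample share
names() { printf 'DIR1\nDIR2\nDIR3\n'; }

# Stage 1: newlines become spaces; the brackets expose the trailing space
stage1=$(names | tr '\n' ' '; echo ']')
echo "[$stage1"

# Stage 2: drop the trailing space, then turn every remaining space
# into the " -o -name " separator
stage2=$(names | tr '\n' ' ' | sed 's/ $//; s/ / -o -name /g')
echo "$stage2"
```

Note that this trick treats every space as a separator, so it only behaves when none of the directory names contain spaces.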

The only problem is that I have no "-name" at the beginning of my output, but I can fix that up when I do the output substitution into the original find command:

$ find */* -type d \( -name $(ls | tr \\n ' ' | sed 's/ $//; s/ / -o -name /g') \)
DIR3/DIR1

All I did here was take our ls pipeline and throw it inside "$(...)"-- the command output substitution operation-- in place of the "DIR1 -o -name DIR2 -o -name DIR3" that I hand-entered in the original find command. Since we needed the extra "-name" at the front, I just put that in manually before the "$(...)". So now we have a command that will work for any number of directories and even work when the directory names change.
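One caveat: because the substituted text is split on whitespace, a top-level directory with a space in its name would confuse find. If that's a concern on your share, here is one possible sketch that builds the same "-name X -o -name Y" list as separate arguments instead of as a string. This is not the one-liner above, just an alternative. The function name find_stray_copies is made up for illustration, and names containing glob characters such as [ or * would still be treated as patterns by -name.

```shell
# Hypothetical helper: list directories below the top level of "$1"
# whose name matches one of the top-level directory names.
find_stray_copies() (
    cd "$1" || exit 1
    set --                                  # start with an empty argument list
    for d in */; do
        d=${d%/}                            # the */ glob leaves a trailing slash
        [ -d "$d" ] || continue             # skip the literal "*" if nothing matched
        if [ "$#" -gt 0 ]; then
            set -- "$@" -o                  # join tests with -o after the first one
        fi
        set -- "$@" -name "$d"              # each name stays a single argument
    done
    find */* -type d \( "$@" \)
)
```

Calling find_stray_copies /path/to/share (with your own path in place of that placeholder) should print the same kind of list as the one-liner, and the parentheses around the function body keep the cd from affecting your current shell.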

And it's much sexier than either of Tim's solutions.