Tuesday, October 26, 2010

Episode #118: We're Watching... Well, Ed mostly

Tim logs in:

One of our (presumably) faithful followers wrote in asking how to track a user's login and logout times. Here at CLKF, we don't have a problem with keeping an eye on employees, since we don't have any (yet). But once we get them, we will ensure they begin work promptly at 5am and work until at least 9pm. How, do you ask? Lashings. The same goes for poor performance. If morale drops, the beatings will continue until morale increases. Simple. And that is the way that Hal and Ed brought me onboard.

For those of you who don't live in a utopia similar to ours, you may need to track login and logout times. We can do this a few different ways depending on your environment. Windows keeps track of logon and logoff events in the event log, but the way they are tracked depends on the version of Windows: the divide is between the newer versions, beginning with Vista, and the older versions.

Let's first look at Windows Vista, 7, and 2008. While only Windows 7 and 2008 R2 come with PowerShell installed by default, all of these newer versions have a new logging system. With the new system, the logon event ID is 4648 and the logoff event ID is 4647. With this new logging system (and PowerShell v2), we can use the Get-WinEvent cmdlet. However, all versions of Windows and PowerShell support Get-EventLog.

The Get-WinEvent cmdlet abstracts the log objects. This abstraction allows for searching multiple logs at once, but it comes at a price: the abstraction makes searching for specific data a little quirky. Here is what I mean.

PS C:\> Get-WinEvent -FilterHashtable @{LogName="Security";Id=4647,4648}

TimeCreated ProviderName Id Message
----------- ------------ -- -------
9/8/2010 6:51:42 PM Microsoft-Windows-Security-Auditing 4648 A logon was attempted...
9/8/2010 5:30:40 PM Microsoft-Windows-Security-Auditing 4647 User initiated logoff...
9/8/2010 4:40:24 PM Microsoft-Windows-Security-Auditing 4648 A logon was attempted...
9/8/2010 1:54:30 PM Microsoft-Windows-Security-Auditing 4647 User initiated logoff...
...


This abstracted command doesn't allow us to filter using the -Id parameter, so we have to use a hashtable. The hashtable allows us to filter on any property of the underlying log. All we need to do is pick the property and the value for filtering, like this: @{name1=value1;name2=value2}. It isn't that bad, but there is no tab completion for the properties used in the hashtable.

Now we have a list of all the logon and logoff events, but we have to find events specific to our user. The username is in the message, so we need to filter for events related to the user in question.

PS C:\> Get-WinEvent -FilterHashtable @{LogName="Security";Id=4647,4648} | 
? { $_.Message | Select-String "skodo" }


TimeCreated ProviderName Id Message
----------- ------------ -- -------
9/8/2010 5:30:40 PM Microsoft-Windows-Security-Auditing 4647 User initiated logoff...
9/8/2010 4:40:24 PM Microsoft-Windows-Security-Auditing 4648 A logon was attempted...


The Where-Object cmdlet (alias ?) is used to filter for events related to our user. We take the Message property and search it, with Select-String, for the username of that pesky Ed Skoudis. This gives us the full object for the event, with all the associated data. This data can be exported to CSV, XML, or other formats.

Now, on to those older versions that don't support Get-WinEvent: there we have to use the Get-EventLog cmdlet. The nice thing about Get-EventLog is that it is supported on all versions of Windows and both versions of PowerShell.

PS C:\> Get-EventLog -LogName Security -InstanceId 528,538

Index Time EntryType Source InstanceID Message
----- ---- --------- ------ ---------- -------
337 Oct 9 8:35 SuccessAudit Security 528 Successful Logon:...
335 Oct 9 8:34 SuccessAudit Security 538 User Logoff:...
333 Oct 9 8:33 SuccessAudit Security 528 Successful Logon:...
331 Oct 9 8:32 SuccessAudit Security 538 User Logoff:...
329 Oct 9 8:31 SuccessAudit Security 528 Successful Logon:...
327 Oct 9 8:30 SuccessAudit Security 538 User Logoff:...
325 Oct 9 8:29 SuccessAudit Security 528 Successful Logon:...


This is a list of all the logon and logoff events, and to find entries for a specific user we filter just like we did above.

PS C:\> Get-EventLog -LogName Security -InstanceId 528,538 | ? { $_.Message | Select-String "skodo" }

Index Time EntryType Source InstanceID Message
----- ---- --------- ------ ---------- -------
333 Oct 9 8:33 SuccessAudit Security 528 Successful Logon:...
331 Oct 9 8:32 SuccessAudit Security 538 User Logoff:...


Now that we know where Ed has been on "my" machines, let's see if Hal can track him down on his.

Hal checks out

Yes, let's see what the devious Ed has been up to with all his free time since he's not writing Command Line Kung Fu every week:


$ last
halpo pts/1 ip198-166-98-130 Fri Oct 1 22:29 - 22:47 (00:18)
tinytim pts/1 pool-196-22-144- Fri Oct 1 21:13 - 21:44 (00:31)
mrbucket pts/1 ip198-166-98-35. Fri Oct 1 16:51 - 16:52 (00:00)
tinytim pts/6 10.20.30.40 Fri Oct 1 14:57 - 15:29 (00:31)
skodo pts/1 h-68-202-198-66. Fri Oct 1 14:56 - 15:04 (00:08)
halpo pts/6 10.20.30.20 Fri Oct 1 08:27 - 11:14 (02:47)
halpo pts/6 10.20.30.20 Fri Oct 1 07:59 - 07:59 (00:00)
root console :0 Fri Oct 1 07:55 still logged in
skodo pts/1 h-68-202-198-66. Thu Sep 30 23:16 - 10:25 (11:08)
reboot system boot Thu Sep 30 23:15
reboot system down Thu Sep 30 23:09
skodo pts/14 h-68-202-198-66. Thu Sep 30 22:43 - down (00:31)
skodo pts/9 h-68-202-198-66. Thu Sep 30 22:38 - down (00:36)
halpo pts/7 ip198-166-98-130 Thu Sep 30 22:34 - 22:40 (00:05)
skodo pts/9 h-68-202-198-66. Thu Sep 30 22:24 - 22:35 (00:11)
skodo pts/9 h-68-202-198-66. Thu Sep 30 18:24 - 18:25 (00:00)
skodo pts/14 h-68-202-198-66. Thu Sep 30 17:45 - 18:22 (00:37)
skodo pts/7 h-68-202-198-66. Thu Sep 30 17:42 - 18:22 (00:39)
skodo pts/9 h-68-202-198-66. Thu Sep 30 17:18 - 18:22 (01:04)
halpo pts/7 ip198-166-98-130 Thu Sep 30 16:06 - 16:08 (00:02)
halpo pts/7 ip198-166-98-130 Thu Sep 30 16:02 - 16:03 (00:01)
tinytim pts/9 pool-196-22-144- Thu Sep 30 16:01 - 16:15 (00:13)
[...]

wtmp begins Fri Jul 8 10:49

Hmmm, suspiciously, user "skodo" was the only person on the box the last time it rebooted. Ed, you've got some 'splaining to do! Actually, the person I really want to visit with the old "clue by four" is whoever left root logged in on the console. Bad form.

The last command operates on the system default wtmp file, usually /var/log/wtmp. On some Unix machines, this log grows without bound for the entire lifetime of the system. On other machines, the wtmp gets rotated weekly like other logs. If you want to look at an older wtmp file, use "last -f <filename>". This is also helpful when you're doing forensics on a Unix system image and you want to examine the wtmp file.

Notice in the above output that host names are displayed when reverse DNS information is available. The Linux version of the last command has a "-i" flag to always show IP addresses. Unfortunately, this option isn't widely available on older, proprietary Unix flavors. One option that is common to most Unix-like OSes that I've tried is the "-a" flag that moves the host information to the last column of output so that it doesn't get truncated.

The last command also lets you be selective. For example, you can ask for the most recent 5 logins:

$ last -5
halpo pts/1 ip198-166-98-130 Fri Oct 1 22:29 - 22:47 (00:18)
tinytim pts/1 pool-196-22-144- Fri Oct 1 21:13 - 21:44 (00:31)
mrbucket pts/1 ip198-166-98-35. Fri Oct 1 16:51 - 16:52 (00:00)
tinytim pts/6 10.20.30.40 Fri Oct 1 14:57 - 15:29 (00:31)
skodo pts/1 h-68-202-198-66. Fri Oct 1 14:56 - 15:04 (00:08)

Or perhaps we'd like to see Ed's last three logins:

$ last -3 skodo
skodo pts/1 h-68-202-198-66. Fri Oct 1 14:56 - 15:04 (00:08)
skodo pts/1 h-68-202-198-66. Thu Sep 30 23:16 - 10:25 (11:08)
skodo pts/14 h-68-202-198-66. Thu Sep 30 22:43 - down (00:31)


We can even get a history of the system reboots:

$ last reboot
reboot system boot Thu Sep 30 23:15
reboot system down Thu Sep 30 23:09
reboot system boot Thu Mar 18 10:44
reboot system down Thu Mar 18 10:38
reboot system boot Thu Mar 18 10:25
reboot system down Thu Mar 18 10:08
[...]

Unfortunately, there aren't any options to select specific fields out of the last output. So you're pretty much stuck with piping last into awk to pull out the information you want.
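For instance, here's a hedged sketch of pulling just the user and login-start fields with awk. Field positions vary across last implementations, so this uses canned wtmp-style sample lines rather than live last output; treat the field numbers as a starting point for your own system.

```shell
# Sample lines stand in for real `last` output; on this layout,
# $1 is the user and $4-$7 are the login start time.
printf '%s\n' \
  'skodo pts/1 h-68-202-198-66. Fri Oct 1 14:56 - 15:04 (00:08)' \
  'halpo pts/6 10.20.30.20 Fri Oct 1 08:27 - 11:14 (02:47)' |
awk '{print $1, $4, $5, $6, $7}'
# prints:
# skodo Fri Oct 1 14:56
# halpo Fri Oct 1 08:27
```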

Tuesday, October 19, 2010

Episode #117: Know When to Stop

Hal doesn't want to know:

We received an interesting challenge from loyal reader Joel Dinel:
My coworker has a directory which contains other sub-directories. These sub-directories contain files, but might also contain other sub-directories. My buddy would like to find, for each subdir of the current working directory, the most recent file. He would like to ignore any directories contained in sub-dirs.

Why Joel's "friend" wants to do this is a question perhaps best left unanswered. Ours is not to question why here at Command Line Kung Fu; we're merely here as a public resource for your oddest command-line challenges.

I pondered Joel's problem for a while trying to figure out how best to get started. I thought about using find to locate all of the regular files from the current directory downwards, but that won't give them to me in date-sorted order. I thought about looping over each directory and parsing the output of "ls -t", but filtering out the sub-sub-directories (if they happened to be the most recent object) could get complicated.

Then I thought that maybe I was over-thinking the problem and I should just try the naive approach:

$ ls -t */*
sub1/file2
sub2/file3
sub3/file1
sub3/file2
sub3/file3
sub1/file3
sub2/file1
sub2/file2
sub1/file1

sub3/otherdir:

sub2/subdir:

sub1/subsub:
subsubsub
file1

This actually looks surprisingly promising. The regular files from all of the sub-directories are listed first in date-sorted order, and then we get listings of the sub-sub-directories separated by blank lines. So all I have to do here is pull out the first listed file from each sub-directory and stop when I hit the first blank line.

After getting shown up by Davide Brini's brilliant solution to last week's challenge, I had awk on the brain and I stole a page or two from Davide's playbook:

$ ls -t */* | awk -F/ '/^$/ {exit} !seen[$1] {seen[$1]=1; print}'
sub1/file2
sub2/file3
sub3/file1

The first part of the awk code, "/^$/ {exit}", simply terminates the program when we hit a blank line. We want to do this first, so we terminate the program immediately and don't produce any output when we hit the blank line.

The interesting stuff is happening in the rest of the awk expression. I'm splitting my input on slashes ("-F/"), so $1 in this case will be set to the name of the sub-directory. I'm keeping an array called "seen", which is indexed by sub-directory name. If there is not yet an array entry for a particular sub-dir, then this must be the first (and therefore most recent) file listed for that subdirectory. So we output that line ("print"), and update the "seen" array to indicate that we've output the most recent file for the sub-directory. I won't say it's obvious, but it's very compact and clean.

Honestly, this works fine on my test directory, but I was a little worried that the initial shell glob in "ls -t */*" would choke and die on a large directory structure. However, I tested my solution in directories like /usr, /usr/include, and /proc and couldn't make it fail.

During my testing I did notice one issue though:

$ ls -l
total 224
drwxr-xr-x 2 root root 53248 2010-09-27 09:22 bin
drwxr-xr-x 2 root root 4096 2010-05-01 06:56 games
drwxr-xr-x 53 root root 12288 2010-09-27 07:55 include
drwxr-xr-x 231 root root 69632 2010-10-02 15:26 lib
drwxr-xr-x 39 root root 32768 2010-09-27 07:54 lib32
lrwxrwxrwx 1 root root 3 2009-11-30 17:25 lib64 -> lib
drwxr-xr-x 14 root root 4096 2010-09-29 11:53 local
drwx------ 2 root root 16384 2009-11-30 17:24 lost+found
drwxr-xr-x 2 root root 12288 2010-10-02 15:26 sbin
drwxr-xr-x 332 root root 12288 2010-09-27 09:10 share
drwxrwsr-x 6 root src 4096 2010-06-05 08:11 src
$ sudo ls -t */* | awk -F/ '/^$/ {exit}; !seen[$1] {seen[$1]=1; print}'
lib64/libgdiplus.so
lib/libgdiplus.so
bin/xulrunner
sbin/a2dismod
lib32/libbz2.so.1.0
include/ldap_features.h
games/same-gnome

Notice that lib64 is a symlink to lib. The glob happily follows the symlink and reports that the file libgdiplus.so is the most recent file in each of these two "directories" when really we're talking about the same file in the same directory twice.

If that's a problem for you, then we'll have to do this the hard way:

# for d in $(ls -F | egrep '/$'); do 
echo $d$(ls -tF $d | egrep -v '[@/]$' | head -1);
done | egrep -v '/$'

bin/vmss2core
games/same-gnome
include/ldap_features.h
lib/libgdiplus.so.0.0.0
lib32/libbz2.so.1.0.4
sbin/vmware-authd

The "ls -F" command puts a special character after each file name. Directories are tagged with a trailing "/", which I match with egrep to get a list of only the sub-directories of the current directory. On Linux I could have just used "find . -mindepth 1 -maxdepth 1 -type d", but "-maxdepth" is not supported on all versions of find. In any event, I'm taking my list of directories and iterating over it with a for loop.

Now "ls -t <somedir> | head -1" is a useful idiom for getting the name of the most recently modified object in a directory. But that object might be a sub-directory and we don't want that in this case. So I'm once again using the "-F" option with ls to indicate the type of file and then using egrep to filter out both directories ("/") and symlinks ("@") before I pipe things into head. This will give me the name of the most recently modified file (or device or named pipe or whatever-- you could filter those out too if you wanted), but I need to prepend the directory name.

The problem is that in some cases the output of my ls pipeline can be null-- like when the directory is empty or contains only sub-directories and/or symlinks. So I'm using the funny echo construct you see in the loop above rather than something like "echo -n $d; ls -tF ..." just so that I can guarantee that we get a terminating newline. However in the cases where the output of the ls pipeline is null I'll just have a sub-directory name and a trailing slash but no actual file name. So I added one more egrep after the loop to filter out these unnecessary lines of output.

In fact you'll notice that the output of my loop solution is considerably different from the output of the awk solution. Adding a "| xargs ls -l" to each expression makes the issues clearer:

# ls -t */* | awk -F/ '/^$/ {exit}; !seen[$1] {seen[$1]=1; print}' | xargs ls -l
lrwxrwxrwx 1 root root 27 2010-09-27 07:58 bin/xulrunner -> /etc/alternatives/xulrunner
-r-xr-sr-x 1 root games 103344 2010-02-18 00:01 games/same-gnome
-rw-r--r-- 1 root root 1890 2010-07-29 17:50 include/ldap_features.h
lrwxrwxrwx 1 root root 15 2010-09-27 07:54 lib32/libbz2.so.1.0 -> libbz2.so.1.0.4
lrwxrwxrwx 1 root root 19 2010-10-02 15:26 lib64/libgdiplus.so -> libgdiplus.so.0.0.0
lrwxrwxrwx 1 root root 19 2010-10-02 15:26 lib/libgdiplus.so -> libgdiplus.so.0.0.0
lrwxrwxrwx 1 root root 7 2010-09-27 07:55 sbin/a2dismod -> a2enmod
# for d in $(ls -F | egrep '/$'); do
echo $d$(ls -tF $d | egrep -v '[@/]$' | head -1);
done | egrep -v '/$' | xargs ls -l

-rwxr-xr-x 1 root root 857848 2010-09-27 07:37 bin/vmss2core
-r-xr-sr-x 1 root games 103344 2010-02-18 00:01 games/same-gnome
-rw-r--r-- 1 root root 1890 2010-07-29 17:50 include/ldap_features.h
-rw-r--r-- 1 root root 70076 2010-09-10 14:36 lib32/libbz2.so.1.0.4
-rw-r--r-- 1 root root 410184 2010-09-23 08:26 lib/libgdiplus.so.0.0.0
-rwsr-xr-x 1 root root 871296 2010-09-27 07:35 sbin/vmware-authd

The awk solution doesn't filter out symlinks like my for loop version does. I can fix things up a little bit by employing the same "ls -F" trick we used in the loop version:

# ls -tF */* | awk -F/ '/^$/ {exit}; !/@$/ && !seen[$1] {seen[$1]=1; print}'
bin/vmss2core*
sbin/vmware-authd*
lib64/libgdiplus.so.0.0.0
lib/libgdiplus.so.0.0.0
lib32/libbz2.so.1.0.4
include/ldap_features.h
games/same-gnome*

Do you see that I've added an extra pattern match in the awk code-- "!/@$/"-- to filter out the symlinks? Of course now I've got the trailing "*" markers for executable files. A little sed will clean that up:

# ls -tF */* | awk -F/ '/^$/ {exit}; !/@$/ && !seen[$1] {seen[$1]=1; print}' | 
sed 's/\*$//' | sort

bin/vmss2core
games/same-gnome
include/ldap_features.h
lib32/libbz2.so.1.0.4
lib64/libgdiplus.so.0.0.0
lib/libgdiplus.so.0.0.0
sbin/vmware-authd
# for d in $(ls -F | egrep '/$'); do
echo $d$(ls -tF $d | egrep -v '[@/]$' | head -1);
done | egrep -v '/$'

bin/vmss2core
games/same-gnome
include/ldap_features.h
lib/libgdiplus.so.0.0.0
lib32/libbz2.so.1.0.4
sbin/vmware-authd

I also threw in a sort command to make it easier to compare the output of the new awk expression with the for loop output. The only real discrepancy now is that our glob in the awk version is still following the symlink and giving us output about the lib64 directory when it probably shouldn't. So you could say that the for loop version is more correct, but the awk solution "works" in most cases and is easier to type.

Two solutions from me this week. Why am I thinking that coming up with just one is going to be rough for Tim this time around?

Tim only digs so deep

Silly Hal, this one isn't bad at all. Here is what I came up with:

PS C:\> ls | ? { $_.PSIsContainer} | % { (ls $_ | ? { -not $_.PSIsContainer} | sort LastWriteTime -Desc)[0] }

Directory: Microsoft.PowerShell.Core\FileSystem::C:\dir1

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/17/2010 12:59 PM 12 ccc.txt


Directory: Microsoft.PowerShell.Core\FileSystem::C:\dir2

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/17/2010 1:07 PM 12 zzz.txt

The first part of this command is just a directory listing that filters for directories (containers).

PS C:\> ls | ? { $_.PSIsContainer}

Directory: Microsoft.PowerShell.Core\FileSystem::C:\temp

Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 10/17/2010 12:59 PM <DIR> dir1
d---- 10/17/2010 1:07 PM <DIR> dir2

The next portion is where the magic happens. We use the ForEach-Object cmdlet (alias %) to iterate through each subdirectory under our working directory. The iterator is the current pipeline object ($_). In our case, the current pipeline object will first contain dir1, and then dir2.

Inside our ForEach-Object script block, we get a listing of the contents of our subdirectory and filter out any objects that are containers (directories), leaving only the files. The files are then sorted by LastWriteTime in reverse order. Finally, we select the first object. Remember, we are working in base zero, so the first object has an index of 0.

That's it, not so bad is it Hal!

Tuesday, October 12, 2010

Episode #116: Stop! Haemer Time!

Hal can't touch this:

In response to our "blegging" in last week's Episode, we got a bunch of good ideas from our readers for future Episodes. But don't let that stop you from sending those cards, letters, and emails for topics you'd like to see us cover in the blog!

This week's Episode comes from long-time friend of the blog, Jeff Haemer. Not only did he send us a problem, he also sent the solution-- at least for the Unix side of the house. So, easy week for me as I sit back and explain Jeff's problem and solution.

Jeff's situation is that he's got a bunch of software build directories tagged with a software revision number and a date:

$ ls
1.2.00.00_devel-20100906 1.2.00.00_devel-20100910 2.0.00.00_devel-20100909
1.2.00.00_devel-20100907 2.0.00.00_devel-20100906 2.0.00.00_devel-20100910
1.2.00.00_devel-20100908 2.0.00.00_devel-20100907
1.2.00.00_devel-20100909 2.0.00.00_devel-20100908

The problem is that Jeff wants to clean up his build area by removing all but the last two date-stamped directories for each of the different software versions.

There are really two pieces to solving this problem and Jeff's solution is a nice little bit of "divide and conquer". The first problem is figuring out the different software version numbers that are present in each directory:

$ ls | cut -d- -f1 | uniq
1.2.00.00_devel
2.0.00.00_devel

Here we're just taking the directory listing, using cut to chop off the date stamps after the "-" and then uniq-ifying the list to get just one instance of each version number. Normally you would call sort before uniq, but in this case the ls command is sorting the directory listing for us.
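If you're wondering why we can skip the usual sort, here's a quick illustration of uniq's adjacency rule with some toy input:

```shell
# uniq only collapses *adjacent* duplicates, so unsorted input can
# leave repeats behind; sorted input (like ls's output) is safe.
printf '%s\n' b a a b | uniq          # prints: b a b
printf '%s\n' b a a b | sort | uniq   # prints: a b
```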

The next problem is, for each version number, figure out the directories we need to remove-- i.e., everything but the two most recently date-stamped directories. The naive approach might be to start with a directory listing like this:

$ ls -d 1.2.00.00_devel*
1.2.00.00_devel-20100906
1.2.00.00_devel-20100907
1.2.00.00_devel-20100908
1.2.00.00_devel-20100909
1.2.00.00_devel-20100910

The directories we want to delete are everything except for the last two directories. You could try some tricks using head piped into tail, but that gets complicated pretty quickly. An easier approach is to just invert the problem:

$ ls -dr 1.2.00.00_devel*
1.2.00.00_devel-20100910
1.2.00.00_devel-20100909
1.2.00.00_devel-20100908
1.2.00.00_devel-20100907
1.2.00.00_devel-20100906

The "-r" flag reverses the sort order of ls. So now our problem is to extract everything except for the first two lines. And that's easy:

$ ls -dr 1.2.00.00_devel* | tail -n +3
1.2.00.00_devel-20100908
1.2.00.00_devel-20100907
1.2.00.00_devel-20100906

Notice that the correct syntax for tail is "-n +3"-- "start three lines into the input and output the rest". If you were thinking "-n +2", well let's just say you were probably in good company.
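You can convince yourself of the "+3" semantics with a toy pipeline (assuming a system with seq, e.g. GNU coreutils):

```shell
# "-n +3" means "start output at line 3", dropping the first two lines
seq 5 | tail -n +3    # prints 3, 4, 5 on separate lines
```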

So now we know how to extract the various software versions, and how to get the names of all but the two most recent directories. The final solution is just a matter of putting those two ideas together:

$ for v in $(ls | cut -d- -f1 | uniq); do
ls -dr $v* | tail -n +3
done

1.2.00.00_devel-20100908
1.2.00.00_devel-20100907
1.2.00.00_devel-20100906
2.0.00.00_devel-20100908
2.0.00.00_devel-20100907
2.0.00.00_devel-20100906

In the for loop itself, I'm using our expression to obtain the directory version numbers inside of "$(...)", which is essentially the same thing as using backticks. However, the "$(...)" construct is preferable for reasons which we'll see in a moment. Then for each version number I'm using the expression we developed to output the names of the directories we want to remove.

Great! We're now outputting the names of all the directories we want to remove, now we want to actually remove them (note that it's always best to do this sort of confirmation before you do a dangerous operation like rm). There's a lot of different ways we could go here. I choose xargs:

$ !! | xargs rm -rf
for v in $(ls | cut -d- -f1 | uniq); do ls -dr $v* | tail -n +3; done | xargs rm -rf
$ ls
1.2.00.00_devel-20100909 2.0.00.00_devel-20100909
1.2.00.00_devel-20100910 2.0.00.00_devel-20100910

Whoa Nelly! What just happened there? Well, I used a quick command-line history substitution, namely "!!", to repeat the previous command (my for loop) and pipe the output into xargs.

Another alternative would be to use command output substitution:

rm -rf $(for v in $(ls | cut -d- -f1 | uniq); do ls -dr $v* | tail -n +3; done)

Constructs like this are why you want to use "$(...)" instead of backticks. If you tried doing the above command line with backticks, you'd get a syntax error because the shell doesn't parse "nested" backticks the way you want. On the other hand, "$(...)" nests quite nicely.
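A tiny demonstration of the nesting difference, using contrived strings just to show the parsing:

```shell
# $( ) nests without any escaping:
echo $(echo outer-$(echo inner))      # prints outer-inner

# Backticks can nest, but only with backslash escapes,
# which gets unreadable fast:
echo `echo outer-\`echo inner\``      # prints outer-inner
```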

The only problem with the second solution is that if the number of directories we need to remove is large, you could theoretically overwhelm the limit for the length of a single command line. Using xargs protects you from that problem.

Anyway, thanks for the interesting problem/solution, Jeff! It looks like Tim's got his gold parachute pants on and he's ready to rock...

Tim busts the funky lyrics

I did not wear gold parachute pants in the 80s, at least there is no proof of it. And Hal, sorry to correct you on your 80s fashion, but Hammer pants are WAY different from parachute pants, and besides, I wore the silver ones.

Let's fast-forward 20 years and PowerShell this moth-ah. Similar to Hal's approach, we'll divide and conquer conqu-ah.

PS C:\> Get-ChildItem | Sort-Object Name -Descending

Directory: Microsoft.PowerShell.Core\FileSystem::C:\

Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 10/10/2010 10:10 PM <DIR> 2.0.00.00_devel-20100910
d---- 10/10/2010 10:10 PM <DIR> 2.0.00.00_devel-20100909
d---- 10/10/2010 10:10 PM <DIR> 2.0.00.00_devel-20100908
d---- 10/10/2010 10:10 PM <DIR> 2.0.00.00_devel-20100907
d---- 10/10/2010 10:10 PM <DIR> 2.0.00.00_devel-20100906
d---- 10/10/2010 10:10 PM <DIR> 1.2.00.00_devel-20100910
d---- 10/10/2010 10:10 PM <DIR> 1.2.00.00_devel-20100909
d---- 10/10/2010 10:10 PM <DIR> 1.2.00.00_devel-20100908
d---- 10/10/2010 10:10 PM <DIR> 1.2.00.00_devel-20100907
d---- 10/10/2010 10:10 PM <DIR> 1.2.00.00_devel-20100906


Ok, so nothing exciting. All we did was get our directory listing and sort it by name in reverse order. Now, to group them by the software version:

PS C:\> Get-ChildItem | Sort-Object Name -Descending | Group-Object { $_.Name.split("_")[0] }
Count Name Group
----- ---- -----
5 2.0.00.00 {2.0.00.00_devel-20100910, 2.0.00.00_devel-20100909, ...
5 1.2.00.00 {1.2.00.00_devel-20100910, 1.2.00.00_devel-20100909, ...


In this example, the Group-Object cmdlet uses a script block to define how the groups are created. The groupings are created by taking the Name property of the current object ($_.Name), splitting it using the underscore as a delimiter, and then using the first item (actually zeroth item, remember base 0) in the resulting array. This gives us groups of directories where the group is based on the software version.

So now we have two groups. But what does the group contain? Remember, in PowerShell everything is an object. So the groups are just collections of the objects. As such, the items in the groups can be treated the same way as a directory, since the items are the directories.

We can now use the ForEach-Object cmdlet to iterate through each item in each group.

PS C:\> Get-ChildItem | Sort-Object Name -Descending | Group-Object { $_.Name.split("_")[0] } |
ForEach-Object { Select-Object -InputObject $_ -ExpandProperty Group | Select-Object -Skip 2 }


Directory: Microsoft.PowerShell.Core\FileSystem::C:\

Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 10/10/2010 10:10 PM <DIR> 2.0.00.00_devel-20100908
d---- 10/10/2010 10:10 PM <DIR> 2.0.00.00_devel-20100907
d---- 10/10/2010 10:10 PM <DIR> 2.0.00.00_devel-20100906
d---- 10/10/2010 10:10 PM <DIR> 1.2.00.00_devel-20100908
d---- 10/10/2010 10:10 PM <DIR> 1.2.00.00_devel-20100907
d---- 10/10/2010 10:10 PM <DIR> 1.2.00.00_devel-20100906


Let's look at the command inside the ForEach-Object script block, since that is the guts of the command.

Select-Object -InputObject $_ -ExpandProperty Group | Select-Object -Skip 2


The Current Pipeline Object, represented as $_, contains a group. So we need to take that group, and remove the first two directories from the collection. We are then left with just the directories we want to delete. The Select-Object cmdlet is used to expand the group back into directory objects. That output is piped into another Select-Object cmdlet which removes (skips) the first two items, and leaves us with the directories to be deleted.

Now we have the directories we want to delete, so we can pipe the whole thing into Remove-Item. But before we do, let's make sure we have the correct directories, so we use the -WhatIf switch.

PS C:\> Get-ChildItem | Sort-Object Name -Descending | Group-Object { $_.Name.split("_")[0] } |
ForEach-Object { Select-Object -InputObject $_ -ExpandProperty Group | Select-Object -Skip 2 } |
Remove-Item -WhatIf

What if: Performing operation "Remove Directory" on Target "C:\2.0.00.00_devel-20100908".
What if: Performing operation "Remove Directory" on Target "C:\2.0.00.00_devel-20100907".
What if: Performing operation "Remove Directory" on Target "C:\2.0.00.00_devel-20100906".
What if: Performing operation "Remove Directory" on Target "C:\1.2.00.00_devel-20100908".
What if: Performing operation "Remove Directory" on Target "C:\1.2.00.00_devel-20100907".
What if: Performing operation "Remove Directory" on Target "C:\1.2.00.00_devel-20100906".


If you want to perform the deletion, simply remove the WhatIf switch.

Ok, so that was the long version, what if we break it down using aliases and such to make it shorter.

PS C:\> ls | sort Name -desc | group { $_.Name.split("_")[0] } |
% { select -input $_ -expand Group | select -s 2 } | rm


And that's how we roll, PowerShell Style!

You really can't touch this!

Davide Brini has done it again:

ls -r | awk -F- '$1!=v{c=0; v=$1} {c++} c>2' | xargs rm -rf

Let me explain that awk for the two or three of you out there who may be having problems decoding it:


  • The "-F-" tells awk to split its input on hyphen ("-") instead of white space. So for each line of input from the ls command, $1 will be the version string and $2 will be the date stamp.

  • The awk code uses two variables: "c" is a line count, and "v" is the current version string we're working on.

  • The first section of code, "$1!=v{c=0; v=$1}", checks to see if the version string in the current line of input is different from the last version string we saw ("$1!=v"). If so, then the code block gets executed and the line counter variable is reset to zero and v is set to the new version string ("{c=0; v=$1}").

  • The next bit of code, "{c++}", is executed on every line of input and just increments the line counter.

  • The last expression, "c>2", means match the case where the line counter is greater than 2-- in other words when we're on the third or higher line of ls output for a particular version string (remember c gets reset every time the version string changes). Because there's no code block after the logical expression, "{print}" is assumed and the line gets output.


So the net result is that the awk expression outputs the directories we want to remove, and we just pipe that output into xargs like we did with the output of the for loop in the original solution.

Easy as pie...

Tuesday, October 5, 2010

Episode #115: Shadowy Directories

[A PLEA FROM THE BLOGGERS: Hey, if you think it's easy coming up with over 100 ideas for Command-Line Kung Fu Episodes... we really, really want to hear from you! Send us your ideas for things you'd like to see covered in future Episodes! Because we're lazy and we need you to program the blog for us we're running out of ideas here...]

Hal's got a lot on his mind

Lately I've been thinking a lot about directories.

Loyal reader Josh Olson wrote in with an idea for an Episode that involved making a copy of one directory hierarchy under a different directory. The idea is that you just copy the directory structure-- but not any of the files-- from the original directory.

Josh's solution involved a "for" loop, but I can go that one better with a little xargs action:

$ cd /some/path/to/source/dir
$ find * -type d -print0 | (cd /path/to/dest/dir; xargs -0 mkdir)

In the above example we're taking the output from find and piping it into a subshell that first moves over to our target directory and then invokes mkdir via xargs to create the parallel directory structure. In many ways this is similar to the traditional "tar cp" idiom that I mentioned back in Episode #73, except that here I'm just making directories, not copying the directory contents.
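Here's a minimal, self-contained sketch of the same pattern in a scratch area (the mktemp paths and "mkdir -p" are my additions, not part of the original one-liner); it also shows the null-terminated pipeline surviving a directory name with a space:

```shell
# Build a throwaway source tree, including a directory with a space.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/a/deep dir/b"

# Same shape as the one-liner above: emit directory names
# null-terminated, then recreate them under the destination.
# Using mkdir -p (my addition) keeps this safe even if xargs
# splits the list into batches.
(cd "$src" && find * -type d -print0) | (cd "$dst" && xargs -0 mkdir -p)

find "$dst" -type d    # the mirrored tree; no files were copied
```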

Of course, my solution here involves "find ... -print0" piped into "xargs -0", which is great if you live in an environment that supports these options for null-terminated I/O. But this is not portable across all Unix-like operating systems. If you want portability, you need to go "old school":

$ find * -depth -type d | cpio -pd /path/to/dest/dir
0 blocks

Yes, that's right. I'm busting out the old cpio command to amaze and terrify you. In this case I'm using "-p" ("pass mode"), which reads in a list of files and/or directories from the standard input and recreates the same file/directory structure under the specified destination directory. The "-d" option tells cpio to create directories as necessary. As a bonus, cpio preserves the original file ownerships and permissions during the copy process.

And this can actually be a problem-- for example when you're copying a directory which has read-only permissions for you. That's why I've added the "-depth" option to the find command, so that cpio is told about the "deepest" objects in the tree first. With "-d", it will make the directory structure to hold these objects and copy them into place. However, cpio will not set the restrictive permissions on the parent directory until it reads the parent directory name in the input.

See what happens when I try the above command without the "-depth" option:

$ find * -type d -ls
656249 4 dr-xr-xr-x 5 hal hal 4096 Oct 2 09:26 read-only
656253 4 drwxr-xr-x 2 hal hal 4096 Oct 2 09:26 read-only/dir3
656251 4 drwxr-xr-x 2 hal hal 4096 Oct 2 09:26 read-only/dir1
656252 4 drwxr-xr-x 2 hal hal 4096 Oct 2 09:26 read-only/dir2
$ find * -type d | cpio -pd ../dest
cpio: ../dest/read-only/dir3: Cannot stat: Permission denied
cpio: ../dest/read-only/dir1: Cannot stat: Permission denied
cpio: ../dest/read-only/dir2: Cannot stat: Permission denied
0 blocks
$ find ../dest/* -type d -ls
656254 4 dr-xr-xr-x 2 hal hal 4096 Oct 2 11:05 ../dest/read-only

Without "-depth", the directory "read-only" appears first in the output before its sub-directories. So the cpio command makes this directory first and sets it to be mode 555, just like the original directory. But then when cpio goes to make the subdirectories that come next in the find output, it doesn't have the write permissions it needs and it fails. So the moral of the story here is always use "find -depth ..." when piping your input into cpio.

I'm curious to see what Tim's got for us this week. It's possible that this is one of those cases where Windows actually makes this task easier to do than in the Unix shell...

Tim's been on the road

Finally, an episode that is really easy in Windows. We can do this with XCopy, which stands for EXTREEEEEEME COPY! Actually, it stands for eXtended Copy, but for all intents and purposes it is EXTREEEEEEME (because it is really easy)!

C:\> xcopy originaldir mynewdir /T /E /I


The /T option creates the directory structure and does not copy the files, but it does not copy empty directories or subdirectories. To copy the empty directories we need to use the /E option. And the /I option, well, that is a weird option...

If we don't use the /I option, xcopy isn't sure if our destination is a file or a directory, so we get this prompt:

C:\> xcopy originaldir mynewdir /T /E
Does mynewdir specify a file name
or directory name on the target
(F = file, D = directory)? D


If you select F, then the copy obviously doesn't work. What a weird setting.

Hal copied the permissions too, and so can we. All that needs to be added is the /O switch to copy ownership and ACL information.

Wow, that was easy. Like really easy. And xcopy works in PowerShell and cmd.exe. That makes it even more EXTREEEEEEME!