Tuesday, August 23, 2011

Episode #157: I Ain't No Fortunate One

Hal to the rescue!

We were kicking around ideas for this week's Episode and Tim suggested a little command-line "Russian Roulette". The plan was to come up with some shell fu that would pick a random number between one and six. When the result came up one, you "lost" and the command would randomly delete a file in your home directory.

Holy carp, Tim! This is the kind of thing you do for fun? Why don't we do an Episode about seeing how many files you can delete from your OS before it stops working? It's not like our readers would come looking for us with torches and pitchforks or anything. Geez.

Now I'm as big a fan of rolling the dice as anybody, but let's try something a bit more gentle. What I'm going to do is pick random sayings out of the data files used by the "fortune" program. For those of you who've never looked at these files before, they're just text files with various pithy quotes delimited by "%" markers:

$ head -15 /usr/share/games/fortunes/linux

"How do you pronounce SunOS?" "Just like you hear it, with a big SOS"
-- dedicated to Roland Kaltefleiter
%
finlandia:~> apropos win
win: nothing appropriate.
%
C:\> WIN
Bad command or filename

C:\> LOSE
Loading Microsoft Windows ...
%
Linux ext2fs has been stable for a long time, now it's time to break it
-- Linuxkongreß '95 in Berlin
%

In order to pick one of these quotes randomly, I'm going to need to know how many there are in the file:

$ numfortunes=$(grep '^%$' /usr/share/games/fortunes/linux | wc -l)

$ echo $numfortunes
334

By the way, there's no off-by-one error here because there actually is a trailing "%" as the last line of the file.
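
You can double-check that trailing delimiter with a quick peek at the end of the file:

$ tail -1 /usr/share/games/fortunes/linux
%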

OK, now that we know the number of fortunes we can pick from, I can choose which numbered fortune I want with a little modular arithmetic:

$ echo $(( $RANDOM % $numfortunes + 1 ))
109
$ echo $(( $RANDOM % $numfortunes + 1 ))
128
$ echo $(( $RANDOM % $numfortunes + 1 ))
325

I've used $RANDOM a couple of times in past Episodes-- it's simply a special shell variable that produces a random value between 0 and 32767. I'm just using arithmetic here to turn that into a value between 1 and $numfortunes.

But having selected the number of the fortune we want to output, how do we actually pull it out of the file and print it? Sounds like a job for awk:

$ awk "BEGIN { RS = \"%\" }; 

NR == $(( $RANDOM % $numfortunes + 1 ))" /usr/share/games/fortunes/linux


#if _FP_W_TYPE_SIZE < 32
#error "Here's a nickel kid. Go buy yourself a real computer."
#endif
-- linux/arch/sparc64/double.h

In awk, the "BEGIN { ... }" block happens before the input file(s) get read or any of the other awk statements get executed. Here I'm setting the "record seperator" (RS) variable to the percent sign. So rather than pulling the file apart line-by-line (awk's default RS value is newline), awk will treat each block of text between percent signs as an individual record.

Once that's happening, selecting the correct record is easy. We use our expression for picking a random fortune number and wait until awk has read that many records. The variable NR tracks the number of records seen, so when NR equals our random value we've reached the record we want to output. Since I don't have an action block after the conditional expression, "{ print }" is assumed and my fortune gets printed.

By the way, I'm sure that some of you are wondering why I'm using $RANDOM rather than the built-in rand() function in awk. Turns out that some versions of awk don't support rand(), so my method above is more portable. If your awk does support rand(), then the command would be:


$ awk "BEGIN { RS = \"%\"; srand(); sel = int(rand()*$numfortunes)+1 }; NR == sel" \

/usr/share/games/fortunes/linux


panic("Foooooooood fight!");
-- In the kernel source aha1542.c, after detecting a bad segment list

Frankly, the need to call srand() to seed the random number generator at the start of the program makes using the built-in rand() function a lot less attractive than just going with $RANDOM. By the way, our arithmetic is a little different here because rand() produces a floating point number between 0 and 1.

Meh. I like the $RANDOM version better.
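
If you find yourself wanting this more than once, the whole recipe rolls up into a little function for your .bashrc. Here's a minimal sketch-- the function name is mine and the fortune file path may differ on your system:

randfortune() {
    local file="${1:-/usr/share/games/fortunes/linux}"    # default fortune file; adjust to taste
    local numfortunes=$(grep -c '^%$' "$file")            # count the "%" delimiter lines
    awk "BEGIN { RS = \"%\" }; NR == $(( RANDOM % numfortunes + 1 ))" "$file"
}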

So Tim, if you can stop deleting your own files for a second, let's see what you've got this week.

Tim steps into Mambi-pambi-land

We are 157 Episodes in and Hal (and Ed) still aren't up for manly commands; not willing to put it all on the line. Instead, we get fortune cookies. Alright, but once you guys grow some chest hair, let's throw down mano a mano computo a computo.

Let's start with cmd.exe. Similar to what Hal did, we first need to figure out how many lines contain only a percent sign.

C:\> findstr /r "^%$" fortunes.txt | find /c "%"

334


We use FindStr with /r to treat the search string as a regular expression: beginning of line (^), a percent sign, end of line ($). Note, the file has to be saved with the Carriage Return Line Feed (CRLF) line endings that Windows is used to, and not just the Line Feed (LF) endings that are normal for text files on Linux. The results are piped into Find with the /c switch to actually do the counting. But you may ask, "Why both commands?"

Unfortunately, we can't just use Find, since there is no mechanism to ensure the percent sign is on a line by itself. We also can't just use FindStr, as it doesn't count. Now that we have the number, let's cram it into a variable as an integer.

C:\> set /a count="findstr /r "%$" fortunes.txt ^| find /c ^"%^""

Divide by zero error.


I tried all sorts of syntax options, different quotes, and escaping (using ^) to fix this error, but no luck. However, if you wrap it in a For loop and use the loop to handle the command output, it works. Why? Who knows. Don't come to cmd.exe if you are looking for things to make sense.

C:\> cmd.exe /v:on /c "for /F %i in ('findstr /r "^%$" fortunes.txt ^| find /c "%"') do @set /a x=%i"

334


This command uses delayed variable expansion (/v:on) so we can set a variable and use it right away. We then use a For loop that "loops" (only one loop) through the command output.

With a slight modification we can get a random fortune number.

C:\> cmd.exe /v:on /c "for /F %i in ('findstr /r "^%$" fortunes.txt ^| find /c "%"') do @set /a rnd=%random% % %i"

12
C:\> cmd.exe /v:on /c "for /F %i in ('findstr /r "^%$" fortunes.txt ^| find /c "%"') do @set /a rnd=%random% % %i"
169
C:\> cmd.exe /v:on /c "for /F %i in ('findstr /r "^%$" fortunes.txt ^| find /c "%"') do @set /a rnd=%random% % %i"
252
C:\> cmd.exe /v:on /c "for /F %i in ('findstr /r "^%$" fortunes.txt ^| find /c "%"') do @set /a rnd=%random% % %i"
42


We use the variable %RANDOM% and the modulus operator (%) to select a random number between 0 and 333 by using the method developed in Episode #49.

Now we need to find our relevant line(s) and display them. Of course, we will need another For loop to do this.

C:\> cmd.exe /v:on /c "(for /F %i in ('findstr /r "^%$" fortunes.txt ^| find /c "%"') do @set /a rnd=%random% % %i > NUL) & @set /a itemnum=0 > NUL & for /F "tokens=* delims=" %j in (fortunes.txt) do @(echo %j| findstr /r "^%$" > NUL && set /a itemnum=!itemnum!+1 > NUL || if !itemnum!==!rnd! echo %j)"

Be cheerful while you are alive.
-- Phathotep, 24th Century B.C.


Before our second For loop we initialize the itemnum counter, which will be used to keep track of the current fortune number. We use base 0 for counting as that is what the modulus output gives us.

The options used with the For loop set the tokens and delims options so we get the whole line (tokens=*) including leading spaces (delims=<nothing>). Next we use Echo and FindStr to check if the current line contains only a percent sign. If FindStr finds a match the command succeeds, and with our short-circuit logical And (&&) we increment the itemnum counter. If the line is not a percent sign, then the logical Or (||) will execute our If statement.

If our itemnum counter matches the random number, then we output the current line. As the itemnum counter does not increment until the next time it sees a percent sign, it can output multiple lines of text.

To be honest, this command was a big pain. More than once I wished my `puter had been shot by that Russian bullet. At least the PowerShell version is much easier.

PowerShell

PowerShell is great with objects, so let's turn each fortune into an object.

PS C:\> $f = ((gc fortunes.txt) -join "`n") -split '^%$', 0, "multiline"


This command gives us an array of fortunes. We read in the file with Get-Content (alias gc). Get-Content returns an array of lines, which isn't quite what we want, so we recombine all the lines with the New Line character (`n) between each element. We then recut the string using the Split operator and some fancy options.

We give the split operator three parameters. The first is the regular expression to use in splitting. The second is the maximum number of substrings, where 0 means return everything. The third parameter enables the MultiLine option so the split operator will handle multiple lines.
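
To see those options at work on something small, here's a quick sanity check against a made-up string (mine, not the fortune file):

PS C:\> ("one`n%`ntwo`n%`nthree" -split '^%$', 0, "multiline").length
3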

Now we have a list of fortunes and we can count how many.

PS C:\> $f.length

335


Wait, 335? What is going on? Let's check the last fortune. Remember, we are working with base 0, so the last item is 334.

PS C:\> $f[334]

<nothing>


This happens because the file ends with a % line, so the split produces one final, empty element after the last fortune. As long as we know this we can work around it. Now to output a random fortune.

PS C:\> $f = ((gc fortunes.txt) -join "`n") -split '^%$', 0, "multiline"

PS C:\> $f[(Get-Random -Maximum ($f.length - 1))]

Questionable day.

Ask somebody something.

PS C:\> $f[(Get-Random -Maximum ($f.length - 1))]

Don't look back, the lemmings are gaining on you.

PS C:\> $f[(Get-Random -Maximum ($f.length - 1))]

You need no longer worry about the future. This time tomorrow you'll be dead.
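
Another way to sidestep that empty trailing element is to throw it away up front. This is my variation, not part of the original recipe, but it saves the index arithmetic:

PS C:\> $f = $f | Where-Object { $_.Trim() }
PS C:\> $f[(Get-Random -Maximum $f.length)]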


This may be my last week as I was just informed that "You will be traveling and coming into a fortune." YIPEE! I'm off to Tahiti! (Hopefully)

Tuesday, August 16, 2011

Episode #156: Row, Row, Row... You're Columns!

Hal receives stroking via email

I recently received an email from my old friend Frank McClain:

It is with much humility that I kneel before the masters and ask this request, which I am certain is but a simple task for such honored figures.


Well a little sucking up never hurts, Frank. Let's see what your issue is:

Tab-delimited text file containing multiple email addresses per row. The first such field is sender, and that's fine. The following fields are recipients. The first recipient can stay where it is, but the following for that row need to be moved individually into column-format below the first recipient, in new rows. If there is only one recipient in a row, nothing more needs to be done with that row.

Example:

7/27/2011    15:40:00    steve.jobes@place.com    jmarcus@someplace.com    ronsmith@someplace.com    pgonzalez@someplace.com
6/17/2011    15:19:00    ssummers@someplace.com    kevin.smart@provider.com    Pamla.Barras@store.com    pamlabs@webmail.com
5/14/2011    12:35:00    amartelli@someplace.com    apiska@business.com    jmilch@provider.net    pampwanla@webmail.com

What I need to end up with is:

7/27/2011    15:40:00    steve.jobes@place.com    jmarcus@someplace.com
7/27/2011    15:40:00    steve.jobes@place.com    ronsmith@someplace.com
7/27/2011    15:40:00    steve.jobes@place.com    pgonzalez@someplace.com
6/17/2011    15:19:00    ssummers@someplace.com    kevin.smart@provider.com
6/17/2011    15:19:00    ssummers@someplace.com    Pamla.Barras@store.com
6/17/2011    15:19:00    ssummers@someplace.com    pamlabs@webmail.com
5/14/2011    12:35:00    amartelli@someplace.com    apiska@business.com
5/14/2011    12:35:00    amartelli@someplace.com    jmilch@provider.net
5/14/2011    12:35:00    amartelli@someplace.com    pampwanla@webmail.com


No worries, Frank. I got this one.

It's pretty clear to me that two nested loops are going to be required. We'll need one loop to read each line, and then another loop to output a series of lines listing each recipient individually:

$ while read date time from recips; do
    for r in $recips; do
        echo -e "$date\t$time\t$from\t$r";
    done;
done <input-file

7/27/2011 15:40:00 steve.jobes@place.com jmarcus@someplace.com
7/27/2011 15:40:00 steve.jobes@place.com ronsmith@someplace.com
7/27/2011 15:40:00 steve.jobes@place.com pgonzalez@someplace.com
6/17/2011 15:19:00 ssummers@someplace.com kevin.smart@provider.com
6/17/2011 15:19:00 ssummers@someplace.com Pamla.Barras@store.com
6/17/2011 15:19:00 ssummers@someplace.com pamlabs@webmail.com
5/14/2011 12:35:00 amartelli@someplace.com apiska@business.com
5/14/2011 12:35:00 amartelli@someplace.com jmilch@provider.net
5/14/2011 12:35:00 amartelli@someplace.com pampwanla@webmail.com

So the outer "while read ..." loop is what we're using to read the input file-- notice the "<input-file" hiding at the end of the loop construct. Since read will automatically split up fields on whitespace for us, we can quickly pull out the date, time, and from address. We then have one more variable, recips, that gobbles up everything else on the line-- i.e., all of the recipient addresses.

But the recipient addresses are themselves whitespace delimited, so we can just whack $recips down into our for loop and iterate over each email address in the list. For each one of those recipients we output a tab-delimited line of output containing $date, $time, $from, and the current recipient, $r. We need to use "echo -e" here so that the "\t"s get expanded as tabs.
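
Just for comparison, here's the same transformation as an awk one-liner. It splits on whitespace exactly like read does, so the same assumptions about the fields apply:

$ awk '{ for (i = 4; i <= NF; i++) print $1 "\t" $2 "\t" $3 "\t" $i }' input-file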

Nothing could be easier. In fact, I bet Tim could even handle this one in CMD.EXE. But Frank was so moved by our solution that he replied:

Your meaningless servant is like unto a worm to be crushed beneath the might of your foot, nay, even but a toe. The mere fact that the Master has deemed to write an honored response to this insignificant gnat has caused tears of joy to stream in a veritable rain from my eyes, too blind to look upon the shining radiance of the Master.

Not much we can add to that.

Tim crushes worms

Because Frank asked so nicely (and because Hal threw me under the bus) I'll do some ugly cmd first.

C:\> cmd.exe /v:on /c "for /f "tokens=1-25" %a in (input.txt) do @(

echo %a %b %c %d &&
echo %e | find "@" > NUL && echo %a %b %c %d %e &&
echo %f | find "@" > NUL && echo %a %b %c %d %f &&
echo %g | find "@" > NUL && echo %a %b %c %d %g &&
...
echo %y | find "@" > NUL && echo %a %b %c %d %y)"



In this command, we start off by reading our input file. The default delimiters of tab and space will work fine for us because 1) the only space we have is between the date and time and 2) using just the tab as a delimiter is a pain. We can do it, but we have to start a new shell with tab completion disabled, and I like tab completion.

Once we read the file we output the date (%a), time (%b), sender (%c), and the first recipient (%d). Next, we output the second recipient and see if it contains an "@". If it doesn't then our short circuit Logical And (&&) will stop the rest of the line from executing. If it does then we output the second recipient (%e). We do the same for the third (%f) through 22nd (%y) recipient (Frank said 22 was the max).

It isn't a brief command, but I do think it is quite elegant in its form and function. Building such a big command with just basic building blocks is like building fire with sticks. And many times I feel that with cmd all I have is sticks.

Now for PowerShell...

The PowerShell version is pretty similar to what Hal did but with his Foreach loop replaced with a For loop and a little extra math.

PS C:\> gc input.txt | % {$s = $_.split("`t");
    for ($i=2; $i -lt $s.length; $i++) { write-host $s[0] $s[1] $s[$i] } }


7/27/2011 15:40:00 steve.jobes@place.com jmarcus@someplace.com
7/27/2011 15:40:00 steve.jobes@place.com ronsmith@someplace.com
7/27/2011 15:40:00 steve.jobes@place.com pgonzalez@someplace.com
6/17/2011 15:19:00 ssummers@someplace.com kevin.smart@provider.com
...


We use Get-Content (alias gc) to read in our file. We then use the ForEach-Object cmdlet (alias %) to operate on each line. Each line is split, using tab as the delimiter, and held in the array $s. We then use a for loop to output the 0th element (date), the 1st element (sender), and the recipient held in the Nth element (OK, so technically the Ith element). This gives us output, but of course with PowerShell the right way to do it is with objects.

PS C:\> $r = "" | Select Date, Sender, Recipient

PS C:\> gc input.txt | % {$s = $_.split("`t"); $r.Date = (Get-Date $s[0]); $r.Sender = $s[1];
for ($i=2; $i -lt $s.length; $i++) {$r.Recipient = $s[$i]; $r}}

Date Sender Recipient
---- ------ ---------
7/27/2011 3:40:00 PM steve.jobes@place.com jmarcus@someplace.com
7/27/2011 3:40:00 PM steve.jobes@place.com ronsmith@someplace.com
7/27/2011 3:40:00 PM steve.jobes@place.com pgonzalez@someplace.com
6/17/2011 3:19:00 PM ssummers@someplace.com kevin.smart@provider.com
...


The approach is very similar to our original; the notable difference is the use of our custom object $r. To create this basic object we pipe an empty string ("") into the Select-Object cmdlet (alias select) and select our new property names. This gives us our object with the properties we need. The shell of our object exists, but with no values.

Next, we use our same Get-Content cmdlet with our ForEach-Object loop. Instead of outputting the results, we set the relevant property in our object. In addition, the Date string is converted to a Date object so we could later use PowerShell's date comparisons and operators. Finally, we output the object.
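
One caveat with recycling a single $r: every row we emit is a reference to the same object, which can bite you if you capture the output in a variable instead of just displaying it. If that matters, a variant that builds a fresh object per recipient looks something like this (column order isn't guaranteed with a hashtable, but the data is right):

PS C:\> gc input.txt | % {$s = $_.split("`t");
    for ($i=2; $i -lt $s.length; $i++) {
        New-Object PSObject -Property @{Date = (Get-Date $s[0]); Sender = $s[1]; Recipient = $s[$i]} } }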

Now, back to enjoying the groveling.

Tuesday, August 9, 2011

Episode #155: Copying Somebody Else's Work

Hal finds more gold in the mailbag

Just last Episode I was saying how much I like getting to critique other people's command lines, and lo and behold Philipp-- one of our intrepid readers-- sends me this little bit of fu to pick on:

In our company we were just recently talking about finding files according to the user that owns the files and copying/backing them up with the same structure of subdirectories to another directory. We as Linux guys came up with a solution pretty soon:

find . -user myuser -exec cp -a \{\} /path/to/directory \;

I'm not going to pick on this solution too much, since it solves Philipp's problem, but I will note a couple of issues here:


  1. As find traverses the directory structure, it's going to call "cp -a" on each file and directory. That means a lot of re-copying of the same files and directories over and over again as find descends through various levels in the directory tree.

  2. It sounds like Philipp only wants to copy files owned by a particular user. But the above solution will also copy files owned by other users if they live under a directory that's owned by the target user.


Essentially Philipp's task is to find all files and directories owned by a particular user and replicate that structure in some other directory. And when I hear a task that's "find stuff that matches some set of criteria and copy it someplace else" I think of my little friend cpio:

find . -user myuser -depth | cpio -pd /path/to/directory

This will copy only the files owned by the given user with no extra copying, and the "-d" option to cpio will create directories as needed. So this seems like the most correct, straightforward approach to Philipp's conundrum.
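
If you also want to preserve the files' timestamps, GNU cpio's -m flag handles that, and you can add the usual -print0/--null trick for whitespace safety (assuming GNU versions of find and cpio):

$ find . -user myuser -depth -print0 | cpio --null -pdm /path/to/directory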

At least for Unix folks, that is. I'll note that Philipp went on to "throw down the gauntlet" at the Windows half of our little team:

But the Windows guys got screwed a bit... So now I wanted to ask you if you know a [Windows] solution and if you want to share it with me and/or the rest of the world in the blog.

How about it, Tim?

Tim is an original

Sorry to disappoint Hal, but this ain't too hard (even though it may be a bit more verbose).

PS C:\> Get-ChildItem -Recurse | Where-Object { (Get-Acl -Path $_).Owner -eq "mydomain\myuser" } |
    Copy-Item -Destination "\SomeDir" -Recurse


We use a recursive directory listing and pipe it into our filter. In the filter, the Owner property of the output from the Get-Acl cmdlet is compared against our target user. Any objects (files or directories) that match will be passed down the pipeline. From there the Copy-Item cmdlet does the heavy lifting; it accepts the input object and recursively copies it to the destination.

It should be noted that the same problems explained by Hal occur here as well. I would explain it again here, but I'm not a copy cat.

And for an additional trick, here is the same cmdlet, but shortened.

PS C:\> ls -r | ? { (Get-Acl -Path $_).Owner -eq "mydomain\myuser" } | cp -dest "\SomeDir" -r


So...how `bout that, Hal?

Tuesday, August 2, 2011

Episode #154: Line up alphabetically according to your size

Tim has been out and about

Hal and I have been busy the past weeks with SANS FIRE and then recuperating from said event. Oddly enough, that is the first time I ever met Hal. I would say something about how I hope it is the last, but I hear he reads this blog and I don't want to insult him publicly.

While we were away, one of our fantastic readers (at least I think he is fantastic) wrote in:


I've been reading the column for a while and when my boss asked me how to list all the directories in a path by size on a Linux system, I strung a bunch of stuff together quickly and thought I'd send it in to see what you thought:

$ SEARCHPATH=/home/username/; find $SEARCHPATH -type d -print0 |
    xargs -0 du -s 2> /dev/null | sort -nr | sed 's|^.*'$SEARCHPATH'|'$SEARCHPATH'|' |
    xargs du -sh 2> /dev/null


I'm sure you don't need an explanation but this finds all the directories in the given path, gets the size of each, sorts them numerically (largest first) and then removes the size from the front and prints the sizes again in a nice, human readable format.

Keep up the good work


Thank you! It is always great to hear from the readers, and we are always looking for new ideas that we can attempt in Windows (PowerShell and possibly cmd.exe) and in *nix-land. Keep sending ideas. On to the show...

The first portion of our command needs to get the directories and their size. I wish I could say this command is simple in Windows, but it isn't. To get the size of a directory we need to sum the size (File Length property) of every object underneath the directory. Here is how we get the size of one directory:

PS C:\> Get-ChildItem -Recurse C:\Users\tim | Measure-Object -property Length -Sum


Count : 195
Average :
Sum : 4126436463
Maximum :
Minimum :
Property : Length


This command simply takes a recursive directory listing and sums the Lengths of the objects. As files are the only objects with non-null Lengths, we get the combined size of all the files.

Take note, this command will take a while on directories with lots of files. When I tested it on the Windows directory it took nearly a minute. Also, the output isn't pretty. Unfortunately, displaying the size (4126436463) in human readable form is not super easy, but we'll come back to that later. First, let's display the directory name and its Size.

PS C:\> Get-ChildItem C:\Users\tim | Where-Object { $_.PSIsContainer } | Select-Object FullName,
    @{Name="Size";Expression={(Get-ChildItem -Recurse $_ | Measure-Object -property Length -Sum).Sum }}


FullName Size
-------- ----
C:\Users\tm\Desktop 330888989
C:\Users\tm\Documents 11407805
C:\Users\tm\Downloads 987225654
...


It works, but we would ideally like to keep the other properties of the directory objects, as that is the PowerShell way. To do this we use the Add-Member cmdlet, which we discussed in Episode #87. By adding a property to an existing object we can use its properties further down the pipeline. We don't need the other properties down the pipeline for this example, but humor me. Here is what the full command using Add-Member looks like:

PS C:\> Get-ChildItem C:\Users\tim | Where-Object { $_.PSIsContainer } | ForEach-Object {
    Add-Member -InputObject $_ -MemberType NoteProperty -PassThru -Name Length
    -Value (Get-ChildItem -Recurse $_ | Measure-Object -property Length -Sum).Sum }


Directory: C:\Users\tm

Mode LastWriteTime Length Name
---- ------------- ------ ----
d-r-- 7/29/2011 2:50 PM 330889063 Desktop
d-r-- 7/25/2011 10:29 PM 11407805 Documents
d-r-- 7/29/2011 10:32 AM 987225654 Downloads
...


To sort, it is as simple as piping the previous command into Sort-Object (alias sort). Here is the shortened version of the command using aliases and shortened parameter names.

PS C:\> ls ~ | ? { $_.PSIsContainer } | % {
    Add-Member -In $_ -N Length -Val (ls -r $_ | measure -p Length -Sum).Sum -MemberType NoteProperty -PassThru } |
    sort -Property Length -Desc


Directory: C:\Users\tm

Mode LastWriteTime Length Name
---- ------------- ------ ----
d-r-- 7/29/2011 10:32 AM 987225654 Downloads
d-r-- 7/29/2011 2:50 PM 330889744 Desktop
d-r-- 7/25/2011 10:29 PM 11407805 Documents
...


The original *nix version of the command had to do some gymnastics to prepend the size, sort, remove the size, then add the human readable size to the end of each line. We don't have to worry about the back flips of moving the size around because we have objects and not just text. However, PowerShell does not easily do the human readable format (e.g. 10.4KB, 830MB, 4.2GB), but we can do something similar to Episode #79.

We can use Select-Object to display the Length property in different formats:

PS C:\> <Previous Long Command> | format-table -auto Mode, LastWriteTime, Length,
    @{Name="KB"; Expression={"{0:N2}" -f ($_.Length/1KB) + "KB" }},
    @{Name="MB"; Expression={"{0:N2}" -f ($_.Length/1MB) + "MB" }},
    @{Name="GB"; Expression={"{0:N2}" -f ($_.Length/1GB) + "GB" }},
    Name


Mode LastWriteTime Length KB MB GB Name
---- ------------- ------ -- -- -- ----
d-r-- 7/29/2011 10:32:57 AM 987225654 964,087.55KB 941.49MB 0.92GB Downloads
d-r-- 7/29/2011 2:50:38 PM 330890515 323,135.27KB 315.56MB 0.31GB Desktop
d-r-- 7/25/2011 10:29:53 PM 11407805 11,140.43KB 10.88MB 0.01GB Documents
...


We could add a few nested If Statements to pick between the KB, MB, and GB, but that is a script, and that's illegal here.

Let's see if Hal is more human readable.

Edit: Marc van Orsouw wrote in with another, shorter option using the FileSystemObject COM object and a switch statement to display the size:

PS C:\> (New-Object -ComObject scripting.filesystemobject).GetFolder('c:\mowtemp').SubFolders |
    sort size | ft name ,{switch ($_.size) {{$_.size -lt 1mb} {"{0:N2}" -f ($_.Size/1KB) + "KB" };
    {$_.size -gt 1gb} {"{0:N2}" -f ($_.Size/1GB) + "GB" };default {"{0:N2}" -f ($_.Size/1MB) + "MB" }}}


Hal is about out

All I know is that the first night of SANSFIRE I had dinner with somebody who claimed to be Tim, but then I didn't see him for the rest of the week. What's the matter Tim? Did you only have enough money to hire that actor for one night?

The thing I found interesting about this week's challenge is that it clearly demonstrates the trade-off between programmer efficiency and program efficiency. There's no question that running du on the same directories twice is inefficient. But it accomplishes the mission with the minimum amount of programmer effort (unlike, say, Tim's Powershell solution-- holy moley, Tim!). This is often the right trade-off: if you were really worried about the answer coming back as quickly as possible, you probably wouldn't have tackled the problem with the bash command line in the first place.

But now I get to come along behind our illustrious reader and critique his command line. That'll make a nice change from having my humble efforts picked apart by the rest of you reading this blog (yes, I'm looking at you, Haemer!).

If you look at our reader's submission, everything before the "sort -nr" is designed to get a list of directories and their total size. But in fact our reader is just re-implementing the default behavior of du using find, xargs, and "du -s". "du $SEARCHPATH | sort -nr" will accomplish the exact same thing with much less effort.

In the second half of the pipeline, we take the directory names (now sorted by size) and strip off the sizes so we can push the directory list through "du -sh" to get human-readable sizes instead of byte counts. What I found interesting was that our reader was careful to use "find ... -print0 | xargs -0 ..." in the first part of the pipeline, but then apparently gives up on protecting against whitespace in the pathnames later in the command line.

But protecting against whitespace is probably a good idea, so let's change up the latter part of the command-line as well:

$ du testing | sort -nr | sed 's/^[0-9]*\t//' | tr \\n \\000 | xargs -0 du -sh

176M testing
83M testing/base64
46M testing/coreutils-8.7
24M testing/coreutils-8.7/po
8.1M testing/refpolicy
7.9M testing/webscarab
7.5M testing/ejabberd-2.1.2
6.2M testing/selenium
6.0M testing/refpolicy/policy
5.9M testing/refpolicy/policy/modules
...

I was able to simplify the sed expression by simply matching "some digits at the beginning of each line followed by a tab" ("^[0-9]*\t") and just throwing that stuff away by replacing it with the empty string. Then I use tr to convert the newline to a null so that we can use the now null-terminated path names as input to "xargs -0 ...".

So, yeah, I just ran du twice on every directory. But I accomplished the task with the minimum amount of effort on my part. And that's really what's important, isn't it?
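
One last aside: if your GNU coreutils are new enough that sort has the -h option, you can skip the double du entirely-- a single human-readable du pass sorts just fine:

$ du -h testing | sort -rh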