Tuesday, March 30, 2010

Episode #88: Massage Techniques

Ed's Getting in the Mood:

The GUI can be so enticing. Sitting there with a bunch of GUI windows on the screen can beguile even a shell geek into doing things the hard way without even thinking about it.

Case in point: A few days ago, I needed to send e-mail to about a hundred people. No, I'm not a spammer... I needed to thank some folks for taking my SANS class. SANS sent me a list of e-mail addresses in a format that was kinda ugly -- a space-delimited list of a hundred lines formatted as follows:
EmailAddress FirstName LastName SomeCrazyNumber
So, I simply needed to strip off the e-mail addresses from this list. I first highlighted the list and copied it. Then, without thinking, my reptile brain launched Excel and pasted the list into a new spreadsheet. I guess I was kind of expecting that the spreadsheet paste action would parse things into columns based on those spaces. But, that's dumb... why would it do that? My reptile brain then started moving the mouse up to the menu bar to figure out a way to break my one column into four, so I could just peel off the e-mail addresses of the first column.

Happily, before my mouse even clicked on the first menu, higher brain functionality kicked in. "Skodo," my brain said to itself, "You could spend 5 minutes farting around in Excel trying to do this, or you could do it in, like, 10 seconds in a shell." Suitably chastised by my own brain, I went to a cmd.exe on my Vista box and did the following:

C:\> notepad names.txt
Hit Enter (to say Yes, I want to create the file)

CTRL-V (to paste)

ALT-F4 (to close notepad)

Enter (to tell notepad to save the file)

C:\> (for /f %i in (names.txt) do @echo %i >> email.txt) & clip < email.txt & del email.txt names.txt

Then, with the list of e-mail addresses safely ensconced in my clipboard, I just pasted that list of e-mail addresses into my mail program. No fuss, no muss.

But, let's look at what's going on here in a little more detail, because we've introduced a new friend here, namely clip.exe. I start out by creating a file called names.txt using notepad. Yes, that's a GUI tool, but it's a quick way of taking your clipboard contents (which still had my list of addresses, names, and crazy numbers) and moving them into a file. Then, four simple keystrokes (which even the lizard brain knows) allow me to create the file, paste the data into it, close notepad, and save the file while notepad is closing. All pretty routine stuff, accomplished in about 2 seconds.

My remaining 8 seconds went into the FOR loop. I invoked a FOR /F loop to parse the names.txt file, with a single iterator variable (%i), using default delims (spaces and tabs). At each iteration through my loop, in my do clause, I turn off command display (@) and append (>>) the first column of the file (%i) to a file I call email.txt. That first column in names.txt contains e-mail addresses, so my email.txt file ends up containing only the e-mail addresses. I surround that whole FOR loop in parens so that the commands after it aren't swallowed into the loop's do clause; that way the loop finishes completely before the follow-up commands run.
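
For comparison, here is a rough bash sketch of the same column-peeling step: read each line, let the shell split it on spaces and tabs, and keep only the first field. The function name first_column is made up for illustration, and it assumes, like the FOR /F loop, that the e-mail address is the first whitespace-delimited field on each line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first whitespace-delimited field of each
# input line, roughly what the FOR /F loop does with its default delims.
first_column() {
    local email rest
    # read splits each line on IFS (spaces and tabs by default): the first
    # field lands in $email, the remainder of the line in $rest.
    # "|| [ -n ... ]" keeps a final line that has no trailing newline.
    while read -r email rest || [ -n "$email" ]; do
        # skip blank lines, as FOR /F does
        if [ -n "$email" ]; then
            printf '%s\n' "$email"
        fi
    done
}
```

Something like first_column < names.txt would then print just the addresses, one per line.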

After the loop is done, I then invoke the clip command, which is included in Windows 2003, Vista, Win7, and 2008 Server. It's not in XP, although you can copy it there from another Windows box, and it'll run (consult your lawyer regarding the license implications of that maneuver). The clip command can take the contents of a file and put it into the Windows clipboard (via the syntax "clip < filename"). Or, you could take the output of any command and dump it into the clipboard with "command | clip".

The clip command really does come in handy. Unfortunately, it only deals with shuffling file contents or command output TO the clipboard. It does not allow you to take clipboard content and pipe it to another program as input or dump it into a file. For that capability, there are numerous free third-party tools, or the handy reptile-brain-memory technique using Notepad pasting described above.

Finally, with my email.txt file copied over to my clipboard, I then run the del command to remove my temporary email.txt and names.txt files, cleaning up after myself. Remember, del can delete multiple files at a time by simply listing them.

Bada-bing, bada-boom. I've now got the data that I want in the format I want it, loaded into my clipboard and ready to paste.

So, next time you need to massage some data, don't just blindly reach for a spreadsheet program. Instead, consider what your command-line friends can do for you!

Tim is feeling a little stiff:

Ah, the days of Vista. If Ed had just used Windows 7, he would have the built-in capability to use PowerShell, version 2 even.

Per usual, we can do this multiple ways. I selected the option that didn't require the temporary input file. Instead of pasting the data into a text file, we can paste it right into our shell. Here is how it works:

PS C:\> @"
Hit Enter

Paste - Right click, Select Paste

>> "@ | % { $_.Split("`n") } | % { $_.Split(" ")[0] } | clip
Here is what it would look like in our command window.

PS C:\> @"
>> tim@yomammashouse.com Tim Medin 00001
>> ed@yomammashouse.com Ed Skoudis 31337
>> hal@yomammashouse.com Hal Pomeranz 80085
>> "@ | % { $_.Split("`n") } | % { $_.Split(" ")[0] } | clip
>> <hit enter a second time>
In PowerShell, the >> prompt means that PowerShell is expecting more input. In our case, the shell is waiting for us to finish our multiline string and multiline command. I typically omit this prompt in my write-ups so it is easier to read and less confusing. I don't want someone to think that >> needs to be typed. To finish our multiline command, hit Enter twice. We now have all of the email addresses on the clipboard and we can paste them into our favorite email program.

Now let's see how this command works.

We first use a multiline string that holds all the email addresses, names, and weird numbers. This multiline string is called a here-string. A here-string begins with @" on a line by itself and ends with "@ on a line by itself. The text in between can contain line breaks, single quotes, double quotes, and blank spaces, and none of it needs to be escaped. It is a pretty cool construct.

The here-string is piped into a ForEach-Object loop to break the string into multiple lines by splitting on the line break character (`n). The output is then piped into another ForEach-Object loop. In the second loop, we simultaneously split the line, using the space character as a delimiter, and output the 0th item (email address) of the newly created array. The results are piped into clip, which stores the results in the clipboard.
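
For comparison, bash's closest relative of the here-string is the here-document. Here is a minimal sketch of Tim's pipeline in bash, using his three sample lines; sample_emails is a hypothetical function name, and cut -d' ' -f1 stands in for $_.Split(" ")[0], assuming the address is everything before the first space.

```bash
#!/usr/bin/env bash
# Hypothetical function wrapping a here-document, the bash cousin of a
# PowerShell here-string. Quoting the delimiter ('EOF') stops the shell
# from expanding $variables or backticks inside the pasted text.
sample_emails() {
    # cut -d' ' -f1 keeps the first space-delimited field of each line
    cut -d' ' -f1 <<'EOF'
tim@yomammashouse.com Tim Medin 00001
ed@yomammashouse.com Ed Skoudis 31337
hal@yomammashouse.com Hal Pomeranz 80085
EOF
}
```

In practice you would paste your own lines between the delimiters and pipe the result to a clipboard tool such as xsel.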

There's the rubdown in PowerShell, let's see what Hal's got oiled up for us.

Hal is very relaxed

My goodness! Once again my Windows compatriots have worked themselves into a lather over something that is very simple to do in the Unix shell. Man, if I only got a free massage every time that happens. I'd be the most relaxed person on the planet!

Our first option would be to just send the email directly from the command-line:

$ cat email-msg.txt | mailx -s 'Thank you!' `awk '{print $1}' names.txt`

Assuming we already have our canned "Thank you" email in the file email-msg.txt, we just shove that into the standard input of the mailx command. The "-s" option allows us to specify a Subject: line, and then we just use awk to extract the list of recipients from the first column of our text file.

But perhaps you'd prefer to use a GUI mail client to compose your "Thank you" email. Just like Windows, you can copy command-line output to the standard X clipboards. There are actually several tools that do this, but I generally will use xsel:

$ awk '{print $1}' names.txt | xsel
$ awk '{print $1}' names.txt | xsel --clipboard

The first form copies the standard input to the "primary" selection, which you can then paste into your GUI program by clicking the middle mouse button. The second form copies the standard input to the clipboard, which is normally accessed by right-clicking and selecting "Paste" from the context-sensitive pop-up menu.

And while I feel almost bad about harshing on Ed and Tim's mellow, I feel I must disclose that, unlike Windows, the xsel command also lets you get the output from the clipboard into your commands as well. You can even append the output of multiple commands into the current selection:

$ echo Unix rules! | xsel
$ xsel -o
Unix rules!
$ echo Windows drools! | xsel -a
$ xsel -o
Unix rules!
Windows drools!

As you can see, the "-o" (output) option outputs the value of the current selection. The "-a" (append) option can be used to add text to the current selection.

I'll let Ed and Tim get back to their massages now. Poor guys, they obviously need all the relaxation they can get after having to work so hard.

Tuesday, March 23, 2010

Episode #87: Making a Hash of Things

Tim needs some hash:

The idea for this week's episode is brought to you from Latvia, by one of our fantastic followers, Konrads Smelkovs. He writes:

I recently had the need to compare two directory trees based on their hash sum, as I could not rely on modified/created or size attributes.

This is what I came up with in powershell, but I am sure there is more elegant method:

< Konrads had a nice bit of hash fu right here, but it was stopped by those customs dogs upon entry to the US.>

When it comes to hashing, I find the lack of hashing cmdlets to be one of the most glaring omissions from PowerShell. Hopefully they will be added in v3, but we will have to wait and see. Since hashing isn't built in, we will have to add it ourselves, and we have a few ways to do it.

1. Use functions

This is exactly what Konrads did. In fact, I took most of his commands for use below. His function was a bit different than mine, but the results are pretty much the same. I'll skip the inner workings of this function since the guts aren't the important part and we have a lot to cover.

PS C:\> function Get-MD5Hash ($file) {
$hasher = [System.Security.Cryptography.MD5]::Create()
$inputStream = New-Object System.IO.StreamReader ($file)
$hashBytes = $hasher.ComputeHash($inputStream.BaseStream)
$inputStream.Close()  # release the file handle
$builder = New-Object System.Text.StringBuilder
$hashBytes | Foreach-Object { [void] $builder.Append($_.ToString("X2")) }
$builder.ToString()  # return the hash as a hex string
}

Oh no, I think we left CommandLineLand and crossed over to Script-istan. Everyone needs to visit the Scripti people once in a while, and if we add this function to our profile we won't have to go back. At least now we can get the MD5 hash of any file:

PS C:\> Get-MD5Hash file.txt

If we wanted to run this recursively on a bunch of files we would do this.

PS C:\> ls -r | ? { -not $_.PSIsContainer } | % { Get-MD5Hash $_.FullName }

In this command we first get a recursive directory listing using the Get-ChildItem cmdlet (alias ls, dir, gci) with the Recurse parameter (-r for short). Next, we filter out all Containers (directories) using Where-Object (alias ?). Finally, we compute the hash of each item using our function inside the ForEach-Object scriptblock.

I would assume many of you are asking, "Why is there a property called PSIsContainer instead of IsDirectory or IsFolder?" The reason is, PowerShell was designed to be extensible. The Get-ChildItem cmdlet is used on most Providers, and the File System is one such Provider.

Next you ask, "What is a Provider?" Providers provide access to data and components that would not otherwise be easily accessible at the command line. The data is presented in a consistent format that resembles a file system drive. Examples include the Registry (ls hklm:\) and the certificate store (ls cert:\).

Now that your questions have been answered, let's get back to our hashing.

Our previous output was terrible since we don't know from which file the hash is derived. We lost all the information about the file. What if we could just add the hash as a property of the file? Well, we can!

We can add properties to any object by using the Add-Member cmdlet:

PS C:\> ls -r | ? { -not $_.PSIsContainer } | % { $_ | Add-Member -MemberType NoteProperty -Name Hash -Value (Get-MD5Hash $_.FullName) -PassThru } |
Select Name,Hash

Name Hash
---- ----
file1.txt C11D8024A08381ECD959DB22BC2E7784
file2.txt 71DE107BEFF4FC5C34CF07D6687C8D84
file3.txt CB45F2C9DC3334D8D6D9872B1A5C91F6
file4.txt 8B4A8D66EB3E07240EA83E96A7433984
file5.txt 0E3315D930E7829CCDE63B123DD61663

Add-Member is used to add custom properties and methods to an instance of an object. The first parameter we need is the MemberType. We are going to create a NoteProperty, a property with a static value. There are a lot of options here, and you can see a full list here. The next two parameters, Name and Value, are pretty self-explanatory. The PassThru parameter passes the new object down the pipeline. Finally, we select the Name and Hash to be displayed.

So we manually added the property to each object. What if we wanted the property to be permanent?

2. Update the File Data Type

This sounds hard, but it is actually pretty easy. All we need to do is create a little XML file to extend the File object (technically the System.IO.FileInfo data type). One thing to note: the XML file you create must be saved with the .ps1xml extension. Here is the ps1xml file based on the function above:

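<Types><Type><Name>System.IO.FileInfo</Name><Members><ScriptProperty><Name>MD5</Name>
<GetScriptBlock>$hasher = [System.Security.Cryptography.MD5]::Create()
$inputStream = New-Object System.IO.StreamReader ($this)
$hashBytes = $hasher.ComputeHash($inputStream.BaseStream)
$inputStream.Close()
$builder = New-Object System.Text.StringBuilder
$hashBytes | Foreach-Object { [void] $builder.Append($_.ToString("X2")) }
$builder.ToString()</GetScriptBlock>
</ScriptProperty></Members></Type></Types>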

To extend the Data Type it is as simple as running this command:

PS C:\> Update-TypeData hash.ps1xml

If you add this command to your profile (see episode #83) it will load automatically. Now we can get the MD5 hash of any file.

PS C:\> ls file1.txt | select name, md5

Name MD5
---- ----
file1.txt C11D8024A08381ECD959DB22BC2E7784

One cool bit is that it won't compute the hash unless you access the property. The default output doesn't display the hash so it won't slow down your system. There is a weird catch though and I can't find an explanation for it. Here is what I mean.

PS C:\> ls -r | select name,md5

Name MD5
---- ---

PS C:\> ls -r *.* | select name,md5

Name MD5
---- ---
file1.txt 32F1F5E65258B74A57A4E4C20A87C946
file2.txt EA42DB0F1DCFE6D1519AAC64171D2F37

For some reason, besides accessing the md5 property, you also have to give it a path in order to get output. A little odd, but easy to get around if you know about it.

Now let's take a look at the third option.

3. PowerShell Community Extensions

We could use any add-in, but I find the PowerShell Community Extensions to be the best general purpose add-in. PSCX adds a number of cmdlets that are missing from PowerShell. Today, the cmdlet we care about is Get-Hash, and it is way more powerful than the function we wrote in section 1. Not only does it give us the ability to get MD5 (default), it also provides us with SHA1, SHA256, SHA384, SHA512, and RIPEMD160. But wait, there's more! It can even hash a string, while my function (as written) can't. It even creates a nice little object that contains the Path and the Hash, so we don't have to do all the object creation.

PS C:\temp> ls -r | ? { -not $_.PSIsContainer } | % { Get-Hash $_ } | select path,hashstring

Path HashString
---- ----------
c:\dir1\file1.txt C11D8024A08381ECD959DB22BC2E7784
c:\dir1\file2.txt 71DE107BEFF4FC5C34CF07D6687C8D84
c:\dir1\file3.txt CB45F2C9DC3334D8D6D9872B1A5C91F6
c:\dir1\sub\file4.txt 8B4A8D66EB3E07240EA83E96A7433984

Depending on where and what you are using it for, each of these options fits a different niche. But for day to day work on a system I recommend installing pscx.

Comparing directories

For this portion I am going to use the PowerShell Community Extensions since that is my preferred method. We touched on using the Compare-Object cmdlet in episode #73 but we have another cool way to find files that don't have matching hashes.

PS C:\> ls dir1,dir2 -r | ? { -not $_.PSIsContainer} | % { Get-Hash $_.FullName } | 
group HashString | ? { $_.Count -eq 1 } | select -ExpandProperty Group

Path : c:\dir1\sub\file4.txt
HashString : 8B4A8D66EB3E07240EA83E96A7433984

Path : c:\dir2\sub\file4.txt
HashString : 0B39DC79962BC3CEA40EBF14336BFC4D

Let's break this down:

PS C:\> ls dir1,dir2 -r | ? { -not $_.PSIsContainer} | % { Get-Hash $_.FullName }

In section 1 we did something very similar, so we'll skip the detailed explanation. However, there is one cool trick here: the recursive directory listing is given two directories separated by commas. Now let's take a look at the rest of the command:

... | group HashString | ? { $_.Count -eq 1 } | select -ExpandProperty Group

The Group-Object cmdlet collects objects with matching HashStrings into groups. We then filter for groups with only one item, which leaves us with files that do not match any other file. Group-Object outputs GroupInfo objects rather than our original objects, so we use the ExpandProperty parameter of Select-Object to unwrap each Group property back into the original objects. We now have our output.
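
For comparison, here is a sketch of the same "keep only hashes that occur once" filter in bash, assuming GNU findutils and coreutils (xargs -0 -r and uniq -w are GNU options). unmatched_files is a hypothetical name. Sorting the md5sum output puts identical hashes next to each other, and uniq -u -w 32 then plays the role of Group-Object plus the Count -eq 1 filter.

```bash
#!/usr/bin/env bash
# Sketch assuming GNU findutils and coreutils. unmatched_files is a
# hypothetical name: it prints the md5sum line of every file whose hash
# appears exactly once across the directories given.
unmatched_files() {
    # find -print0 / xargs -0: file names with spaces survive intact
    # xargs -r: don't run md5sum at all if find matched nothing
    # sort: lines with identical hashes end up next to each other
    # uniq -u -w 32: compare only the 32-character hex digest and print
    #                only lines whose digest is not repeated
    find "$@" -type f -print0 |
        xargs -0 -r md5sum |
        sort |
        uniq -u -w 32
}
```

Running unmatched_files dir1 dir2 would print the hash and path of each file that has no identical twin in either directory.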

I'm getting mighty hungry after all that hash. Hey dude, your turn man.

Hal has the munchies too:

Synchronicity is an interesting thing. Shortly before Konrads sent us his message, I had to solve exactly this problem during a forensic investigation. Law enforcement had obtained several copies of a suspect's email-- in maildir format, and captured months apart-- and wanted us to extract all of the unique messages from the directories.

Happily, this is a lot easier to do in Unix than it is in Windows. For one thing, a typical Unix environment gives you a lot of options for generating file hashes. I'll go with the md5sum command this time, because it produces output that's easy to deal with for this particular task:

$ md5sum dir1/cur/file1
b026324c6904b2a9cb4b88d6d61c81d1 dir1/cur/file1

Generating checksums across an entire directory is just a matter of applying a bit of find and xargs action:

$ find dir1 -type f | xargs md5sum
6d7fce9fee471194aa8b5b6e47267f03 dir1/cur/file3
b026324c6904b2a9cb4b88d6d61c81d1 dir1/cur/file1
1dcca23355272056f04fe8bf20edfce0 dir1/cur/file5

To find the common files between multiple directories, I simply put multiple directory names into my find command and then sorted the md5sum output so that the duplicate files were grouped together:

$ find dir1 dir2 -type f | xargs md5sum | sort
166d77ac1b46a1ec38aa35ab7e628ab5 dir2/new/file11
1dcca23355272056f04fe8bf20edfce0 dir1/cur/file5
1dcca23355272056f04fe8bf20edfce0 dir1/new/file5
1dcca23355272056f04fe8bf20edfce0 dir2/cur/file5

Adding a quick while loop allowed me to pick out the duplicated checksum values and output just a list of the unique file names:

$ find dir1 dir2 -type f | xargs md5sum | sort | 
while read hash file; do [ "X$hash" != "X$oldhash" ] && echo $file; oldhash=$hash; done


In the while loop we're using read to assign the hash and the file name to variables. If the current hash is different from the previous hash, then output the file name. Then assign the current hash value to be the "previous" hash and read in the next line.

I'd say we're playing a dangerous game of cat and mouse with the Scriptistanian border guards here, but not actually creating an incursion on Scripti soil. I can enter the above code pretty easily on the command-line-- indeed, that's how I generated the example output above. It's certainly much more reasonable than Tim's blatant violation of our "no scripting" rule.

If you wanted to clean up the output a bit, you could add one last sort command at the end:

$ find dir1 dir2 -type f | xargs md5sum | sort | 
while read hash file; do [ "X$hash" != "X$oldhash" ] && echo $file; oldhash=$hash; done | sort


And that's my final answer, Regis.

Or at least it was until I got this little tidbit from loyal reader and official "friend of the blog", Jeff Haemer:

$ find dir1 dir2 -type f | xargs md5sum | sort -u -k 1,1 | awk '{$1=""; print}'

While I had known about the "-u" ("unique") option in sort to eliminate duplicate lines, I had no idea you could combine it with "-k" to force the uniq-ifying to happen on specific columns. That's some sexy fu, Jeff!

You might be asking yourself why Jeff didn't just do "awk '{print $2}'" there. Remember that would only work as long as the file name didn't contain any spaces. Jeff's command, while more complicated, is less prone to error.

Note that you could also use sed instead of awk at the end:

$ find dir1 dir2 -type f | xargs md5sum | sort -u -k 1,1 | sed -r 's/^[^ ]+[ ]+//'

Either way, we're basically getting rid of the while loop and replacing it with an implicit while loop in the form of an awk or sed command. But it's cool anyway. Thanks, Jeff!
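
One caveat on the pipelines above: plain find | xargs splits file names on whitespace before md5sum ever sees them, so names with spaces break earlier in the pipeline than the awk step. Here is a sketch of a variant, assuming GNU findutils/coreutils and file names without backslashes or newlines (md5sum escapes those, which would shift the columns). unique_by_hash is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch assuming GNU findutils/coreutils. unique_by_hash is a hypothetical
# name: it prints one file name per distinct MD5 hash, like Jeff's
# "sort -u -k 1,1" pipeline, but keeps file names with spaces intact.
unique_by_hash() {
    # -print0/-0: names with spaces reach md5sum unsplit
    # -r: skip md5sum when there are no files
    # sort -u -k 1,1: keep one line per hash
    # cut -c 35-: md5sum prints 32 hex digits plus two separator
    #             characters, so the file name starts at column 35
    find "$@" -type f -print0 |
        xargs -0 -r md5sum |
        sort -u -k 1,1 |
        cut -c 35-
}
```

Because cut keeps everything from column 35 on, any spaces in the file name come through unchanged, which avoids the leading space that awk '{$1=""; print}' leaves behind.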

Tuesday, March 16, 2010

Episode #86: Get a Job

Tim clocks in:

With the US Government attempting to stimulate the economy, and the productivity of this blog down by 66%, I thought I would do my part by creating jobs. Ed and Hal were off causing trouble last week at SANS 2010, so I'm by myself for this episode. Hal and Ed have already done a similar episode a while ago, so I thought I would do the PowerShell version.

To create a job in PowerShell we use the cmdlet Start-Job.

PS C:\> Start-Job -ScriptBlock { Get-EventLog -Log System }
Id Name State HasMoreData Location Command
-- ---- ----- ----------- -------- -------
1 Job1 Running True localhost Get-EventLog...

The script block is where we specify the command to be run. Pretty easy, right?

One of the things I haven't explicitly mentioned before is positional parameters. These parameters do not require that the parameter name be used and are defined by their position in the command. As an example, this command does the same thing as the one above, but without the parameter name being used.

PS C:\> Start-Job { Get-EventLog System }

We've touched on the ScriptBlock parameter, now let's check out some of the others we have available to us with this command:

PS C:\> Start-Job -FilePath C:\TestScript1.ps1 `
-Name Mine `
-ArgumentList "jdoe" `
-InitializationScript { Add-PSSnapin Quest.ActiveRoles.ADManagement } `
-Credential mydomain\tim

The Name parameter is used to specify the name of the job so we can reference it. The FilePath parameter is used to specify a script to be run. The ArgumentList parameter specifies the arguments (parameter values) for the script that is specified by the FilePath parameter. InitializationScript specifies commands that run before the job starts and is useful for loading Snap-ins or modules that are required by our script or command. Finally, the Credential parameter is used to run the command as a different user. In this case the user will be prompted for the password. There are other ways to pass credentials or to authenticate, but we'll leave that for an episode of its own. Back to jobs...

Some commands have a parameter (AsJob) that allows the command to be run as a job. This is handy for commands that take a long time. One such cmdlet is Get-WmiObject.

PS C:\> Get-WmiObject -query "Select * from CIM_DataFile Where Extension = 'pst'" `
-ComputerName (Get-Content C:\computers.txt) `
-asJob -ThrottleLimit 10

This command will query each computer listed in the computers.txt file for pst files. The ThrottleLimit parameter specifies that we should run no more than 10 queries at the same time. Obviously, this command will take a long time to run, so we run it as a job.

Now that we have created so many jobs, what do we do with them? We can get a list of the jobs by using the aptly named Get-Job cmdlet.

PS C:\> Get-Job

Id Name State HasMoreData Location Command
-- ---- ----- ----------- -------- -------
1 Job1 Completed True localhost Get-EventLog -Log Sys...
3 Mine Completed True localhost Get-Process
5 Job5 Running True localhost Get-WmiObject -query ...

Notice that the jobs are not numbered sequentially. That is because each job has at least one child job, and the child jobs take up IDs too. The parent job does not perform any of the work; its child job does. It is a little odd, but it really doesn't have much impact.

The fifth job will take a while. Say we want to wait for it to finish.

PS C:\> Wait-Job 5; Write-Host "`aAll Done"

This command will wait until Job #5 has completed, beep, and then echo "All Done" to the console. The "back-tick a" makes the beep. All of these commands can reference a specific job by number (5) or by name (Job5).

Getting tired of waiting for that job to finish?

PS C:\> Stop-Job 5

Really want that job done? Delete it.

PS C:\> Remove-Job 5

We have stopped the fifth job and deleted it. Now, let's find out what happened with the first job.

PS C:\> Get-Job 1 | fl
HasMoreData : True
StatusMessage :
Location : localhost
Command : Get-EventLog -Log System
JobStateInfo : Completed
Finished : System.Threading.ManualResetEvent
InstanceId : 69e9bd34-87c8-4492-a6af-af6f2fa4a77f
Id : 1
Name : Job1
ChildJobs : {Job2}
Output : {}
Error : {}
Progress : {}
Verbose : {}
Debug : {}
Warning : {}
State : Completed

As you can see the job has completed. If our command had bombed then the State would have been "Failed". As you can see above, the HasMoreData property is true so we know there is output that we can look at. How do we do that?

PS C:\> Receive-Job 1
[Get-EventLog output follows]

The Receive-Job cmdlet retrieves the results from the job. One thing to note: after you use Receive-Job, the job's results are discarded unless the Keep parameter is used. The job itself stays in the Get-Job list (with HasMoreData now False) until you remove it with Remove-Job. Most of the time you only deal with the results once, so discarding them is rarely a problem.

The results of Receive-Job are essentially the same as running the original command outside of a job (the objects come back deserialized, so they carry the same properties but not their methods). To better illustrate this, these two commands would have the same output:

PS C:\> Receive-Job 1 | ? { $_.EntryType -eq "Error" }
PS C:\> Get-EventLog -Log System | ? { $_.EntryType -eq "Error" }
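
Hal and Ed covered the Unix side in an earlier episode, but for readers comparing notes, here is a loose bash analogue of Start-Job, Wait-Job and Receive-Job. It assumes a background process with its output redirected to a temp file is a fair stand-in for a PowerShell job; run_in_background is a hypothetical helper, not a built-in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a command in the background (a bit like
# Start-Job), capturing its stdout and stderr in the given file.
run_in_background() {
    local outfile=$1
    shift
    "$@" > "$outfile" 2>&1 &
}

out=$(mktemp)
run_in_background "$out" sh -c 'sleep 1; echo "job finished"'
pid=$!                  # PID of the last background process, like a job Id
wait "$pid"             # like Wait-Job: block until that process exits
status=$?               # the background command's exit status
result=$(cat "$out")    # like Receive-Job: collect the captured output
rm -f "$out"            # clean up the output file, a bit like Remove-Job
echo "status=$status output=$result"
```

Unlike Receive-Job without -Keep, reading the output file does not discard anything; the results stay around until you delete the file yourself.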

Hopefully I will have stimulated the economy (and Hal and Ed) enough to get some help next week.

Tuesday, March 9, 2010

Episode #85: Coincidence & Randomness

Ed Uses His Rod Serling Voice:

You unlock this door with the key of imagination. Beyond it is another dimension - a dimension of sound, a dimension of sight, a dimension of mind. You're moving into a land of both shadow and substance, of things and ideas. You've just crossed over into... the Command Line Zone.

It's funny how these little coincidences happen, almost as though they are stitched into the very fabric of the universe. Two completely unrelated people working on very different projects happen to identify a command line need at about exactly the same time. Each, unknowing of the other, sends in a question to the unassuming band of shell freaks at the CLKF blog.

The scene opens with an e-mail from Éireann, a kind reader who wanted to run a command at some random time within the next 24 hours. The command in question should send an e-mail blast to a group of recipients around the globe.

Less than 48 hours earlier, a friend of the blog had submitted an eerily similar question. In designing a Capture the Flag game, he needed to run a script every hour, but at a random time within that hour.

Was this a mere coincidence? Was it a sinister plot for world domination? Or, has someone's imagination and nostalgia for 1960's TV shows just gotten out of hand?

Our story gets even weirder. The CtF game master was surprised because his command to generate random timing fu was simply not behaving randomly enough. He had tried:

C:\> for /L %i in (1,0,2) do @cmd.exe /v:on /c for /F %f in ('set /a !random!%3600') do @echo %f & ping -n 3 127.0.0.1 >nul

As a first step in creating the random delay timer, the CtF designer was trying to spit out a stream of pseudo-random numbers by invoking a FOR loop to run continuously, launching a cmd.exe to activate delayed environment variable expansion, kicking off a FOR /F loop that used a little shell math to create a random number between 0 and 3599 by applying modulo arithmetic (!random!%3600), and then introducing a 2-second delay by pinging localhost thrice. But those numbers on the output look decidedly unrandom.

When I received his e-mail, I had a pretty good suspicion of what the culprit was. My friend had been stomping on his own entropy by launching a new cmd.exe (to turn on delayed variable expansion) inside his FOR /L loop for continuous execution. And, it gets even worse. To perform the "set /a" command inside of the single quotes ('), a FOR /F loop will launch another cmd.exe. So, each iteration of the FOR /L loop launches a cmd.exe, which uses a FOR /F loop to launch yet another cmd.exe to process the 'set /a' command. It's a double entropy stomper. But, even launching one cmd.exe can cause problems for your randomness. Consider this simplified example:

C:\> for /L %i in (1,0,2) do @cmd.exe /v:on /c set /a !random!%10

Those digits between zero and nine change only every second or so. I immediately set out to create a fix. We've got to swap the order of our invocation of delayed variable expansion (cmd.exe /v:on /c) and our FOR /L loop, so that a single shell runs the loop instead of the loop launching a new shell on every iteration. Otherwise, each freshly launched shell dips back into the same old weak entropy pool, which doesn't change fast enough on a Windows machine to give us satisfactory results. Let's check out the swap:

C:\> cmd.exe /v:on /c for /L %i in (1,0,2) do @set /a !random!%10

This result is much nicer, and significantly faster as we don't have to continuously launch a shell for each random number.

Now, let's apply it to the task at hand... running a script once per hour, at a random interval sometime in that hour. First off, we'll create a little dummy script, which will simply print out the date and time it was executed. Simply append to this script any other command(s) you'd want to run:

C:\> echo @echo ^%date^% ^%time^%> script.bat

Instead of running the script at a random time within each hour, I'm going to speed things up by running it at a random time each minute. The following command achieves our goal:

C:\> cmd.exe /v:on /c "for /L %i in (1,0,2) do @(set /a delay=!random!%60+1+1>nul & set /a finish=60+1+1-!delay!>nul & echo script to run after !delay! pings and then pause for !finish! pings & ping -n !delay! 127.0.0.1>nul & script.bat & ping -n !finish! 127.0.0.1>nul)"
script to run after 21 pings and then pause for 41 pings
Fri 02/26/2010 13:42:21.89
script to run after 20 pings and then pause for 42 pings
Fri 02/26/2010 13:43:21.04
script to run after 28 pings and then pause for 34 pings
Fri 02/26/2010 13:44:29.20
script to run after 51 pings and then pause for 11 pings
Fri 02/26/2010 13:45:52.38

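If you want to run the script randomly timed every hour, replace the two occurrences of 60 above with 3600 (60 seconds times 60 minutes). If you want to run at a random time once every 24-hour interval, replace 60 with 86400 (60 sec times 60 min times 24 hours). Keep in mind, though, that !random! only ranges from 0 to 32767, so with 86400 the script will always fire within roughly the first nine hours of each 24-hour cycle.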

So, what is this monstrosity of applied technology doing? First, we invoke cmd.exe to perform delayed variable expansion (cmd.exe /v:on /c) so we can let our variables change value as the command runs. Then, we start a FOR /L loop to run forever, counting between 1 and 2 in steps of zero. At each iteration of the loop, we turn off command echoing (@). That's routine. Here's where things get more interesting.

We now use the set /a command to do some math, having it set the variable called "delay" to a random number modulo 60 (!random!%60). That'll give us a nice number between 0 and 59. But, why do I add 1 to it twice? Well, I'm going to later introduce delays using pings. To introduce an N second delay, I have to ping myself N+1 times. And, I'll introduce two delays: one before the script runs, and one after the script runs. See, we don't want to just run the command multiple times with a random delay between each run. If we did that, we might have the command run 18 times in one minute, just because our randomness returned a bunch of small numbers. Instead, we want it to run 18 times in 18 minutes, but at a random time within each minute. Therefore, we'll need to have a delay up front, followed by the script execution, followed by a delay for the rest of that minute. Each of those two delays will be implemented with pings, and each set of pings completes its first ping almost instantly. I have to add one twice here to account for the two sets of pings gobbling up their first ping in almost no time.

After calculating my delay, I calculate a finishing number of pings by taking 60, adding 2, and subtracting !delay!. With all my math done, I simply display the number of pings before the script will run and after it runs. Finally, I run the first batch of pings, then the script, then the remaining pings. After all that, we loop.
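As a sanity check on the arithmetic, here is a small bash sketch (not Ed's cmd.exe code) that reproduces the two ping counts for a given raw random value r between 0 and 59, and converts them back to seconds under the assumption that every ping after the first takes about one second. The function names ping_counts and seconds_waited are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring Ed's set /a math for a 60-second window.
# r is the raw random value (what !random!%60 would produce), 0..59.
ping_counts() {
  local r=$1
  local delay=$(( r + 1 + 1 ))        # pings before the script runs
  local finish=$(( 60 + 2 - delay ))  # pings after the script runs
  echo "$delay $finish"
}

# A batch of N pings only waits about N-1 seconds, because the first
# ping in each batch returns almost immediately.
seconds_waited() {
  local before=$1 after=$2
  echo $(( (before - 1) + (after - 1) ))
}

read -r before after <<< "$(ping_counts 18)"
echo "script to run after $before pings and then pause for $after pings"
echo "total wait: $(seconds_waited "$before" "$after") seconds"
```

With r = 18 it reports 20 and 42 pings, matching the first sample line above, and the two pauses always add up to 60 seconds no matter what r is.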

You can put pretty much anything you want in script.bat. Unfortunately, sending e-mail is something that cmd.exe cannot do using only built-in tools. You can run "start mailto:EmailAddress" at the command line, which invokes Outlook Express to compose an e-mail, but a user would still have to hit the Send button. There are other tools for sending e-mail at the command line that rely on third-party commands, described here.

Our random timing script invoker above is ugly and complex, but very effective. Such are the lessons for cmd.exe in... the Command Line Zone.

Hal has all the time in the world

I can do a loop that's essentially the bash version of Ed's idea:

$ while :; do delay=$(($RANDOM % 60)); sleep $delay; date; sleep $((60 - $delay)); done
Tue Mar 2 11:07:55 PST 2010
Tue Mar 2 11:08:02 PST 2010
Tue Mar 2 11:09:43 PST 2010

Here I'm setting $delay to a random value between 0 and 59 then sleeping for that amount of time. I'm calling the date command in the middle of the loop so that you can more easily see the random time intervals, but you could substitute any commands here that you want. Finally, we sleep for the remainder of the time interval and then start the loop all over again.

This loop works fine for one-minute intervals and even one-hour intervals, but $RANDOM only ranges from 0 to 32767, so I'd have to do two calculations to cover an entire day-- pick a random hour between 0 and 23, for example, then pick a random time within that hour. Alternatively, we could just use the trick from Episode 58 to generate a larger random number:

while :; do
delay=$((`head /dev/urandom | tr -dc 0-9 | sed s/^0*// | cut -c1-8` % 86400))
sleep $delay
date   # or whatever commands you want to run
sleep $((86400 - $delay))
done

I've modified the solution from Episode 58 slightly, including a sed command to strip off any leading zeroes from the result. Otherwise the random value we calculate may be interpreted as an octal number, which causes problems if there are any 8's or 9's elsewhere in the number. Let me demonstrate what I mean with a quick example using a hard-coded value that has a leading zero:

$ delay=$((09999999 % 86400))
bash: 09999999: value too great for base (error token is "09999999")
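Here's a quick bash sketch comparing two ways around the octal problem: Hal's sed approach, and bash's 10# prefix, which forces a number to be read as base 10. The 10# trick is a bash-specific alternative swapped in for comparison, not what Episode 58 used, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hal's approach: strip leading zeroes with sed before doing arithmetic.
# Referencing s by name means an empty result (all zeroes) evaluates to 0
# instead of causing a syntax error.
mod_day_sed() {
  local s
  s=$(printf '%s\n' "$1" | sed 's/^0*//')
  echo $(( s % 86400 ))
}

# Alternative (bash only): 10# tells bash the number is decimal, so
# leading zeroes no longer trigger octal interpretation.
mod_day_base10() {
  echo $(( 10#$1 % 86400 ))
}

mod_day_sed 09999999
mod_day_base10 09999999
```

Both functions print 63999 for 09999999, since 9999999 = 115 * 86400 + 63999.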

Actually, aside from the type of random loops we're doing here, this kind of random delay is also useful for scheduled tasks like cron jobs. For example, in larger enterprises you might have hundreds or thousands of machines that all need to do the same task at a regular interval. Often this task involves accessing some central server-- grabbing a config file or downloading virus updates, for example. If a thousand machines all hit the server at exactly the same moment, you've got a big problem. So staggering the start times of these jobs across your enterprise by introducing a random delay is helpful. You could create a little shell script that just sleeps for a random time and then introduce it at the front of all your cron jobs like so:

0 * * * * /usr/local/bin/randsleeper; /path/to/regular/cronjob

The cron job will fire every hour, the "randsleeper" script will sleep for part of that time, and then your regular cron job will execute. This is a well-known old sysadmin trick.
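The randsleeper script itself isn't shown, so here is one possible sketch. It assumes bash and /dev/urandom, reads four random bytes with od to get a number much larger than $RANDOM can produce, and defaults to a maximum of 3000 seconds (an arbitrary choice, kept below the one-hour cron interval so runs don't overlap). The helper name random_below is hypothetical.

```shell
#!/usr/bin/env bash
# randsleeper (hypothetical sketch): sleep a random number of seconds
# before a cron job runs, to stagger load on a central server.

# Print a random integer in [0, max). od reads 4 bytes from /dev/urandom
# as one unsigned 32-bit number; tr strips od's leading padding spaces.
random_below() {
  local max=$1 n
  n=$(od -An -N4 -tu4 /dev/urandom | tr -d ' ')
  echo $(( n % max ))
}

# Sleep for a random time below max seconds (default 3000, an assumption).
randsleeper() {
  local max=${1:-3000}
  sleep "$(random_below "$max")"
}

echo "randsleeper would sleep $(random_below 3000) seconds this time"
```

To install it as /usr/local/bin/randsleeper, you would end the file with a call like randsleeper 3000 so the cron line above actually sleeps.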

Well, that's my weekly effort to serve man. Let's see what Tim's got cooking.

Tim[e] is on my side

My loop is pretty much a clone of Hal's, and the explanation is very similar:

PS C:\> while (1) { $delay = (New-Object Random).next(1,60); Start-Sleep $delay;
Get-Date; Start-Sleep (60 - $delay) }

Saturday, February 27, 2010 9:55:45 PM
Saturday, February 27, 2010 9:56:47 PM
Saturday, February 27, 2010 9:58:05 PM

The variable $delay is set to a random number between 1 and 59, and unlike Ed, we have good entropy. To get a random number we need to use the .Net System.Random class. However, there is a bit of goofiness: the lower bound is inclusive, while the upper bound is exclusive. So if we wanted to get a random number between 1 and 6, like on a die, we would use a lower bound of 1 and an upper bound of 7. Why did they do it that way? I don't know, and I've been asking that since the beginning of the Microsoft shells.
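For comparison with Hal's side of the house, here is a bash analogue of the same half-open convention. It is not the .NET API, just a sketch built on $RANDOM; next_int and roll_die are hypothetical names, and the small modulo bias from $RANDOM is ignored.

```shell
#!/usr/bin/env bash
# Bash analogue of System.Random.Next(min, max): min is inclusive,
# max is exclusive, so the result lies in [min, max).
next_int() {
  local min=$1 max=$2
  echo $(( min + RANDOM % (max - min) ))
}

# Same trick as Next(1,7): lower bound 1, upper bound 7, result 1..6.
roll_die() {
  next_int 1 7
}

echo "rolled a $(roll_die)"
```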

Once we have the delay, we sleep for that amount of time. After waking up from our nap, the date and time are displayed. Of course any command (or commands) can be used. Finally, we sleep for the balance of the minute. The infinite While loop ensures that the process starts over again.

To change our loop to execute a command every hour, all we need to do is change both occurrences of 60 to 3600. If we wanted it to execute daily, we would change them to 86400, and we don't have Hal's problem with big numbers. We can use really big numbers, up to 2,147,483,647. If the command were to run after 2.1 billion seconds, it would be the year 2078, long after all the computers we are using are dead.
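The 2078 figure is easy to check. This bash sketch divides the largest 32-bit signed value by the number of seconds in a 365-day year (ignoring leap days, a simplifying assumption) and adds the result to 2010.

```shell
#!/usr/bin/env bash
max_int32=2147483647
seconds_per_year=$(( 365 * 24 * 3600 ))   # 31,536,000; leap days ignored
years=$(( max_int32 / seconds_per_year )) # integer division truncates
echo "$years years"
echo "around the year $(( 2010 + years ))"
```

That works out to 68 whole years, landing in 2078.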

Finally, Ed mentioned sending an email at the random interval. Sending email from the Windows shells is a pain. It isn't possible with cmd, but we can do it with PowerShell by using the .Net framework. The long version of the commands looks like this.

PS C:\> $emailFrom = "tim@domain.com"
PS C:\> $emailTo = "ed@domain.com"
PS C:\> $subject = "Mail"
PS C:\> $body = "How's it going?"
PS C:\> $smtpServer = "smtp.domain.com"
PS C:\> $smtp = new-object Net.Mail.SmtpClient($smtpServer)
PS C:\> $smtp.Send($emailFrom, $emailTo, $subject, $body)

We can condense it to one line for use in our fu.

PS C:\> (New-object Net.Mail.SmtpClient("smtp.domain.com")).Send("tim@domain.com",
"ed@domain.com", "Mail", "How's it going?")

If we wanted to send an email once every hour at a random time, here is how we would do it.

PS C:\> while (1) { $delay = (New-Object Random).next(1,3600); Start-Sleep $delay;
(New-object Net.Mail.SmtpClient("smtp.domain.com")).Send("tim@domain.com",
"ed@domain.com", "Mail", "How's it going?"); Start-Sleep (3600 - $delay) }

That wraps it up for now, see you next week. Start-Sleep 604800

Tuesday, March 2, 2010

Episode #84: Fixing the Filenames

Hal Helps Out

A friend of mine contacted me the other day with an interesting problem. She was trying to recover some files from the backup of an old BBS. In particular, she was trying to get at the attachments for various postings.

The attachment files were in a big directory, but the file names unhelpfully used an internal attachment ID number from the BBS. So we had file names like "attachment.43567". Now my friend also had a text file she extracted from the BBS that mapped attachment IDs to the real file names:

43567 sekrit plans.doc
44211 pizza-costs.xls

So the task was to take the file of "attachment ID to file name mappings" and use that to rename the files in the attachments directory to their correct file names.

I thought about it for a minute, and realized the solution was actually pretty straightforward:

$ while read id file; do mv attachment.$id "$file"; done <id-to-filename.txt

The trickiest part of the exercise was dealing with the file names that had spaces in them, like "sekrit plans.doc". Luckily the format of the input file was "ID filename", which meant that I could treat everything after the first whitespace as the file name. And this is exactly what the builtin "read" command will do for you: in this case it puts the first whitespace-delimited token into the $id variable and then jams whatever is left over into the last variable, $file. Once I got the right information into $file, it was simply a matter of making sure to quote this variable appropriately in the "mv" command inside the loop.
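To see the splitting behavior on its own, here is a small bash sketch that feeds the two sample mapping lines through read and prints the fields separated by a pipe character. The function name show_split is hypothetical, and the -r flag (which stops read from treating backslashes as escapes) is an addition to Hal's one-liner.

```shell
#!/usr/bin/env bash
# The first whitespace-delimited field lands in id; everything left on
# the line, embedded spaces included, lands in the last variable, file.
show_split() {
  local id file
  while read -r id file; do
    printf '%s|%s\n' "$id" "$file"
  done
}

printf '%s\n' '43567 sekrit plans.doc' '44211 pizza-costs.xls' | show_split
```

The output keeps "sekrit plans.doc" intact as a single value: 43567|sekrit plans.doc, then 44211|pizza-costs.xls.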

So there you go-- a quick one-liner for me, but a real time-saver for my friend. And possibly a real time-sink for Tim and Ed as they try and figure out how to do this in their shells. Let's see, shall we?

Ed Frustrates Hal
Sorry, Hal, but this one just isn't crushing for me. I know that disappoints you, but sometimes (on fairly rare occasions) we don't have to work too hard to coax little cmd.exe to do what we want. It does take two little tricks, though nothing too freakish.

Here's the fu:
C:\> for /f "tokens=1,*" %i in (id_to_filename.txt) do @copy attachment.%i "%j"
I'm using a FOR /F loop to read the contents of id_to_filename.txt, one line at a time. Default delimiters of FOR /F parsing are spaces and tabs, which will work just fine for us here, so there's no need to mess with custom delims. I've specified custom parsing of "tokens=1,*", which will make it assign the first column of the file (the integer in Hal's example) to my first iterator variable (which is %i). Then, the ,* stuff means to assign all of the rest of the line to my second iterator variable, which will be auto-allocated as %j. The ,* stuff is the first trick, which really comes in handy.

Then, in the body of my loop, I turn off display of commands (@) and invoke the copy command to take the contents of attachment.%i and place it into "%j". The second trick, the quotes around %j, is important in allowing us to handle any spaces in the file name. Note that I'm using copy instead of move here, because I don't wanna play Ed-Zilla stomping over the city just in case something goes awry (who's to say that our id_to_filename.txt file will always look like we expect it to?). I guess you could call it the Hipposhellic oath: First do no harm. After we verify that our copy worked like we wanted with a quick dir command, we can always run "del attachment.*".
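The same "first do no harm" idea carries over to bash. Here is a sketch of a copy-first variant of Hal's loop that skips IDs with no matching attachment and refuses to overwrite an existing file; copy_attachments is a hypothetical name, and the existence checks are additions, not part of either original one-liner.

```shell
#!/usr/bin/env bash
# Copy each attachment.ID to its real name instead of moving it.
# Missing attachments and existing targets are reported on stderr and
# skipped, so nothing is overwritten or lost.
copy_attachments() {
  local map=$1 id file
  while read -r id file; do
    if [ ! -e "attachment.$id" ]; then
      echo "missing: attachment.$id" >&2
      continue
    fi
    if [ -e "$file" ]; then
      echo "exists, skipping: $file" >&2
      continue
    fi
    cp -- "attachment.$id" "$file"
  done < "$map"
}
```

Run it as copy_attachments id-to-filename.txt, check the results, and only then remove the attachment.* files.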

Whatcha got, Tim?

Tim frustrates most people
Sorry, Hal, this isn't too bad in PowerShell either. There are a few ways we can accomplish this task, but I elected to pick the shortest version, which also happens to be the one that brings up something we haven't covered before. Here are the long and short versions of the fu. (The short version is identical but uses built-in aliases.)

PS C:\> Get-Content id-to-filename.txt | ForEach-Object { $id,$file =
$_.Split(" ",2); Rename-Item -Path attachment.$id -NewName $file }

PS C:\> gc id-to-filename.txt | % { $id,$file = $_.Split(" ",2); ren
attachment.$id $file }

The Get-Content cmdlet is used to read the contents of the file, and it is piped into Foreach-Object. Inside the Foreach-Object script block is where the line is split. The first parameter used in the Split method defines the delimiter and the second defines how many items it should be split into.

The only problem is that the Split method's output is multi-line: it returns an array, and each element is displayed on its own line. Here is an illustration:

PS C:\> gc id-to-filename.txt -TotalCount 1 | % { $_.Split(" ",2); }
43567
sekrit plans.doc

We need both portions of the split to do the rename, so here is where we bring up a new little trick: we can assign the output of Split to several variables at once. The first variable ($id) receives the first line, and the second variable ($file) receives the remainder. Once we have the ID and the file name, we can easily rename the files.
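Bash can do the same two-way split with parameter expansion instead of read. This sketch (split_line is a hypothetical name) uses ${line%% *} to keep everything before the first space and ${line#* } to keep everything after it, which behaves like Split(" ",2): a second space right after the ID would stay at the front of the file name.

```shell
#!/usr/bin/env bash
# ${line%% *} deletes the longest suffix matching " *", leaving the ID.
# ${line#* }  deletes the shortest prefix matching "* ", leaving the rest.
split_line() {
  local line=$1
  local id=${line%% *}
  local file=${line#* }
  printf '%s\n%s\n' "$id" "$file"
}

split_line '43567 sekrit plans.doc'
```

Like the PowerShell illustration, this prints 43567 on one line and sekrit plans.doc on the next.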

If we wanted to be a little safer, we could use Copy-Item (alias cp or cpi) instead of Rename-Item (alias ren or rni). Once we've confirmed the copy was successful, we can delete all the attachment files by using "Remove-Item attachment.*" (alias del, erase, ri, or rm).