Tuesday, October 20, 2009

Episode #65: Feeling Loopy

Ed is back in the saddle again:

Well, I'm back from my adventures on the other side of Planet Earth. Many thanks to Tim Medin for holding down the Windows fort while I was away. He really did an awesome job sparring with Hal! In fact, Tim was so good that we're going to have him work as a regular contributor here, adding his thoughts from a PowerShell perspective to each episode. Before now, he'd throw in his insights on occasion, but now he's a regular -- our very own Command Line Kung Fu blog FNG, if you will. Oh, and Tim... don't forget to empty the wastebaskets and scrub the bathroom floor before you leave tonight. No, you don't have to wear the maid's outfit. Hal just thought you'd like it.

Anyway, where was I? Oh yeah, writing a new episode for this week.

Faithful readers (yes, both of you) know that we often use various kinds of loops in the commands we construct here. Individual commands are certainly powerful, but to really mess up your computer, it's helpful to have _iteration_: repeating a process again and again, with some subtle variations. If you look at our episodes, I think about 80% of them actually use some sort of loop. And that got me thinking. I have an intuitive feel for what kinds of loops are available in cmd.exe and when to use each kind. But, I'd like to learn more about the looping options within bash and PowerShell, and what specific uses are best for each kind of loop. So, I figured the easiest way for me to learn about bash and PowerShell looping was to throw down some cmd.exe options, and invite my Kung Fu partners to respond in kind. I'll show you mine... if you show me yours. So, here goes.

In cmd.exe, we really have just one command that implements loops: FOR. Sadly, we don't have a WHILE. I'm not going to talk about GOTO, which we do have, but it is for scripts and not for individual commands, the relentless focus of our blog. Within the FOR command, however, we have numerous different kinds of looping options. Let me explain each, and talk about what it's most useful for. Depending on how you count, there are 5 or 6 different kinds of FOR loops! (The 5 versus 6 depends on whether you consider a FOR /R, a FOR /D, and a FOR /D /R to be two or three different kinds of loops.) What was Microsoft thinking? Well, when you only have a hammer, the whole world looks like a nail... and with our FOR loops in cmd.exe, we can attack many different types of problems.

Note that each loop has a similar structure: a FOR statement, an iterator variable, the IN component, a (set) that describes what we iterate over (and is always included inside of parentheses ()), a DO clause, and a command for our iteration.

FOR /L loops: These are iterating counters, working their way through integers. Sorry, but they don't work through fractions, letters, or words.... just integers. Their syntax is:

C:\> FOR /L %[var] in ([start],[step],[stop]) do [command]

The %[var] is the iterator variable, a value that will change at each iteration through the loop. You can use any one letter of the alphabet for this variable, such as %a or %i. Most people use %i as the canonical variable, unless there is a specific reason to use something else. Also, note that %i and %I are different variables, which gives us a total of 52 possible different letters, the upper case and lower case sets.

So, if you want to count from 1 to 100, you could run:

C:\> FOR /L %i in (1,1,100) do @echo %i

Or, if you want a loop that'll run forever, you start counting at 1, count in steps of zero, and count all the way to 2:

C:\> FOR /L %i in (1,0,2) do @echo Infinite Loop

FOR /L loops are useful any time you have to count (obviously) but also any time you need the equivalent of a "while (1)" loop to run forever.

I covered FOR /L loops first, because they are both very easy and very useful, and I wanted to set them aside before we start covering loops that iterate over objects in the directory structure, namely FOR, FOR /D, FOR /R, and FOR /R /D.

Plain ol' FOR loops: These loops iterate over files, with the iterator variable taking on the value of the names of files you specify in the (set). For example, to list all .ini files inside of c:\windows, you could run:

C:\> FOR %i in (c:\windows\*.ini) do @echo %i

It's a little-known fact that the (set) in these file/directory FOR loops can have a space-separated list of file specifiers, so you could get all of the .ini files in both c:\windows and c:\windows\system32 by just running:

C:\> FOR %i in (c:\windows\*.ini c:\windows\system32\*.ini) do @echo %i

Now, you might think, "Dude... I can do that same thing with the dir command" and you'd be right. But, there is another aspect of file-iterating FOR loops that gives us more flexibility than the dir command. By using a variation of the iterator variable, we can get other information about files, including their size, their date/time, their attributes and what not. Access to these items is available via:

%~fi - expands %I to a fully qualified path name
%~di - expands %I to a drive letter only
%~pi - expands %I to a path only
%~ni - expands %I to a file name only
%~xi - expands %I to a file extension only
%~si - expanded path contains short names only
%~ai - expands %I to file attributes of file
%~ti - expands %I to date/time of file
%~zi - expands %I to size of file

So, we could list the file's name, attributes, and size by running:

C:\> FOR %i in (c:\windows\*.ini) do @echo %i %~ai %~zi

FOR /D loops: These loops iterate through directories instead of files. So, if you want all directory names inside of c:\windows, you could run:

C:\> FOR /D %i in (c:\windows\*) do @echo %i

FOR /R loops: Ahhh... but you may have noted that neither the plain ol' FOR loops nor the FOR /D loops listed above actually recurse through the directory structure. To make them do that, you'd need to do a /R. The FOR /R loop has a slightly different syntax, though, in that we need to specify a path before the iterator variable to tell it where to start recursion. By itself, FOR /R recurses the directory structure, pulling out file names:

C:\> FOR /R c:\windows %i in (*.ini) do @echo %i

That one will go through c:\windows and find all .ini files, displaying their names.

Now, what if you want just directories and not files? Well, you do a FOR /D with a /R, as follows:

C:\> FOR /D /R c:\windows %i in (*) do @echo %i

This will list all directories inside of c:\windows and its subdirectories.

And that leaves us with the most complex kind of FOR loop in all of Windows.

FOR /F loops: These loops iterate through... uhhh... stuff. Yeah, stuff. The syntax is:

C:\> FOR /F ["options"] %[var] IN (stuff) DO [command]

The stuff can be all manner of things. If the (stuff) has no special punctuation around it, it's interpreted as a file set. But, the file set will be iterated over in a different manner than what we saw with plain ol' FOR loop and even FOR /R loops. With FOR /F, you'll actually iterate over each line of the _contents_ of every file in the file set! The iterator variable will take on the value of the line, which you can then do all kinds of funky stuff with, searching for specific text, parsing it out, using it as a password, etc.

If we specify the stuff with double quotes, as in ("stuff"), the FOR /F loop will interpret it as a string, which we can then parse.

If we specify the stuff with single quotes, as in ('stuff'), the FOR /F loop will interpret stuff as a command, and run the command, iterating on each line of output from the command.

Regardless of the stuff (whether it be files, a string, or a command), we can parse the iterator variable using those "options" in the FOR /F loop. I covered that parsing in more detail in Episode #48, Parse-a-palooza, and I won't repeat it here. There are also some examples of FOR /F in action there.

Suffice it to say, though, that if you master each of these FOR loops, you are rockin' and rollin' at the cmd.exe command line!


Tim, reporting for duty, Sirs!

After washing Ed's car and mowing Hal's lawn, they sent me on a hunt to find a strings command in the standard Windows shell. I haven't found it yet, but I'll keep looking after I finish painting. Anyway, back to the hazing, er, episode.

PowerShell also has five or six different types of loops. The difference is that they aren't all named FOR, and we do have the While loop. The available loop types are:
Do While
While
Do Until
For
ForEach-Object (& ForEach statement)


The first three loops are very similar so I'll cover them together. Also, since you are reading a blog such as this I'll assume you have at least a fundamental understanding of programming and understand control flow so I won't go into great depth on the basics.

While, Do While, and Do Until loops

Do While Loop
do {code block} while (condition)

Execute "while" the condition is true.

While Loop
while (condition) {code block}

Same as above, except the condition is checked before the block is executed. This control structure is also known as a pre-test loop.

Do Until Loop
do {code block} until (condition)

Executes "until" the condition is true. In other words it runs while the condition value is False.

These loops are much more commonly used in scripts than in one-liner commands. However, I use the following command to beep when a host goes down (drops four pings).

PS C:\> do {ping 10.10.10.10} while ($?); write-host `a


...and this command to let me know when a host comes back up (four successful pings in a row)

PS C:\> do {ping 10.10.10.10} until ($?); write-host `a


The $? variable contains a boolean value which represents the result status of the previous command. A true value indicates the command completed successfully. The first loop continues to run while the ping command result is successful. The second loop runs until the ping command is successful. After exiting either loop, the write-host `a command produces the beep. Note that the `a uses a back quote, not the standard single quote.

For loop
The standard use of the For statement is to run the code block a specified number of times.

for (initialization; condition; repeat) {code block}


If we wanted to count to 100 by 2's we could use this command.
PS C:\> for ($a=2; $a -le 100; $a=$a+2) {echo $a}
2
4
6
...


So far nothing new, but now it gets cool.

ForEach-Object
ForEach-Object is a looping cmdlet that executes in the pipeline and uses $_ to reference the current object. The ForEach-Object cmdlet is the most powerful and most commonly used loop in PowerShell. It is used so much that it is given the single character alias %. Here is the typical syntax of the ForEach-Object cmdlet:

... | ForEach-Object { script block } ...


Let's use it to view the contents of all the files in the current directory:

PS C:\> Get-ChildItem | ForEach-Object { Get-Content $_ } 

Shorter versions using built-in aliases:
PS C:\> dir | % { gc $_ } 
PS C:\> gci | % { gc $_ }


This command gets the files in the current directory using Get-ChildItem. Within our script block the current file is referenced by $_, the current pipeline variable. In our script block, denoted with the curly braces "{}", we call the Get-Content cmdlet on the current file. The loop automatically handles iterating through the objects passed down the pipeline and we get the contents of all the files.

With the addition of PowerShell to the regularly scheduled programming, you will see the ForEach-Object cmdlet used regularly in the coming weeks.

ForEach
The ForEach statement is very similar to the ForEach-Object. The differences are formatting, performance, and memory utilization.

The formatting is different, but not so much different that it should be confusing.

ForEach ($item in $collection) {command_block}


If we rewrote the example above using the ForEach statement, this is how it would look:

PS C:\> ForEach ($f in Get-ChildItem) { Get-Content $f }


Not a huge difference. The big difference comes with the resource usage. ForEach will load the entire collection into memory before executing the script block, and it is usually a bit faster if it doesn't have to load something too large. Conversely, the ForEach-Object cmdlet will process each object as it receives it.

If we use each method to multiply the numbers from 1 to 100,000 by the number 2, we can see that the ForEach statement is about 30 times faster than the ForEach-Object cmdlet. In short, the reason for the speed difference is that the ForEach is run as a single function instead of three or more functions.

PS C:\> Measure-Command { 1..100000 | %{$_*2} } |
select TotalMilliseconds

TotalMilliseconds
-----------------
5471.2111

PS C:\> Measure-Command { foreach ($i in (1..100000) ){$i*2} } |
select TotalMilliseconds

TotalMilliseconds
-----------------
177.7249


This difference is much less noticeable when there are other factors involved, such as disk access, rather than just pure computing power. Here is a similar test when accessing the Windows Security Event Log.

PS C:\> measure-command {get-eventlog -logname security | 
% {echo $_.eventid}} | select TotalMilliseconds

TotalMilliseconds
-----------------
1559.6163

PS C:\> measure-command {foreach ($i in get-eventlog -logname
security) { echo $i.eventid}} | select TotalMilliseconds

TotalMilliseconds
-----------------
1500.1738


I use ForEach-Object with the Get-EventLog cmdlet so my results are displayed as soon as they are processed and the time difference isn't as great. Personally, I think the ForEach-Object is more readable and is much easier to tack on to the end of an existing command.

I look forward to showing more PowerShell tips in the coming weeks. Now back to polishing Hal's car.

Hal finishes up:

Bash looping constructs are actually very simple: there are essentially two different types of for loops plus while loops and that's it. The most common type of loop in command-line tasks is the simple "for <var> in <list of values> ..." type loop:

for f in *.gz; do
echo ===== "$f"
zcat "$f" | grep -i pattern
done

The trick is that the "<list of values>" can be pretty much anything you can imagine, because Unix makes command output substitution so natural. For example, here's one of our previous solutions from Episode #56: Find the Missing JPEG:

for i in $(seq -w 1 1300); do [ ! -f $i.jpg ] && echo $i.jpg; done

You can have the for loop iterate over a directory structure simply by having it iterate over the output of a find command, though usually "find ... -exec ..." or "find ... | xargs ..." suffices instead of a loop. In any event, the ability to do arbitrary command substitution for the list of values the for loop iterates over is why bash only needs a single simple for loop construct rather than separate "for /D", "for /R", etc like Windows does.
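Here's a rough sketch of that idea, iterating a for loop over the output of find. The scratch directory and file names are made up for illustration (created with mktemp), and the simple $(find ...) form assumes none of the file names contain whitespace:

```shell
# Build a small scratch tree (hypothetical names, illustration only).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
touch "$demo_dir/a.log" "$demo_dir/sub/b.log" "$demo_dir/c.txt"

# The for loop walks whatever find prints: here, every .log file
# under demo_dir, including the one in the subdirectory.
log_count=0
for f in $(find "$demo_dir" -type f -name '*.log'); do
    echo "found: $f"
    log_count=$((log_count + 1))
done
echo "log files: $log_count"

# Often no loop is needed at all: find can run the command itself.
find "$demo_dir" -type f -name '*.log' -exec ls -l {} \;

rm -rf "$demo_dir"
```

If your file names might contain spaces, the "find ... -exec ..." form is the safer of the two, since the shell never gets a chance to split the names.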

Bash does have a C-style for loop for iterating over a series of numbers. For example, here's the alternate solution from Episode #56 that doesn't require the seq command:

for ((i=1; $i <= 1300; i++)); do file=$(printf "%04d.jpg" $i); \
[ ! -f $file ] && echo $file; done

On systems that have seq, I actually find it easier to type "for i in $(seq ...); do ..." than the C-style for loop, but your mileage, as always, may vary.

The other loop construct that bash has is a while loop. The simplest kind of while loop is an infinite loop. For example, there's our first solution in Episode #3 for watching the file count in a directory:

while :; do ls | wc -l; sleep 5; done

The ":" is a shell built-in command that does nothing and always returns success, so the while condition is always true.
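As a small sketch (the counter name and the limit of 3 are just for illustration), you can pair the infinite "while :" loop with break when you do want a way out:

```shell
# ":" always succeeds, so this loop only ends when we break out of it.
ticks=0
while :; do
    ticks=$((ticks + 1))
    echo "tick $ticks"
    # Leave the loop after three passes (arbitrary limit for the demo).
    [ "$ticks" -ge 3 ] && break
done
echo "stopped after $ticks ticks"
```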

However, you can use any conditional expression in the while loop that you wish. One example is the common idiom for reading data out of a file:

while read l; do ...; done </path/to/some/file

In this case, the read command returns true as long as it is able to read a line from the input file. When EOF is reached, read returns false and the loop terminates.
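A minimal, self-contained sketch of that idiom follows. The temporary input file and its contents are invented for the demo; "IFS=" and "-r" are optional extras that keep leading whitespace and backslashes in each line intact:

```shell
# Create a throwaway input file (contents are illustrative only).
input_file=$(mktemp)
printf 'alpha\nbeta gamma\n  indented\n' > "$input_file"

line_count=0
last_line=""
# read succeeds once per line and fails at EOF, which ends the loop.
while IFS= read -r line; do
    line_count=$((line_count + 1))
    printf '%d: %s\n' "$line_count" "$line"
    last_line=$line
done < "$input_file"

echo "read $line_count lines"
rm -f "$input_file"
```

Because the file is attached with a redirect rather than a pipe, the loop runs in the current shell and the counter is still set after "done".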

Here's another example with a more general conditional statement at the top of the loop. This little bit of code tries to periodically unmount a busy file system. It will continue to iterate until the umount command actually succeeds and the mount point no longer appears in the output of df:

umount $MOUNTPT
while [[ "X$(df -P $MOUNTPT | grep $MOUNTPT)" != "X" ]]; do
sleep 10
umount $MOUNTPT
done


What a lot of folks don't know is that bash also has an "until" loop. But until loops are really just while loops where the condition has been negated. So we could use an until loop to rewrite the example above very easily:

umount $MOUNTPT
until [[ "X$(df -P $MOUNTPT | grep $MOUNTPT)" = "X" ]]; do
sleep 10
umount $MOUNTPT
done

The only changes are replacing "while" with "until" and "!=" with "=".
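If you want to play with the while/until equivalence without a busy file system handy, here's a sketch that uses a hypothetical try_release function as a stand-in for umount. It fails twice and then succeeds; a real retry loop would also sleep between attempts:

```shell
# Hypothetical stand-in for umount: fails on attempts 1 and 2,
# succeeds from attempt 3 onward.
attempts=0
try_release() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

# until: keep looping as long as the command FAILS.
until try_release; do
    echo "until: attempt $attempts failed, retrying"
done
until_attempts=$attempts

# while with a negated condition does exactly the same thing.
attempts=0
while ! try_release; do
    echo "while: attempt $attempts failed, retrying"
done
while_attempts=$attempts

echo "until took $until_attempts attempts, while took $while_attempts"
```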

There are also other commands in Unix that are essentially implicit iteration operators: find which iterates over a list of directories, xargs which iterates over a list of input values, and sed and awk which iterate over the lines of a file. Very often you can use these operators instead of a traditional for or while loop.
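To make the comparison concrete, here's a sketch that counts lines across a couple of files twice: once with an explicit for loop, and once by letting find, xargs, and awk do the iterating. The scratch directory and file contents are made up for the demo, and the xargs pipeline assumes file names without whitespace:

```shell
# Scratch files for the demo (hypothetical names and contents).
work_dir=$(mktemp -d)
printf 'one\ntwo\n' > "$work_dir/a.txt"
printf 'three\n' > "$work_dir/b.txt"

# Explicit loop: visit each file and add up the line counts ourselves.
loop_total=0
for f in "$work_dir"/*.txt; do
    n=$(wc -l < "$f")
    loop_total=$((loop_total + n))
done

# Implicit iteration: find walks the directory, xargs feeds the names
# to cat, and awk steps through every line, printing the final count.
implicit_total=$(find "$work_dir" -type f -name '*.txt' | xargs cat | awk 'END { print NR }')

echo "loop: $loop_total lines, pipeline: $implicit_total lines"
rm -rf "$work_dir"
```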