Tuesday, February 16, 2010

Episode #82: Hippy Barfday Spew Do You?

Ed's Got Sed (well, a little bit of it at least):

In a celebratory mood, I belt out:
C:\> cmd.exe /v:on /c "for /f "delims=" %i in ('echo Hippy barfday spew do you!')
do @set stuff=%i& echo !stuff! & set stuff=!stuff:i=a! & echo !stuff! & set
stuff=!stuff:arf=irth! & echo !stuff! & set stuff=!stuff:spew=to! & echo !stuff!
& set stuff=!stuff: do=! & echo !stuff! & set stuff=!stuff:yo=f! & echo. & echo
!stuff!"
Hippy barfday spew do you!
Happy barfday spew do you!
Happy birthday spew do you!
Happy birthday to do you!
Happy birthday to you!

Happy birthday to fu!

If you couldn't tell, we're celebrating the FIRST BIRTHDAY of this blog! Yes, we've already made it through one orbit of that big ball of gas at the center of our Solar System, and we're looking forward to even more. It was one year ago today that three merry command line bandits decided to take a fun little brawl of command-line one-upmanship from Twitter to blog format, so we could get into deeper fu. As a birthday present, my command above shows how we can perform string substitutions at the Windows cmd.exe command line.

I've never hidden the fact that I've often longed for the sed command, which allows for nifty stream editing. Although it's got a ton of flexible features, one of sed's most common uses is replacing a string with another string in a stream of data, such as Standard Output. What's not to like? Well, the fact that we don't have a built-in equivalent in cmd.exe is one thing that's a bummer.

So, I got to thinking about this problem the other day, when I realized that I could do the substitution and replacement thing using string altering options in cmd.exe. I could take the data I want to alter, put it in a variable, and then change all occurrences of given strings to other strings using the notation:
set string=%string:original=replacement%

Or, if we use delayed environment variable expansion, we rely on:
set string=!string:original=replacement!

I've done just that above, starting by turning on delayed environment variable expansion (cmd.exe /v:on) to execute the command (/c) of FOR /F. My FOR /F command is designed to take the output of my echo command and put it in the variable %i. Alternatively, if I wanted to change text inside of a file, I could have used the "type filename" command instead of echo, iterating through each line of the file making substitutions. I turn off default parsing on spaces and tabs using "delims=", so that my whole line of data gets shoved into %i. This is all very routine stuff for life inside of cmd.exe, even on your barf^h^h^h^hbirthday.

Then, I move my iterator value into a variable that I can take action on (set stuff=%i). I now can use my string replacement technique to start altering that variable, in a shallow and pale (but useful) mimicking of but one of the features of sed.

I can change individual characters into other characters, such as "i" to "a":
set stuff=!stuff:i=a!

I can change multi-character substrings like barf into birth:
set stuff=!stuff:arf=irth!

I can replace whole words, changing "spew" into "to":
set stuff=!stuff:spew=to!

I can delete whole words, like " do":
set stuff=!stuff: do=!

This one might be worth a note. Here, I'm replacing something with nothing by placing nothing after the equals sign. The item I'm replacing is " do", with a space in it. Otherwise, I'd have a double space left behind. And, yes, I could have alternatively replaced "do " (with a space after it) with nothing. Or, I could have replaced " do " with " " using !stuff: do = !. There are many options.

And I can even take bigger strings and replace them with smaller strings:
set stuff=!stuff:yo=f!

This is really cool and useful, but it does have some limitations. Note that every instance of the substring I specify is replaced, and there really is no means for just changing, for example, the first occurrence of the substring. Also, this doesn't work for non-printable ASCII characters. You have to be able to type it to get it into that syntax. I've also gotta shove everything into a string to make this work, but that's not so bad.
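For comparison, here is a rough sketch of how sed deals with the "first occurrence only" case that the cmd.exe syntax can't handle. It assumes a Unix-like shell with sed on hand (Cygwin would do on Windows), and the variable names are made up for illustration:

```shell
# Sample text from the post; the variable names are just for illustration.
line='Hippy barfday spew do you!'

# No flag: only the first "y" on the line is changed.
first_only=$(echo "$line" | sed 's/y/Y/')

# g flag: every "y" on the line is changed, like the cmd.exe behavior.
every_one=$(echo "$line" | sed 's/y/Y/g')

# Numeric flag: only the 3rd "y" on the line is changed.
third_only=$(echo "$line" | sed 's/y/Y/3')

echo "$first_only"   # HippY barfday spew do you!
echo "$every_one"    # HippY barfdaY spew do You!
echo "$third_only"   # Hippy barfday spew do You!
```

So in sed the replace-everything behavior is opt-in via the g flag, which is the opposite of what the cmd.exe substitution syntax gives us.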

So, there you have it... a highly obfuscated command for wishing ourselves Happy Birthday.

Whatcha got Hal & Tim?

Tim blows out the candles:

Tim lets it rip:
PS C:\> "Hippy barfday spew do you!" | Tee-Object -Variable stuff; 
$stuff -replace "i","a" | Tee-Object -Variable stuff;
$stuff -replace "arf","irth" | tee -var stuff;
$stuff -replace "spew","to" | tee -va stuff;
$stuff -replace " do","" | tee -va stuff;
Write-Output ""; $stuff -replace "yo","f"

Hippy barfday spew do you!
Happy barfday spew do you!
Happy birthday spew do you!
Happy birthday to do you!
Happy birthday to you!

Happy birthday to fu!

One orbit around that big ball of gas, huh? I'm sure there is a joke related to Ed and his love of beans, but we are here to celebrate, not disgust.

For those of you who have followed the blog since the beginning, the original third bandit was Paul Asadoorian, not me. For the Three Stooges fans out there, I guess you could call me Curly, and Paul would be Shemp. Although, I can't decide if that would make Hal Moe or Larry. Let's get back to business and this week's PowerShell nyuk, nyuk, nyuk.

It is rather fitting that PowerShell is second this week. Cmd's string replacement is pretty weak and the syntax is terrible, while Linux is the opposite. PowerShell is somewhere in between, but much closer to the Linux side of things.

Before we dig into the entire command above, we'll first do the string substitution without all the extra output.

PS C:\> "Hippy barfday spew do you!" -replace "i","a" 
-replace "arf","irth" -replace "spew","to" -replace " do",""
-replace "yo","f"

Happy birthday to fu!

The -replace operator is used to replace strings, duh! By default it is case insensitive; to be explicitly case insensitive, use the -ireplace operator. For a case-sensitive replace, use the -creplace operator.

Now, let's do all of Ed's tricks:

Change individual characters into other characters, such as "i" to "a":
... -replace "i","a"

Change multi-character substrings like barf into birth:
... -replace "arf","irth"

Replace whole words, changing "spew" into "to":
... -replace "spew","to"

Delete whole words, like " do":
... -replace " do",""

We can even do tricks that Ed can't, by using regular expressions:
Replace "y" with "f" but only if it is the first character in a word:
PS C:\> "Happy birthday to do yu!" -replace "\sy"," f"
Happy birthday to do fu!

Swap the first two words in a line:
PS C:\> "birthday Happy to fu!" -replace "^(\w+)\s(\w+)","`$2 `$1"
Happy birthday to fu!

The last command uses regular expression groups. We won't go into the depths of regex, but in short, "\w+" will grab a word and "\s" will grab a whitespace character. The caret (^) is used to anchor the search to the beginning of the string, and the parentheses are used to define the groups. In the replacement portion we use `$1 and `$2 to represent (respectively) the first and second groups (words) found. Since we want to output them in reverse order, we use "`$2 `$1" to put the second word before the first word.

Back to the original command:

PS C:\> "Hippy barfday spew do you!" | Tee-Object -Variable stuff; 
$stuff -replace "i","a" | Tee-Object -Variable stuff;
$stuff -replace "arf","irth" | tee -var stuff;
$stuff -replace "spew","to" | tee -va stuff;
$stuff -replace " do","" | tee -va stuff;
Write-Output ""; $stuff -replace "yo","f"

We want to display each change as it happens. To pull this off we will have to use the Tee-Object cmdlet. Similar to Linux's tee command, Tee-Object takes the command output and saves it in a file or variable, as well as sending it down the pipeline or to the console.

If we break it down, this command has three parts that are repeated.

<input object> | Tee-Object -Variable stuff;
$stuff -replace <original>,<replacement>

We start with the input object "Hippy barfday spew do you!" and pipe it into Tee-Object (alias tee). The only reason we use Tee-Object is so we can display the output and work with it further down the pipeline. After tee, we do the replace. The output of the previous portion becomes the input for the next. Rinse and repeat.

Towards the end of the command we throw in the Write-Output cmdlet (aliases write and echo) to add the extra line break.

One quick thing to note, when using the Tee-Object cmdlet's Variable parameter, do not use a $. The parameter accepts a string, which is the name of the variable.

So that's a more lucid version of Ed's highly obfuscated command, and now it is time for Hal to hand out the birthday spankings.

Moe^h^h^hHal seds^h^h^h^hSays

Huh, I was sure Ed was Curly. At Ed's current rate of hair loss, he's going to resemble Curly before too much longer.

Hmmm, my Windows colleagues are desperately trying to achieve some of the functionality of sed in their silly little command shells. Here's a hint guys: Cygwin is your friend. Then you could do things like this:

$ echo 'Hippy Barfday Spew Do You!' | 
sed 's/\(H\)i\(ppy B\)arf\(day \)Spew D\(o \)Yo\(u!\)/\1a\2irth\3T\4F\5/'

Happy Birthday To Fu!

The whole trick here is leveraging sub-expressions-- the text enclosed in "\(...\)"-- in the first part of the sed substitution expression. Essentially I'm using the sub-expressions to "save" the bits of the line that I want to keep. You'll notice the bits of our input string that I don't want are carefully kept outside the "\(...\)" boundaries.

You can refer to the contents of the sub-expressions on the right-hand side of the substitution using the \1, \2, ... variables. Sub-expressions are numbered left to right by opening parenthesis-- this is an important distinction when you start doing crazy stuff like nested sub-expressions. In this case, however, all I have to do is output the contents of my sub-expressions in order with appropriate text in between them to form our final message.
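To make the numbering rule concrete, here's a small sketch with nested sub-expressions. The input string and the "nested" variable name are just made up for illustration; the brackets in the replacement are only there so you can see which group is which:

```shell
# Groups are numbered by where their opening parenthesis appears:
#   \1 = outer group  "Happy Birthday"
#   \2 = first inner  "Happy"
#   \3 = second inner "Birthday"
nested=$(echo 'Happy Birthday' |
  sed 's/\(\(Happy\) \(Birthday\)\)/[\1] [\2] [\3]/')

echo "$nested"   # [Happy Birthday] [Happy] [Birthday]
```

The outer group gets \1 because its opening parenthesis comes first, even though it closes last.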

So really I'm just using sed sub-expressions like a cookie cutter here to chop out the bits of the line I want. This functionality makes sed very useful as a surgical tool for reformatting text into a regular format. Another example of this comes from one of our earliest Episodes where I showed Shemp^h^h^h^h^hPaul how to bring the sed madness to parse the output of the host command.
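As a quick sketch of the cookie cutter idea on something other than birthday greetings, here's one way you might flip a YYYY-MM-DD date into MM/DD/YYYY. The us_date function name is hypothetical, and I'm using "|" as the delimiter so the slashes in the output don't need backwhacks:

```shell
# Hypothetical helper: cut the year, month, and day out of an ISO-style
# date and paste them back in a different order.
us_date() {
  sed 's|^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)$|\2/\3/\1|'
}

flipped=$(echo '2010-02-16' | us_date)
echo "$flipped"   # 02/16/2010
```

Lines that don't look like a date fall through unchanged, since the pattern is anchored at both ends and simply won't match.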

Now the problem is that these sed expressions always end up looking awkward because of all of the backwhacks floating around. If you have GNU sed handy, you can use the "-r" (extended regex) option. This allows you to create sub-expressions with "(...)", saving yourself a lot of backwhack abuse:

$ echo 'Hippy Barfday Spew Do You!' | 
sed -r 's/(H)i(ppy B)arf(day )Spew D(o )Yo(u!)/\1a\2irth\3T\4F\5/'

Happy Birthday To Fu!

Still ugly, but definitely more readable.
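If readability matters more than cramming everything into one expression, a chain of small substitutions is another option. This is only a sketch that mirrors the step-by-step output from the Windows examples above: the birthday_steps function name is made up, and it relies on sed's -n option plus the p flag to print the line after each change:

```shell
# Hypothetical helper: print the original line, then print it again after
# each substitution that actually changes something.
birthday_steps() {
  sed -n -e 'p' \
         -e 's/i/a/p' \
         -e 's/arf/irth/p' \
         -e 's/spew/to/p' \
         -e 's/ do//p' \
         -e 's/yo/f/p'
}

result=$(echo 'Hippy barfday spew do you!' | birthday_steps)
echo "$result"
```

Each -e expression works on the result of the one before it, so the order matters just like it does in the cmd.exe and PowerShell versions.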

Thanks everybody for taking time out of your busy lives to keep up with our humble little blog in the past year. We'll save you a bit of the birthday cake!