Tuesday, March 8, 2011

Episode #137: Free-base64-ing

Hal spreads the fu

Lately I've been teaching Lenny Zeltser's Reverse Engineering Malware course for SANS. It's chock full of great information and is a lot of fun to teach. Plus there are all kinds of opportunities for me to bust out the Command Line Kung Fu.

For example, in one exercise we analyze the behaviors of a trojan that's receiving command and control messages via base64 encoded strings inside of web page comments. The comments themselves are easy enough to extract from a memory image of the suspicious process:

$ strings proc-memory.img | grep '<!-- '
<!--
<!-- BgAAAA== --><br><br><html>
<!-- Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA== --><br><br><html>
<!-- YzpcYm9vdC5pbmk= --><br><br><html>
<!-- Y21kIC9jIGRlbCBwcm9jZXNzZXMudHh0 --><br><br><html>
<!-- cHJvY2Vzc2VzLnR4dA== --><br><br><html>
<!-- BAAAAA== --><br><br><html>
<!-- AgAAAA== --><br><br><html>
<!-- AgAAAA== --><br><br><html>
<!-- BAAAAA== --><br><br><html>
...

As you can see, once we decode the strings there is going to be a certain amount of duplication. So what we'd like to do in order to get an idea of the capabilities of this trojan is to dump out a list of the unique, decoded command strings.

Happily, most Linux distros these days include the base64 command line utility which does both encoding and decoding. However, we need to extract just the base64 encoded text from the comments before feeding it into the decoding routine. So we'll modify our command line a little bit and use awk instead of grep:

$ strings proc-memory.img | awk '/<!-- / {print $2}' | base64 -d
cmd /c del systeminfo.txtc:\boot.inicmd /c del processes.txtprocesses.txt...

I'm using the built-in pattern matching operator in awk ("/.../") to replace grep, and then I simply "{print $2}" to extract the base64 encoded text out of each comment. That output simply gets fed into "base64 -d" which decodes anything it gets on the standard input.
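If you want to try the awk step without a memory image, here's a minimal sketch. A shell function stands in for `strings proc-memory.img`; the name `sample_strings` is hypothetical, and its lines are copied from the output above. It shows exactly what awk hands to base64: the second whitespace-separated field of each matching line.

```shell
# Hypothetical stand-in for `strings proc-memory.img`: a few of the
# comment lines shown above, supplied through a quoted here-document.
sample_strings() {
cat <<'EOF'
<!--
<!-- BgAAAA== --><br><br><html>
<!-- Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA== --><br><br><html>
<!-- YzpcYm9vdC5pbmk= --><br><br><html>
EOF
}

# awk splits each line on whitespace: field 1 is "<!--" and field 2 is
# the base64 token. The bare "<!--" line has no space after the dashes,
# so it does not match /<!-- / and produces no output.
sample_strings | awk '/<!-- / {print $2}'
```

This prints the three tokens `BgAAAA==`, `Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA==`, and `YzpcYm9vdC5pbmk=`, one per line, which is the stream that `base64 -d` receives in the real pipeline.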

The only problem here is that the encoded strings don't end with newlines, so the decoded output all runs together. This is going to be a problem when we want to extract the unique strings from the output. I decided to insert a loop so that I could format the output a little more nicely:

$ strings proc-memory.img | awk '/<!-- / {print $2}' |
while read -r comment; do echo "$comment" | base64 -d; echo; done

...
cmd /c del systeminfo.txt
c:\boot.ini
cmd /c del processes.txt
processes.txt
...

My while loop just feeds one comment at a time into the base64 program, and then uses echo to output a newline after each decoded string. It's less efficient, since it starts a new base64 process for every comment, but the output is more readable.

Now all we have to do to get the unique strings is pipe the whole mess into "sort -u":

$ strings proc-memory.img | awk '/<!-- / {print $2}' |
while read -r comment; do echo "$comment" | base64 -d; echo; done | sort -u

...
c:\boot.ini
cmd /c copy /B /Y *dump + 201* pwdump.txt
cmd /c del 201*
cmd /c del drivers.txt
cmd /c del *dump*
...
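The order of that list comes from your locale's collation rules, not from raw byte values. Many UTF-8 locales appear to largely ignore punctuation on the first comparison pass, which would explain why "201*" lands ahead of "*dump*" above even though "*" has the lower byte value. If you want a deterministic, byte-order listing, here's a minimal sketch that sets `LC_ALL=C` for sort; the `decoded` function is a hypothetical stand-in holding strings copied from the output above.

```shell
# Hypothetical stand-in for the decoded stream: strings copied from the
# output above, in arbitrary order and with one duplicate.
decoded() {
    printf '%s\n' \
        'cmd /c del drivers.txt' \
        'cmd /c del *dump*' \
        'c:\boot.ini' \
        'cmd /c del 201*' \
        'cmd /c del *dump*' \
        'cmd /c copy /B /Y *dump + 201* pwdump.txt'
}

# LC_ALL=C compares plain byte values, so the result no longer depends on
# the locale: "*" (0x2A) sorts before "2" (0x32), and ":" (0x3A) before "m".
decoded | LC_ALL=C sort -u
```

With the C locale the result is `c:\boot.ini`, the `copy` command, `*dump*`, `201*`, then `drivers.txt`, with the duplicate removed.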

In the course materials, Lenny actually gives the students a Python program for accomplishing this task. But we don't need no stinking Python when we've got the Unix shell! Heck, I'm pretty sure even PowerShell can handle this one, right Tim?

Tim spreads the Brie

We can handle it! Yes we can! Although...

Windows (still) doesn't have a built-in strings command, but we can use Select-String with regular expressions to find the strings we want.

PS C:\> gc proc-memory.img | Select-String -AllMatches '(?<=<!-- )[a-zA-Z0-9+/]+?=*(?= -->)' |
% { $_.Matches } | % { $_.Value }

BgAAAA==
Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA==
YzpcYm9vdC5pbmk=
Y21kIC9jIGRlbCBwcm9jZXNzZXMudHh0
cHJvY2Vzc2VzLnR4dA==
BAAAAA==
AgAAAA==
AgAAAA==
BAAAAA==
Y21kIC9jIHNjIHF1ZXJ5ID4gc2MudHh0
...

Get-Content (alias gc) outputs the file, which is then piped into Select-String. By default Select-String returns only the first match on each line; the -AllMatches switch makes it return every match. Select-String also accepts regular expressions, and we use one to find the opening comment tag (<!--), followed by base64 characters (A-Z, a-z, 0-9, +, /), optional padding (=), and finally the closing comment tag (-->). Since we want only the text between the tags, not the tags themselves, we use a regular expression lookbehind ((?<=...)) and lookahead ((?=...)) to require that the tags are present without including them in the match.

Select-String outputs a MatchInfo object for each matching line, and that object's Matches property holds every match found on the line. The first ForEach-Object cmdlet (alias %) expands each line's Matches collection into individual matches, which are piped into a second ForEach-Object cmdlet that outputs each match's Value, i.e. the matched string.
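For comparison, the closest Unix analogue to -AllMatches is `grep -o`, which prints every match on its own line even when one input line contains several comments. Here's a minimal sketch, assuming a grep that supports `-o` (GNU and BSD grep do). POSIX extended regular expressions have no lookbehind or lookahead, so this version matches the tags and then strips them with sed. The `extract_b64` function and the sample input are hypothetical.

```shell
# grep -o prints each match separately; the ERE has no lookarounds, so the
# comment tags are part of the match and are removed afterwards by sed.
extract_b64() {
    grep -oE '<!-- [A-Za-z0-9+/]+=* -->' |
        sed -e 's/^<!-- //' -e 's/ -->$//'
}

# Hypothetical input: two comments on one line, plus a bare "<!--" line.
printf '%s\n' \
    '<!-- BgAAAA== --><br><!-- YzpcYm9vdC5pbmk= --><html>' \
    '<!--' |
    extract_b64
```

Both tokens from the first line come out on separate lines, and the bare `<!--` line produces nothing.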

Now that we have the strings, we need to base64 decode them. Unfortunately, there isn't a native command to do this. There are a few ways to add this feature to the shell, but we'll use PowerShell's ability to access the .NET Framework to accomplish this portion of our task.

To base64 decode a string, we use this command:

PS C:\> [text.encoding]::utf8.getstring([convert]::FromBase64String("Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA=="))
cmd /c del systeminfo.txt

We can then combine these two commands to output the decoded versions of all the matched strings.

PS C:\> gc proc-memory.img | Select-String -AllMatches '(?<=<!-- )[a-zA-Z0-9/+]+?=*(?= -->)' |
% { $_.Matches } | % { [text.encoding]::utf8.getstring([convert]::FromBase64String($_.Value)) }

...
cmd /c del systeminfo.txt
c:\boot.ini
cmd /c del processes.txt
processes.txt
...

There is a lot of duplicated output, so let's do what Hal did: sort it and remove the duplicates.

PS C:\> <previous command> | Sort-Object -Unique
...
c:\boot.ini
cmd /c copy /B /Y *dump + 201* pwdump.txt
cmd /c del *dump*
cmd /c del 201*
cmd /c del drivers.txt
cmd /c del ipconfig.txt
...
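One difference to keep in mind when comparing the two listings: Sort-Object compares strings case-insensitively unless you add -CaseSensitive, so -Unique treats strings that differ only in case as duplicates, while a plain `sort -u` keeps both. Adding `-f` makes sort fold case before comparing. A minimal sketch with hypothetical input, using `LC_ALL=C` so the result doesn't depend on the locale:

```shell
# Hypothetical input: two commands that differ only in letter case.
pair() {
    printf '%s\n' 'cmd /c del drivers.txt' 'CMD /C DEL drivers.txt'
}

pair | LC_ALL=C sort -u      # case-sensitive: both lines survive
pair | LC_ALL=C sort -u -f   # case folded: only one line survives
```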

To answer Hal's question, yes, "even" PowerShell can do it!