Tuesday, April 26, 2011

Episode #144: What the Hex?

Tim has a sugar high

Recently, I've been going through some exploit development tutorials. One of the exploits was for a file format vulnerability. I created my malicious file and saved it on my Windows machine, and I wanted to take a quick peek at it to confirm I had written the correct bytes. The trouble is, not all of those bytes are printable. Of course there are a million ways to view this file, but I decided to try PowerShell since I didn't have any other tools installed on this clean Windows box.

PS C:\> get-content -encoding byte -totalcount 1000 shellcode.bin
80
83
81
82
86
87
85
156
...


This uses the good ol' Get-Content cmdlet. It reads raw bytes when the Encoding parameter is set to Byte, and I used the TotalCount parameter since I only wanted to see the first 1000 bytes.

Of course, the output isn't nice. First, it's in decimal instead of hex. We can quickly fix that:

PS C:\> get-content -encoding byte -totalcount 1000 shellcode.bin | % { "{0:X2} " -f $_ }
50
53
51
52
56
57
55
9C
...


We use the ForEach-Object cmdlet (alias %) to format each byte as hex. Now to fix the second problem: the vertical display.

PS C:\> (get-content -encoding byte -totalcount 1000 shellcode.bin | % { "{0:X2}" -f $_ }) -join " "
50 53 51 52 56 57 55 9C ...


Simply wrapping the entire command in parentheses and using the -join operator effectively rotates our output.

If you want a different format, you can modify the format string. Say we want each byte prefixed with '\x':

PS C:\> (get-content -encoding byte -totalcount 1000 shellcode.bin | % { "\x{0:X2}" -f $_ }) -join " "
\x50 \x53 \x51 \x52 \x56 \x57 \x55 \x9C ...


Not too bad, huh? Of course, this is super simple for Hal. <Sigh>

Hal is still Bali Hai

Thanks for taking it easy on me as I'm coming down off my vacation, Tim. How's that competitive snow shoveling working out for you?

As Tim points out, this one is easy for Unix because (being a programmers' operating system) we have built-in hex dumpers. The old standby is the "od", or "octal dump" command. Despite the name, od can output in a variety of formats, including hex:

$ od -N 48 -x /bin/bash
0000000 457f 464c 0102 0001 0000 0000 0000 0000
0000020 0002 003e 0001 0000 f210 0041 0000 0000
0000040 0040 0000 0000 0000 3a80 000e 0000 0000
0000060

Here I'm using the "-N" option to only dump out the first 48 bytes of the file, and "-x" specifies that I want the output in hex format. If I wanted the output grouped as single bytes rather than pairs, I could use a slightly different option:

$ od -N 48 -t x1 /bin/bash
0000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
0000020 02 00 3e 00 01 00 00 00 10 f2 41 00 00 00 00 00
0000040 40 00 00 00 00 00 00 00 80 3a 0e 00 00 00 00 00
0000060

The "-t" option lets you specify the output format more completely. The first letter denotes the type of output you want: "x" for hex, "o" for octal, "a" and "c" for character types, "f" for floating point, and so on. The number following the letter lets you choose the size groups you want in the output.

But since I generally only ever want hex dumps, I usually use xxd instead of od:

$ xxd -l 48 -g 1 /bin/bash
0000000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............
0000010: 02 00 3e 00 01 00 00 00 10 f2 41 00 00 00 00 00 ..>.......A.....
0000020: 40 00 00 00 00 00 00 00 80 3a 0e 00 00 00 00 00 @........:......

xxd is a pure hex dumper (although it can dump bits if you use "-b"), so we don't need a special option to get hexadecimal output. Use "-l" to specify how many bytes you want to read, and "-g" to specify how many bytes to group together in the output. Like od, xxd defaults to 2-byte groups, but I specified 1-byte groups in the command above.
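And if you're curious what that bit-level dump looks like, the first few bytes of the same ELF header should come out something like this (the exact column layout may vary between xxd versions):

$ xxd -b -l 6 /bin/bash
0000000: 01111111 01000101 01001100 01000110 00000010 00000001  .ELF..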

If you don't want all the extra frippery that the normal xxd output gives you-- like byte offsets and ASCII translation-- you can use the "-p" (plain) option to just get the raw bytes:

$ xxd -p -l 48 /bin/bash
7f454c4602010100000000000000000002003e000100000010f241000000
00004000000000000000803a0e0000000000

This format is particularly useful when you want to pipe the output into something else. For example, I could now use sed to produce the "\x" notation similar to Tim's last example:

$ xxd -p -l 48 /bin/bash | sed 's/\(..\)/\\x\1 /g'
\x7f \x45 \x4c \x46 \x02 \x01 \x01 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x02 ...
\x00 \x00 \x40 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x80 \x3a \x0e \x00 \x00 \x00 \x00 \x00

Now if I could only figure out a way to use the shell to dump myself as quickly from one side of the Pacific to the other, that would be quite a feat! Oh well, perhaps the long flight will inspire me with some new shell fu for next week.

Tuesday, April 19, 2011

Episode #143: Unicode to Shellcode

Hal checks in from Bali

Just a quick one this week. After finishing up with my posse of hard-rocking malware-crazed weasels from SANS Bali, I'm due for a little R&R and, hey, did I mention I'm in Bali? But sun, sand, surf, and sky will not prevent me from providing our faithful readers with their weekly dose of Command Line Kung Fu. Besides, teaching Lenny Zeltser's excellent Reverse Engineering Malware class always gives me ideas for new Fu.

This week's challenge is a fairly common one for malware analysts. Many times you have JavaScript code from web pages, Microsoft Office documents, or PDF files that contain shellcode payloads in a variety of obfuscated forms. One encoding type is two-byte, little-endian Unicode represented by strings such as %u8806. Today's sample goes one better, though, by introducing extra whitespace and punctuation like so:

'%u5350%u5', '251%u5756%', 'u9c55%u00', 'e8%u0', '000%u5d00', ...

Lenny came up with some tasty Perl code to extract the necessary bytes and reformat them into hex notation such as "\x06\x88" (and from there they can be converted to an executable to analyze with a tool such as shellcode2exe.py). While I'm a big fan of Perl, it's against the rules of our little blog experiment. So I started wondering what the sed solution would look like.

Wonder no more:

$ sed 's/[^0-9A-Fa-f]//g; 
s/\(..\)\(..\)/\\x\2\\x\1/g' shellcode.uni >shellcode.hex

The first substitution gets rid of anything that isn't a hex digit. Not only does this clean up the whitespace, quotes, and commas, it even eliminates all those "%u"s for us. Next we need to grab each byte-- represented by two hex digits-- and put a "\x" in front of it. The tricky part is that the original unicode was in little-endian format, so we must swap each pair of bytes as we work through the string of characters. That's why you see me grabbing two bytes at a time and outputting them with "...\2...\1..." on the right-hand side of the substitution.
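If you want to sanity-check the expression before pointing it at a real file, you can feed it the sample fragment from above on standard input; the result should look something like this:

$ echo "'%u5350%u5', '251%u5756%', 'u9c55%u00', 'e8%u0', '000%u5d00'" |
sed 's/[^0-9A-Fa-f]//g; s/\(..\)\(..\)/\\x\2\\x\1/g'
\x50\x53\x51\x52\x56\x57\x55\x9c\xe8\x00\x00\x00\x00\x5d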

Well that's all from my little corner of the South Pacific. I must go now as I hear something calling me...

Tim checks in from a spot where the @$%#$%^ snow is falling:

The approach is the same with PowerShell: 1) read the file, 2) remove characters that don't represent hex, and 3) swap the pairs of characters while adding a "\x".

PS C:\> Get-Content shellcode.uni |
ForEach-Object { $_ -replace '[^0-9A-F]', '' } |
ForEach-Object { $_ -replace '(..)(..)', '\x$2\x$1' }


\x50\x53\x51\x52\x56\x57\x55\x9c


Notice the command above uses single quotes. That is because PowerShell expands strings inside double quotes before our -replace operator has a chance to do any replacing. This means PowerShell would try to expand $1 and $2 as variables rather than passing the literal strings through to -replace. If you happened to use double quotes, you would get output like this:

PS C:\> Get-Content shellcode.uni |
ForEach-Object { $_ -replace '[^0-9A-F]', '' } |
ForEach-Object { $_ -replace '(..)(..)', "\x$2\x$1" }


\x\x\x\x\x\x\x\x\x\x...


A longer explanation can be found toward the end of Episode #126.

Now we have the output we want, but let's shorten up the command by using aliases:

PS C:\> gc shellcode.uni |
% { $_ -replace '[^0-9A-F]', '' } |
% { $_ -replace '(..)(..)', '\x$2\x$1' }


We can also remove the ForEach-Object cmdlets and shorten it further:

PS C:\> ((gc shellcode.uni) -replace '[^0-9A-F]', '') -replace '(..)(..)', '\x$2\x$1'


We have one minor problem: if shellcode.uni contains line breaks, each line will be read separately and the breaks won't be removed. When there are multiple lines of text in the file, Get-Content returns an array of objects, one per line, instead of a single string.

PS C:\> (gc shellcode.uni).getType()

IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array


We can fix this by converting it to a string.

PS C:\> ([string](gc shellcode.uni)).getType()

IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object


Our robust shortened version of the command looks like this:

PS C:\> ([string](gc shellcode.uni) -replace '[^0-9A-F]', '') -replace '(..)(..)', '\x$2\x$1'


The final piece is to output our results, and we can use the Out-File cmdlet to do it. However, Out-File's default output encoding is Unicode (UTF-16), which would pad our text with null bytes and isn't what the next tool in the chain expects. We have to tell PowerShell to use ASCII by using the -Encoding parameter.

PS C:\> ... | Out-File shellcode.hex -Encoding ASCII


So our final command looks like this:

PS C:\> ([string](gc shellcode.uni) -replace '[^0-9A-F]', '') -replace '(..)(..)', '\x$2\x$1' |
Out-File shellcode.hex -en ASCII


Well that's all from my frozen corner of Minnesota. I must go now as I hear something calling me...

Tuesday, April 12, 2011

Episode #142: XML in Shell

Hal gets email from an old friend

We got another email from Ryan Tuthill. We'd been counting the days until his next missive, and suddenly, like the promise of Spring, there it was in our inboxes.

Ryan had been doing a little web surfing and had come across this article for creating a rotating slide show of images on your desktop in Ubuntu. Now I have to admit that my first thought upon reading that article was, "Holy carp! Why didn't they just set up a cron job to change the image on a regular basis instead of mucking around with all that XML nonsense?" But Ryan's question was perhaps more interesting: "Why are they entering all that XML by hand when there ought to be a command line which could generate it for you?"

So let's look at the problem Ryan has set for us. Aside from a header and a footer which we can generate by hand, we need to create a series of XML blocks like this:

<static>
<duration>1795.0</duration>
<file>/some/path/to/img1.png</file>
</static>

<transition>
<duration>5.0</duration>
<from>/some/path/to/img1.png</from>
<to>/some/path/to/img2.png</to>
</transition>

The values that are changing are the image file names in each block.

So it seems we need a list of files and a loop. I think I'll take a cue from our last Episode and just generate my list of files like this:

$ find ~/Documents/My\ Pictures -type f | egrep '\.(gif|jpg|png)$'
/home/hal/Documents/My Pictures/aus-pics/2007-12-06/2007 12 06 Australia 001.jpg
/home/hal/Documents/My Pictures/aus-pics/2007-12-06/2007 12 06 Australia 012.jpg
/home/hal/Documents/My Pictures/aus-pics/2007-12-06/2007 12 06 Australia 006.jpg
...

Now I could do an output substitution here with "$(...)", but I'm worried about the size of my input, so I think I'll just pipe things into a nice little while loop:

$ find ~/Documents/My\ Pictures -type f | egrep '\.(gif|jpg|png)$' |
while read file; do
echo "<transition><duration>5.0</duration><from>$prev</from><to>$file</to></transition>";
echo "<static><duration>1795.0</duration><file>$file</file></static>";
prev=$file;
done

<transition><duration>5.0</duration><from></from><to>...001.jpg</to></transition>
<static><duration>1795.0</duration><file>...001.jpg</file></static>
<transition><duration>5.0</duration><from>...001.jpg</from><to>...012.jpg</to></transition>
<static><duration>1795.0</duration><file>...012.jpg</file></static>
<transition><duration>5.0</duration><from>...012.jpg</from><to>...006.jpg</to></transition>
<static><duration>1795.0</duration><file>...006.jpg</file></static>
...

Hey, the XML doesn't have to be beautiful, it just has to be correct.

What my loop does here is track two variables: $file which is the next picture we'll be doing a <transition> block to, and $prev which is the last picture I did a <static> block for. But I cheat a little bit on the first time through the loop. I know I'm not going to have $prev set at this point, so my first <transition> block is going to be erroneous. But I go ahead and dump it out anyway, and follow it up with a (correct) <static> block for the first file name. After I've "primed the pump" so to speak, the <transition> and <static> blocks for the rest of the input will be just fine.

So the first line of output is junk. But we can get rid of the junk line easily with a little post-processing:

$ find ~/Documents/My\ Pictures -type f | egrep '\.(gif|jpg|png)$' |
while read file; do
echo "<transition><duration>5.0</duration><from>$prev</from><to>$file</to></transition>";
echo "<static><duration>1795.0</duration><file>$file</file></static>";
prev=$file;
done | tail -n +2 >slideshow.xml

$ head slideshow.xml
<static><duration>1795.0</duration><file>...001.jpg</file></static>
<transition><duration>5.0</duration><from>...001.jpg</from><to>...012.jpg</to></transition>
<static><duration>1795.0</duration><file>...012.jpg</file></static>
<transition><duration>5.0</duration><from>...012.jpg</from><to>...006.jpg</to></transition>
<static><duration>1795.0</duration><file>...006.jpg</file></static>
...

A little "tail -n +2" after the while loop to scrape off the first line and we're golden. And then all we have to do is redirect our output into a file. You can edit this file by hand and add the necessary additional XML tags per the original article.

So the trick here is to be willing to make a "mistake" at the beginning of your input in order to simplify your loop. You can always clean up after yourself later. And speaking of cleaning up after your own mistakes, here's Tim who's had plenty of practice in that area...

Tim never cleans up after himself

Hal's XML "handling" is just text with extra >'s and <'s thrown in. A few months ago I asked about doing some XML parsing, and his archaic shell couldn't handle it. So I'll play along and create some XML his way.

In brief, we take the same approach as Hal and start off by finding all of the images we want:

PS C:\Users\tm\Pictures> Get-ChildItem -Recurse -Include *.jpg,*.png,*.gif

C:\Users\tm\Pictures\aus-pics\2007-12-06\2007 12 06 Australia 001.jpg
C:\Users\tm\Pictures\aus-pics\2007-12-06\2007 12 06 Australia 002.jpg
C:\Users\tm\Pictures\aus-pics\2007-12-06\2007 12 06 Australia 003.jpg
...


Here is the same command, but using a Get-ChildItem alias and shortened parameter names:

PS C:\Users\tm\Pictures> ls -r -i *.jpg,*.png,*.gif


This gives us all our file objects. From here the quickest way to create our output is to do exactly what Hal did, generate the output, plus mistake, and then clean up.

PS C:\Users\tm\pictures> ls -r -i *.jpg,*.png,*.gif | % {
echo "<transition><duration>5.0</duration><from>$prev</from><to>$_</to></transition>";
echo "n<static><duration>1795.0</duration><file>$_</file></static>"; $prev = $_
} | select -Skip 1


We use Echo (alias of Write-Output) to output each line. The current pipeline object ($_) represents our iterator as we go through our ForEach-Object (alias %) loop. We then use Select-Object (alias select) with the Skip parameter to skip (duh) the first "mistake" line.

Outputting text is fine since we are creating the XML from scratch. Hal would be crying if we actually needed to manipulate or parse XML. We can treat XML as just another object and access the various elements and properties.

PS C:\> $xml = [xml](Get-Content file.xml)
PS C:\> $xml.objects.static
duration file
-------- ----
1795.0 2007 12 06 Australia 001.jpg
1795.0 2007 12 06 Australia 002.jpg
1795.0 2007 12 06 Australia 003.jpg


Oh, if only Hal and his shell from the 1900's could keep up.

Tuesday, April 5, 2011

Episode #141: Bonus Fu

Tim reads the fine print

The mail we received last week included a second part. The sender, Mr Thomas the Anonymous, asked:

Bonus question: we have random files all over the place created by Excel/Word that are exactly eight characters in length and have no file extension, such as D0F5KLM3. What's the syntax to use for looking? If I do a "DIR ??????" on my root drive, it appears to show all directory names six characters in length and shorter, when I am only looking for those that are exactly six characters in length?

One important piece that Tom the Masked One missed is the /A:-D option to filter out results that have the directory attribute set. This will leave us with only files. Here is the resulting command with the bare format (/B) and recursive (/S) options:

C:\> dir /b /s /a:-d ??????
C:\Program Files\Wireshark\manuf
C:\Program Files\Wireshark\etc\gtk-2.0\gtkrc
C:\Program Files\Wireshark\help\toc
...


The output isn't quite what we wanted, as we are getting files that are shorter than 6 characters.

As it turns out, the "?" wildcard character represents any character...including no character. That means ?????? will match a, aa, aaa, aaaa, aaaaa, and aaaaaa. However, since you can't have "no character" at the beginning (??.txt won't match a.txt) or in the middle (a??a.txt won't match aa.txt) of a string, this special case only occurs when the question mark is at the end of the search string.

We can use our initial command to narrow down our search, but we'll need to use a more aggressive filter to get exactly what we want. FindStr's regular expression filtering to the rescue.

C:\> dir /b /s /a:-d ?????? | findstr /E
"\\[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]"


C:\Users\theawesomeone\Desktop\randomdir\D0F5KL
...


The /E option matches the string at the end of the line. The regular expression looks for a backslash (which itself has to be escaped with another backslash), followed by 6 characters of A-Z and/or 0-9. Unfortunately, this command is a bit verbose since FindStr doesn't allow us to use [A-Z0-9]{6} to specify 6 A-Z and/or 0-9 characters. To do that we have to use...

PowerShell

Unlike CMD.EXE, PowerShell's version of the "?" wildcard requires exactly one character, so ?????? will not match 5-character file names. The first pass of our command is:

PS C:\> Get-ChildItem -Recurse -Include ??????


This command will do a recursive listing and only return objects with 6-character names; however, the results still include directories and 6-character file names that contain a dot. We'll have to do more filtering further down the pipeline. We could have done all of the filtering there, but the earlier we filter, the faster the command runs.

Here is our final command:

PS C:\> Get-ChildItem -Recurse -Include ?????? | 
Where-Object { -not $_.PSIsContainer -and $_.Name -match "[A-Z0-9]{6}" }


This further filters our results for objects that are not containers (i.e., files) and whose names match our regular expression. The regular expression looks for 6-character names that contain only A-Z and/or 0-9.

Per usual, we can shorten up this command significantly.

PS C:\> ls -r -i ?????? | 
? { !$_.PSIsContainer -and $_.Name -match "[A-Z0-9]{6}" }


Now Tom the Hunter of Office Files can clean up some space.

Looks like I scored some bonus points, and in two categories even. Hal, any scoring in *n?x land?

Hal is blue

Given that I'm on the road and only seeing my wife about once every other week, there's not much scoring going on at all. And thanks for rubbing salt into that particular wound, partner.

I'm also a little confused about the challenge. Mystery Tom starts out saying he's looking for eight character file names, and then faster than you can say "original Blade Runner theatrical release", we're suddenly looking for six character file names. Well, let's continue with six character file names because it makes our examples shorter. I'm sure you'll be able to figure out how to do eight character file names without much trouble.

In Unix-land, the "?" actually matches exactly one character. So we can do:

$ find testing/ -type f -name '??????'
[...]
testing/Mac-PropertyList-1.33/t/dict.t
testing/Mac-PropertyList-1.33/t/time.t
testing/Mac-PropertyList-1.33/t/load.t
testing/Mac-PropertyList-1.33/README
testing/Mac-PropertyList-1.33/examples/README
[...]

The only problem is that "?" matches any character, including dots and other punctuation. If we really only want to match files containing upper-case letters and numbers, then we have to use an expression like "[A-Z0-9]". And just like CMD.EXE, find won't let us do something like "[A-Z0-9]{6}". So we're left with:

$ find testing/ -type f -name '[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]'
[...]
testing/Mac-PropertyList-1.33/README
testing/Mac-PropertyList-1.33/examples/README
[...]

But rather than doing all that typing, let's just use egrep instead:

$ find testing/ -type f | egrep '/[A-Z0-9]{6}$'
[...]
testing/Mac-PropertyList-1.33/README
testing/Mac-PropertyList-1.33/examples/README
[...]

Notice that I'm using "/" at the front of the regex, and "$" at the end to ensure that what I'm matching against is only the final file name component of each line.
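And for the eight-character names Mystery Tom originally asked about, presumably all that changes is the repetition count (untested here, since I don't have a stash of orphaned Office temp files handy):

$ find testing/ -type f | egrep '/[A-Z0-9]{8}$'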

So there are my two (OK, three) solutions. Score!