Tuesday, December 28, 2010

Episode #127: Making a Difference

Hal went to school

I recently got the opportunity to sit in on (fellow SANS instructor) Lenny Zeltser's "Reverse Engineering Malware" class. It's a terrific course, and I highly recommend it.

During the material on memory analysis, we were comparing the output of "volatility pslist" and "volatility psscan2". It's relatively straightforward for rootkits to hide themselves from pslist, but psscan2 does a much more thorough job of finding the hidden processes. So the differences in the output are always very interesting to the analyst. Here's an example of what I mean:

$ volatility pslist -f memory.img
Name Pid PPid Thds Hnds Time
System 4 0 55 260 Thu Jan 01 00:00:00 1970
smss.exe 540 4 3 21 Thu Jan 28 16:11:40 2010
csrss.exe 604 540 12 363 Thu Jan 28 16:11:46 2010
lsass.exe 684 628 18 341 Thu Jan 28 16:11:47 2010
vmacthlp.exe 836 672 1 24 Thu Jan 28 16:11:47 2010
svchost.exe 848 672 18 201 Thu Jan 28 16:11:47 2010
svchost.exe 1024 672 51 1178 Thu Jan 28 16:11:47 2010
svchost.exe 1072 672 4 75 Thu Jan 28 16:11:47 2010
svchost.exe 1132 672 15 212 Thu Jan 28 16:11:48 2010
spoolsv.exe 1476 672 10 115 Thu Jan 28 16:11:49 2010
explorer.exe 1592 1572 12 4021 Thu Jan 28 16:11:50 2010
VMwareUser.exe 1656 1592 8 416 Thu Jan 28 16:11:50 2010
VMwareService.e 1996 672 3 1026 Thu Jan 28 16:11:58 2010
wscntfy.exe 1396 1024 1 27 Thu Jan 28 16:12:03 2010
taskmgr.exe 1624 628 3 20201 Tue Feb 02 02:45:05 2010
mike022.exe 1956 672 2 30 Tue Feb 02 03:25:29 2010
wordpad.exe 1992 1260 4 102 Tue Feb 02 22:17:03 2010
calc.exe 828 1592 1 26 Thu Feb 04 00:01:00 2010
cmd.exe 968 1592 1 32 Thu Feb 04 00:01:13 2010
wordpad.exe 2008 1256 5 101 Thu Feb 04 00:02:56 2010
$ volatility psscan2 -f memory.img
PID PPID Time created Time exited Offset PDB Remarks
------ ------ ------------------------ ------------------------ ---------- ---------- ----------------

932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
1132 672 Thu Jan 28 16:11:48 2010 0x01eb4970 0x082c0160 svchost.exe
1956 672 Tue Feb 02 03:25:29 2010 0x020155d8 0x082c02c0 mike022.exe
1072 672 Thu Jan 28 16:11:47 2010 0x02016978 0x082c0140 svchost.exe
1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe
1476 672 Thu Jan 28 16:11:49 2010 0x0209db38 0x082c01a0 spoolsv.exe
1996 672 Thu Jan 28 16:11:58 2010 0x021f0da0 0x082c0180 VMwareService.e
1664 1592 Thu Jan 28 16:11:50 2010 0x021feb88 0x082c0240 msmsgs.exe
1024 672 Thu Jan 28 16:11:47 2010 0x02202880 0x082c0120 svchost.exe
604 540 Thu Jan 28 16:11:46 2010 0x0221f020 0x082c0040 csrss.exe
1624 628 Tue Feb 02 02:45:05 2010 0x02256da0 0x082c02e0 taskmgr.exe
272 1820 Thu Feb 04 00:00:55 2010 0x02293b08 0x082c0300 wordpad.exe
1012 672 Thu Jan 28 16:12:02 2010 0x023a78b0 0x082c0260 alg.exe
1656 1592 Thu Jan 28 16:11:50 2010 0x023a9c28 0x082c0220 VMwareUser.exe
1648 1592 Thu Jan 28 16:11:50 2010 0x023ae980 0x082c0200 VMwareTray.exe
848 672 Thu Jan 28 16:11:47 2010 0x023b3020 0x082c00e0 svchost.exe
1748 1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe
836 672 Thu Jan 28 16:11:47 2010 0x02412b58 0x082c00c0 vmacthlp.exe
672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe
968 1592 Thu Feb 04 00:01:13 2010 0x024707e8 0x082c0340 cmd.exe
684 628 Thu Jan 28 16:11:47 2010 0x02483da0 0x082c00a0 lsass.exe
1992 1260 Tue Feb 02 22:17:03 2010 0x02491130 0x082c0360 wordpad.exe
1396 1024 Thu Jan 28 16:12:03 2010 0x02492d78 0x082c0280 wscntfy.exe
2008 1256 Thu Feb 04 00:02:56 2010 0x02494988 0x082c03e0 wordpad.exe
828 1592 Thu Feb 04 00:01:00 2010 0x024c86b8 0x082c02a0 calc.exe
1592 1572 Thu Jan 28 16:11:50 2010 0x024ddda0 0x082c01e0 explorer.exe
540 4 Thu Jan 28 16:11:40 2010 0x024f8368 0x082c0020 smss.exe
628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe
4 0 0x025c8830 0x00319000 System

Visually you can see that the psscan2 output lists several more processes than pslist, but just using your eyeballs it can be difficult to figure out exactly what the differences are. Seems like a job for command-line kung fu!

My first thought was to simply extract the list of .EXEs from each command and diff them. In order to do the diff properly, I'll need to sort them into canonical order, but that's no problem. Here's how we manage the output from pslist:

$ volatility pslist -f memory.img | tail -n +2 | awk '{print $1}' | sort
calc.exe
cmd.exe
csrss.exe
...

I use tail to chop off the header line, then awk to extract the name of the .EXE from the first column, and finally pipe the whole thing into sort.

Dealing with the psscan2 output is very similar:

$ volatility psscan2 -f memory.img | tail -n +4 | awk '{print $NF}' | sort
alg.exe
calc.exe
cmd.exe
...

In this case, there are three header lines we need to skip. Also the .EXE name is in the last column of output-- "print $NF" is a useful awk idiom for printing the value in the last column.

So now we need to diff the output of these two commands. We could do this by creating temporary files, but why bother when we have the magic bash "<(...)" syntax that lets us substitute command output in a place where a command would normally be looking for a file name:

diff <(volatility psscan2 -f memory.img | tail -n +4 | awk '{print $NF}' | sort) \
<(volatility pslist -f memory.img | tail -n +2 | awk '{print $1}' | sort)

1d0
< alg.exe
4,5d2
< cmd.exe
< cmd.exe
10,11d6
< msmsgs.exe
< services.exe
18d12
< svchost.exe
23d16
< VMwareTray.exe
25,27d17
< winlogon.exe
< wmiprvse.exe
< wordpad.exe

Wicked! There are 10 processes that appear in the psscan2 output that don't show up in the pslist output. Since we don't see any lines starting with ">" there are no processes in the pslist output that don't show up in psscan2-- this is what we'd expect, but it's always nice to get confirmation.
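For readers who would rather see the same name-level comparison outside the shell, here is a minimal Python sketch. It assumes the process names have already been pulled out of each listing; the helper name names_only_in_first and the short sample lists are made up for illustration and are not the full output above.

```python
from collections import Counter

def names_only_in_first(scan_names, list_names):
    """Return names that occur more often in scan_names than in list_names.

    A Counter (multiset) difference keeps duplicates, so two extra cmd.exe
    entries are reported twice, much like the diff output.
    """
    extra = Counter(scan_names) - Counter(list_names)
    return sorted(extra.elements())

# Illustrative subset of the process names, not the complete listings.
scan = ["alg.exe", "calc.exe", "cmd.exe", "cmd.exe", "cmd.exe", "csrss.exe"]
plist = ["calc.exe", "cmd.exe", "csrss.exe"]
print(names_only_in_first(scan, plist))
```

Like the diff, this only tells you which names are extra, not which PID each extra name belongs to.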

The only problem here is that as we got further into the in-class exercises, I realized I really wanted all of the extra detail about each of the hidden processes from the psscan2 output. For example, the hex offset values end up being very useful, and I'd like to know exactly which two of the three cmd.exe processes are the hidden ones. Let me show you the command line I came up with and then explain it to you:

$ join -v 1 -1 1 -2 2 \
<(volatility psscan2 -f memory.img | tail -n +4 | sort -n -k 1,1) \
<(volatility pslist -f memory.img | tail -n +2 | sort -n -k2,2)

272 1820 Thu Feb 04 00:00:55 2010 0x02293b08 0x082c0300 wordpad.exe
628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe
672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe
932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe
join: file 1 is not in sorted order
join: file 2 is not in sorted order
1012 672 Thu Jan 28 16:12:02 2010 0x023a78b0 0x082c0260 alg.exe
1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe
1648 1592 Thu Jan 28 16:11:50 2010 0x023ae980 0x082c0200 VMwareTray.exe
1664 1592 Thu Jan 28 16:11:50 2010 0x021feb88 0x082c0240 msmsgs.exe
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
1748 1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe

In this case I'm using join rather than diff because the output of the two commands is so differently formatted. Essentially I'm doing a join on the PID columns of the psscan2 ("-1 1") and pslist ("-2 2") output and telling join to output the non-matching lines from psscan2 ("-v 1"). The tricky bit is that each command output needs to be sorted by its PID column for join to work. So if you look in the "<(...)" clauses, you'll see that the final element of the pipeline in each case is a numeric sort on the PID column. Easy, right?
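If the join options are hard to picture, the "-v 1" behavior is what database folks call an anti-join. Here is a rough Python sketch of that idea, assuming the header lines have already been stripped from both inputs. The function name anti_join_on_pid is hypothetical, and the sample lines are a small subset of the output above.

```python
def anti_join_on_pid(psscan_lines, pslist_lines):
    """Return psscan2 lines whose PID (field 1) never appears as a pslist PID (field 2)."""
    listed_pids = {line.split()[1] for line in pslist_lines if line.strip()}
    unmatched = [line for line in psscan_lines
                 if line.strip() and line.split()[0] not in listed_pids]
    # Sort numerically on the PID so the result reads like the join output.
    return sorted(unmatched, key=lambda line: int(line.split()[0]))

psscan = [
    "932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe",
    "1132 672 Thu Jan 28 16:11:48 2010 0x01eb4970 0x082c0160 svchost.exe",
    "672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe",
]
pslist = ["svchost.exe 1132 672 15 212 Thu Jan 28 16:11:48 2010"]

for line in anti_join_on_pid(psscan, pslist):
    print(line)
```

Because the lookup uses a set rather than a merge of two sorted streams, this version does not care what order the inputs arrive in.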

The only fly in the ointment is the "not in sorted order" error messages from join. The problem is that join only understands alphabetic sorting. So when we go from 9xx PIDs to 1xxx PIDs, join thinks the file has gone all unsorted. There's no "-n" option to join like there is for sort, but in some versions of join we can use the "--nocheck-order" option to suppress the error messages:

$ join -v 1 -1 1 -2 2 --nocheck-order \
<(volatility psscan2 -f memory.img | tail -n +4 | sort -n -k 1,1) \
<(volatility pslist -f memory.img | tail -n +2 | sort -n -k2,2)

272 1820 Thu Feb 04 00:00:55 2010 0x02293b08 0x082c0300 wordpad.exe
628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe
672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe
932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe
1012 672 Thu Jan 28 16:12:02 2010 0x023a78b0 0x082c0260 alg.exe
1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe
1648 1592 Thu Jan 28 16:11:50 2010 0x023ae980 0x082c0200 VMwareTray.exe
1664 1592 Thu Jan 28 16:11:50 2010 0x021feb88 0x082c0240 msmsgs.exe
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
1748 1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe

The other alternative is obviously to sort the PID columns alphabetically, but that offends my sensibilities somehow.
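To see why join complains, compare the two orderings on a handful of the PIDs above. This is only a sketch: Python compares strings by code point, whereas join uses the locale's collation, but for plain digit strings the effect is the same.

```python
pids = ["272", "628", "932", "1012", "1172"]

numeric_order = sorted(pids, key=int)   # what "sort -n" produces
lexical_order = sorted(pids)            # what a string comparison expects

print(numeric_order)
print(lexical_order)
```

In the lexical ordering every PID that starts with "1" comes before "272", which is exactly the point where join decides the numerically sorted input has gone out of order.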

Mmmm, hmmm! That was some tasty fu! Hey Tim, volatility runs on Windows-- what can you do with the output? I double-dog-dare you to try it in CMD.EXE first...

Tim skipped school:

D'oh, cmd.exe. Dang, Hal. Happy Freaking New Year to me, huh?

Here is what I came up with based on the assumption that pslist returns a subset of psscan2.

C:\> python.exe volatility pslist -f memory.img > pslist.txt
C:\> cmd /v:on /c "for /F "skip=2 tokens=1,5,10,15" %a in ('python.exe volatility psscan2 -f memory.img') do
@(if not "%d"=="" (set name=%d) else (if not "%c"=="" (set name=%c) else (set name=%b))) &
set pid=%a & (type pslist.txt | findstr /B /R /C:"!name! *!pid! " > NUL || echo !name! !pid!)"


svchost.exe 932
wmiprvse.exe 1744
cmd.exe 1172
msmsgs.exe 1664
wordpad.exe 272
alg.exe 1012
VMwareTray.exe 1648
cmd.exe 1748
services.exe 672
winlogon.exe 628


I split this command into two for the sake of readability; however, it could be easily combined into a one-liner. But I'll leave that simple experiment to you. The first line takes the output of pslist and dumps the contents into a file. This file will be read numerous times so it is significantly faster to just read the file in the second "half" of our command. Now, regarding that second half...

We start off by invoking our shell with /v:on to enable delayed variable expansion and /c to cause our spawned shell to exit upon completion. Inside the shell we use our trusty For loop. The "skip=2" option skips the two header lines, and For /F ignores the blank line that follows them, so all three leading lines are dropped. The For loop then splits the line based on white space. We are trying to get the name of the process, and due to spacing, it may be in the 5th, 10th, or 15th token. Yes, it is that confusing. Here is a little diagram of what I mean:

PID    PPID   Time created             Time exited              Offset     PDB        Remarks
------ ------ ------------------------ ------------------------ ---------- ---------- ----------------

Token1 2 3 4 5 6 7 8 9 10
932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe

Token1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe

Token1 2 3 4 5
4 0 0x025c8830 0x00319000 System


Our for loop will give us 4 variables a, b, c, and d which represent the 1st, 5th, 10th, and 15th token. We have to use a little trick to figure out which of the three variables contains the process name by checking each variable from right to left. If %d is not empty, then it contains the process name so we set Name equal to %d. If %d is empty we try %c, and if %c is empty we use %b. For the sake of nice variable names we set !pid! equal to %a. We then have the variable !pid!, which contains the process id, and !name!, which contains the process name.
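The right-to-left check is easier to follow when written out as a function. Below is a small Python sketch of the same logic, using the three sample lines from the diagram. The name name_and_pid is invented for this example.

```python
def name_and_pid(line):
    """Pick the process name from token 15, 10, or 5, checking right to left."""
    tokens = line.split()
    # Tokens 15, 10, and 5 are list indexes 14, 9, and 4.
    for index in (14, 9, 4):
        if len(tokens) > index:
            return tokens[index], tokens[0]
    return None  # too few tokens to be a process line

samples = [
    "932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe",
    "1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe",
    "4 0 0x025c8830 0x00319000 System",
]
for sample in samples:
    print(name_and_pid(sample))
```

In a language with real lists you would simply take the last token, which is what the "print $NF" idiom does in awk. The index juggling is only needed because the For loop hands back fixed token positions.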

We then search the pslist.txt file to see if the current process, represented by !name! and !pid!, is in the file. We output the file, using the Type command, and use FindStr to search for the matching name and process id. The /B switch says our search string must be at the beginning of the line, the /R enables regular expression searches. The default FindStr setting is to treat a space in our search string as a logical OR, but the /C switch "uses [the] specified string as a literal search string," meaning it doesn't treat a space as a logical OR. In short, it looks for the process name at the beginning of the line, followed by some number of spaces, then the process id, and then another space.

We then use the logical OR (||) in conjunction with the FindStr command to determine whether FindStr found something or not. This trick has been used repeatedly, but most recently in episode 122. If FindStr doesn't find anything we then output the process name and PID. This effectively gives us a list of processes that are found with psscan2 but not pslist.

Now for a more robust solution using...

PowerShell

I'm going to deviate into script land here, only because this mini-script may be very useful for manipulating the output of these commands. It will take the output and objectify it.

Objectifying pslist:

PS C:\> $null, $pslist = python volatility pslist -f memory.img
PS C:\> [regex]$regex = '(?<Name>\S+)\s+(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s+(?<Threads>[0-9]+)\s+(?<Handles>[0-9]+)\s+(?<Time>.*)'
PS C:\> $pslistobjects = foreach ($p in $pslist) {
... $psobj = "" | Select-Object Name, PID, PPID, Threads, Handles, Time
... $p -match $regex | Out-Null
... $psobj.Name = $matches.Name
... $psobj.PID = $matches.PID
... $psobj.PPID = $matches.PPID
... $psobj.Threads = $matches.Threads
... $psobj.Handles = $matches.Handles
... $psobj.Time = [datetime]::ParseExact($matches.Time.Trim(), "ddd MMM dd HH:mm:ss yyyy", $null)
... $psobj
... }

PS C:\> $pslistobjects | Format-Table
Name PID PPID Threads Handles Time
---- --- ---- ------- ------- ----
System 4 0 55 260 1/1/1970 12:00:00 AM
smss.exe 540 4 3 21 1/28/2010 4:11:40 PM
csrss.exe 604 540 12 363 1/28/2010 4:11:46 PM
...


This takes the output from pslist and converts it to PowerShell objects. Let's look at each line, one at a time.

PS C:\> $null, $pslist = python volatility pslist -f memory.img


Here we get the output from pslist, send the first line to null, and the remainder is put into the variable pslist. This effectively skips the first line (header).

PS C:\> [regex]$regex = '(?<Name>\S+)\s+(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s+(?<Threads>[0-9]+)\s+(?<Handles>[0-9]+)\s+(?<Time>.*)'


The next chunk sets up our Regular Expression with named groupings.

PS C:\> $pslistobjects = foreach ($p in $pslist) {
... $psobj = "" | Select-Object Name, PID, PPID, Threads, Handles, Time
... $p -match $regex | Out-Null
... $psobj.Name = $matches.Name
... $psobj.PID = $matches.PID
... $psobj.PPID = $matches.PPID
... $psobj.Threads = $matches.Threads
... $psobj.Handles = $matches.Handles
... $psobj.Time = [datetime]::ParseExact($matches.Time.Trim(), "ddd MMM dd HH:mm:ss yyyy", $null)
... $psobj
... }


Inside the foreach loop is where the heavy lifting is done. First, an empty object is created. Then the Match operator is used to match the string using the regular expression and automatically populate the $matches variable. We then set each property of our object. The Time property is a bit special since the time format used by pslist isn't one of the formats that PowerShell/Windows natively understands. The variable $pslistobjects then contains PowerShell'ed objects from volatility's pslist. We can then sort, filter, or perform all sorts of tricks once it has been PowerShellized.
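The same named-group approach carries over to other languages almost unchanged. Here is a hedged Python sketch of it. The function name parse_pslist_line is made up, and the day and month abbreviations are parsed with strptime, which assumes an English (C) locale.

```python
import re
from datetime import datetime

# Same named groups as the PowerShell regex, in Python's (?P<name>...) syntax.
PSLIST_RE = re.compile(
    r"(?P<Name>\S+)\s+(?P<PID>[0-9]+)\s+(?P<PPID>[0-9]+)\s+"
    r"(?P<Threads>[0-9]+)\s+(?P<Handles>[0-9]+)\s+(?P<Time>.*)"
)

def parse_pslist_line(line):
    """Turn one pslist line into a dict, or return None if it does not match."""
    match = PSLIST_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    for field in ("PID", "PPID", "Threads", "Handles"):
        record[field] = int(record[field])
    # pslist prints times like "Thu Jan 28 16:11:40 2010".
    record["Time"] = datetime.strptime(record["Time"].strip(), "%a %b %d %H:%M:%S %Y")
    return record

print(parse_pslist_line("smss.exe 540 4 3 21 Thu Jan 28 16:11:40 2010"))
```

Converting the numeric columns to integers up front means later sorting and comparison behave numerically rather than as strings.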

A similar mini-script will objectify the output from psscan2:

PS C:\> $null, $null, $null, $psscan2 = \python25\python.exe volatility psscan2 -f memory.img
PS C:\> [regex]$regex = '\s*?(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s(?<Created>.{24})\s(?<Exited>.{24})\s(?<Offset>[0-9a-fx]{10})\s(?<PDB>[0-9a-fx]{10})\s(?<Name>.+)'

PS C:\> $psscan2objects = foreach ($p in $psscan2) {
... $psobj = "" | Select-Object Name, PID, PPID, Created, Exited, Offset, PDB
... $p -match $regex | Out-Null
... $psobj.Name = $matches.Name
... $psobj.PID = $matches.PID
... $psobj.PPID = $matches.PPID
... $psobj.Offset = $matches.Offset
... $psobj.PDB = $matches.PDB
... if ($matches.Created.Trim()) {
... $psobj.Created = [datetime]::ParseExact($matches.Created, "ddd MMM dd HH:mm:ss yyyy", $null)
... }
... if ($matches.Exited.Trim()) {
... $psobj.Exited = [datetime]::ParseExact($matches.Exited, "ddd MMM dd HH:mm:ss yyyy", $null)
... }
... $psobj
... }

PS C:\> $psscan2objects | ft

Name PID PPID Created Exited Offset PDB
---- --- ---- ------- ------ ------ ---
svchost.exe 932 672 1/28/2010 4:11:47 PM 0x01ea3558 0x082c0100
wmiprvse.exe 1744 848 2/4/2010 12:02:53 AM 2/4/2010 12:04:23 AM 0x01eaea88 0x082c0380
svchost.exe 1132 672 1/28/2010 4:11:48 PM 0x01eb4970 0x082c0160
mike022.exe 1956 672 2/2/2010 3:25:29 AM 0x020155d8 0x082c02c0
...


If you are going to use these commands often I would highly suggest making these into script files. You could even pass the file name to these scripts and have them wrap the volatility commands.

Ok, so now we have two variables, each containing the objectified output of the respective volatility command.

PS C:\> $pslistobjects | ft

Name PID PPID Threads Handles Time
---- --- ---- ------- ------- ----
System 4 0 55 260 1/1/1970 12:00:00 AM
smss.exe 540 4 3 21 1/28/2010 4:11:40 PM
csrss.exe 604 540 12 363 1/28/2010 4:11:46 PM
lsass.exe 684 628 18 341 1/28/2010 4:11:47 PM
...


PS C:\> $psscan2objects | ft

Name PID PPID Created Exited Offset PDB
---- --- ---- ------- ------ ------ ---
svchost.exe 932 672 1/28/2010 4:11:47 PM 0x01ea3558 0x082c0100
wmiprvse.exe 1744 848 2/4/2010 12:02:53 AM 2/4/2010 12:04:23 AM 0x01eaea88 0x082c0380
svchost.exe 1132 672 1/28/2010 4:11:48 PM 0x01eb4970 0x082c0160
mike022.exe 1956 672 2/2/2010 3:25:29 AM 0x020155d8 0x082c02c0
...


Now we can use the Compare-Object cmdlet to compare the two sets of processes.

PS C:\> Compare-Object $pslistobjects $psscan2objects -Property name,pid

name pid SideIndicator
---- --- -------------
svchost.exe 932 =>
wmiprvse.exe 1744 =>
cmd.exe 1172 =>
msmsgs.exe 1664 =>
wordpad.exe 272 =>
alg.exe 1012 =>
VMwareTray.exe 1648 =>
cmd.exe 1748 =>
services.exe 672 =>
winlogon.exe 628 =>


The Property parameter is used to specify the properties to use for comparison. We can either use a single property or a comma separated list of property names.

From this output it is quickly apparent that there are 10 processes found by psscan2 that were not found by pslist.
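As a rough analogue of what Compare-Object is doing here, the following Python sketch compares two collections of (name, pid) pairs and labels each difference with a side indicator. The function compare_objects is a hypothetical helper, not a port of the cmdlet, and the sample data is a small subset of the processes above.

```python
def compare_objects(reference, difference):
    """Return (item, indicator) pairs for items found on only one side."""
    ref, diff = set(reference), set(difference)
    rows = [(item, "<=") for item in ref - diff]    # only in the reference set
    rows += [(item, "=>") for item in diff - ref]   # only in the difference set
    return sorted(rows)

pslist_pairs = [("smss.exe", 540), ("csrss.exe", 604)]
psscan2_pairs = [("smss.exe", 540), ("csrss.exe", 604),
                 ("services.exe", 672), ("winlogon.exe", 628)]

for (name, pid), side in compare_objects(pslist_pairs, psscan2_pairs):
    print(name, pid, side)
```

Comparing on the pair rather than on the name alone is what lets us tell the hidden cmd.exe processes apart from the visible one.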

Whew, that was a lot of work this week. I hope it gets me on Santa's Nice list...next year.

Davide is too cool for school

Davide Brini has once again punk'd me with this full-on awk attack:

awk 'FNR>1 && NR==FNR {a[$1,$2]; next} 
FNR>3 && !(($NF,$1) in a)' \
<(volatility pslist -f memory.img) \
<(volatility psscan2 -f memory.img)

Obviously, Davide has a PhD in awk, so let me explain what's going on here. FNR is an internal awk variable that tracks the current "input record number"-- usually the line number-- of the current file. NR, on the other hand, tracks the total number of records (lines) seen so far across all files.

If you look at the first awk clause, the "FNR>1" is how Davide is skipping the first header line in the pslist output. The "NR==FNR" expression will only be true if we're processing the first input "file", i.e. the output of "volatility pslist ...". Once awk moves on to the second "file" (the psscan output), NR will keep on accumulating, but FNR will start counting again from 1.

So the first clause is for handling the pslist output. If you look at what's happening in the curly braces, Davide is creating empty array entries indexed by process name ($1) and PID ($2). The "next" just tells awk to read and process the next line of input, skipping the second clause which applies to the psscan output.

So let's look at that second clause. We can only get here if "NR!=FNR", which means we're dealing with the psscan output from the second input "file". Here Davide is using "FNR>3" to skip the header lines. For all the other lines, "!(($NF,$1) in a)" is true if and only if there is no entry in the array "a" for this combination of process name ($NF) and PID ($1). If we don't find an entry then psscan is telling us about a process that's been hidden from pslist and we want to output the information about this process. Davide is relying on the implicit "{print}" behavior of awk to make this happen.
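If the NR and FNR bookkeeping is still fuzzy, it may help to see the counters written out by hand. This Python sketch mimics the two-clause structure under the assumption that each input is already a list of lines. The function name hidden_processes is invented, and unlike awk it skips first-input lines with fewer than two fields.

```python
def hidden_processes(pslist_lines, psscan_lines):
    """Mimic the two-input awk idiom with explicit NR and FNR counters."""
    seen = set()
    hidden = []
    nr = 0                                   # NR: records seen across all inputs
    for lines in (pslist_lines, psscan_lines):
        for fnr, line in enumerate(lines, start=1):   # FNR restarts at 1 per input
            nr += 1
            fields = line.split()
            if nr == fnr:                    # still reading the first input
                if fnr > 1 and len(fields) >= 2:
                    seen.add((fields[0], fields[1]))   # key on (name, PID)
                continue                     # plays the role of awk's "next"
            if fnr > 3 and fields and (fields[-1], fields[0]) not in seen:
                hidden.append(line)
    return hidden

pslist = [
    "Name Pid PPid Thds Hnds Time",
    "smss.exe 540 4 3 21 Thu Jan 28 16:11:40 2010",
]
psscan = [
    "PID PPID Time created Time exited Offset PDB Remarks",
    "------ ------ ------------------------ ------------------------ ---------- ---------- ----------------",
    "",
    "540 4 Thu Jan 28 16:11:40 2010 0x024f8368 0x082c0020 smss.exe",
    "628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe",
]
print(hidden_processes(pslist, psscan))
```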

Davide points out that the output from the above command will not be sorted, but you can always pipe the results into sort if that's important to you:

awk 'FNR>1 && NR==FNR {a[$1,$2]; next} 
FNR>3 && !(($NF,$1) in a)' \
<(volatility pslist -f memory.img) \
<(volatility psscan2 -f memory.img) | sort -n -k2,2

Nice job, Davide!

Michael has to stay late for passing notes

Wow, this Episode sure provoked a lot of interesting commentary. Michael Hale Ligh gave us a shout out from the volatility camp. He even wrote a small plugin for volatility, psdiff.py, that does the same thing as our command line kung fu:

# For http://volatility.googlecode.com/svn/branches/Volatility-1.4_rc1

import volatility.plugins.psscan as psscan
import volatility.win32.tasks as tasks
import volatility.utils as utils

class PsDiff(psscan.PSScan):
    """Produce a process diff"""

    def calculate(self):
        addr_space = utils.load_as(self._config)

        # Build a dictionary of processes found by scanning. The keys are
        # physical addresses and the values are the objects
        procs_scan = dict((p.obj_offset, p) for p in psscan.PSScan.calculate(self))

        # Build a dictionary of processes found by walking the linked list.
        # The virtual addresses are converted to physical with vtop.
        procs_list = dict((addr_space.vtop(p.obj_offset), p) for p in tasks.pslist(addr_space))

        # Create two sets of addresses so we can easily compute the difference
        scan_addrs = set(procs_scan.keys())
        list_addrs = set(procs_list.keys())

        # Yield any objects that are found by psscan but not pslist
        for addr in (scan_addrs - list_addrs):
            yield procs_scan[addr]

    def render_text(self, outfd, data):
        for p in data:
            outfd.write("{0:<8} {1:<16} {2}\n".format(p.UniqueProcessId, p.ImageFileName, p.ExitTime))

Michael's plugin uses "psscan" instead of "psscan2", so the output will be slightly different, but it shouldn't be that hard to switch things over to use "psscan2" instead if you prefer. Michael also provided a bit more explanation in his original email:

$ python volatility.py psdiff -f memory.dmp

Volatile Systems Volatility Framework 1.4_rc1
0 Idle 1970-01-01 00:00:00
940 cmd.exe 2008-11-26 07:45:49
660 services.exe 1970-01-01 00:00:00
808 taskmgr.exe 2008-11-26 07:45:40
924 svchost.exe 1970-01-01 00:00:00
592 csrss.exe 1970-01-01 00:00:00
992 alg.exe 1970-01-01 00:00:00
1016 svchost.exe 1970-01-01 00:00:00
828 svchost.exe 1970-01-01 00:00:00

The exit time of "1970-01-01 00:00:00" just means the field is empty (process is still active). I am doing the diff based on the address of EPROCESS objects, however it's possible, though not very likely, that an address could get re-used...so for a more robust diff you may check other fields as well.

If you want to see other fields in the output, it's rather easy because the Volatility types are auto-generated from Microsoft's PDB symbol files. For example since Windows defines a structure like this:

typedef struct _EPROCESS {
...
char ImageFileName[16];
DWORD UniqueProcessId;
...
} EPROCESS, *PEPROCESS;

You can print those fields like p.ImageFileName and p.UniqueProcessId in the plugin.

Lastly, the csrpslist plugin discussed in Malware Analyst's Cookbook produces a diff using two alternate sources of process listings (the csrss.exe handle table and an internal linked list found in the memory of csrss.exe). There are many other sources as well...
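Michael's suggestion of checking other fields as well can be illustrated without touching Volatility at all. The sketch below uses a made-up Proc tuple as a stand-in for a process record (it is not a Volatility type) and keys the comparison on address, PID, and name together. The "other.exe" entry is invented purely to show a re-used address.

```python
from collections import namedtuple

# Hypothetical stand-in for a process record; not part of Volatility.
Proc = namedtuple("Proc", "offset pid name")

def diff_by_fields(scanned, listed):
    """Return scanned records whose (offset, pid, name) triple is not in listed."""
    listed_keys = {(p.offset, p.pid, p.name) for p in listed}
    return [p for p in scanned if (p.offset, p.pid, p.name) not in listed_keys]

scanned = [
    Proc(0x01ea3558, 932, "svchost.exe"),
    Proc(0x024f8368, 540, "smss.exe"),
]
listed = [
    # Same address as smss.exe above, but a different process (invented data).
    Proc(0x024f8368, 1234, "other.exe"),
]
for proc in diff_by_fields(scanned, listed):
    print(hex(proc.offset), proc.pid, proc.name)
```

An address-only diff would treat the smss.exe record as matched here; comparing the extra fields reports it as a difference.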

Tuesday, December 21, 2010

Episode #126: Cleaning Up The Dump

Hal's directories are bloated

It's not politically correct to say, but sometimes in Unix your directories just get fat. And like most of us, as your directories get fat, they also get slow. This is because in standard Unix file systems, directories are implemented as sequential lists of file names. They aren't even sorted, so you can't binary search them.

For example, suppose you'd just been dumping your logs into a single directory for years. You could end up with a big pile of stuff that looks like this:

# ls -ld logs
drwxr-xr-x 2 root root 266240 Dec 18 15:49 logs
# ls logs | wc -l
7188
# ls logs
authpriv.20070808.gz
authpriv.20070809.gz
authpriv.20070810.gz
...

Almost 7200 files-- and as you can see the directory itself has grown to be about a quarter of a megabyte! In our example, the file names are "<log>.YYYYMMDD" with an optional ".gz" extension on the older log files that have been compressed to save space.

Well I want my directories to be fit and lean again, so I decided to move the files into a tree structure based on year and month. So I'll need to move each file to a new location such as "YYYY/MM/<log>.YYYYMMDD". That should prevent any single sub-directory from getting too bloated.

I think there are a lot of ways you could attack this one, but I decided to make some noise with sed:

# cd logs
# for file in *; do
dir=$(echo "$file" | sed 's/.*\.\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\).*/\1\/\2/');
mkdir -p "$dir";
mv "$file" "$dir";
done

Yep, that sed expression sure is noisy-- as in "line noise". What's going on here? Well I'm taking the file name as input and using sed to pull out the YYYY and the MM and reformatting them into a subdirectory name like "YYYY/MM". First I match "anything followed by a literal dot", aka ".*\.". Then I match four digits-- four instances of the set "[0-9]"-- followed by two digits. However, I enclose both groups of digits in parens-- "\( ... \)"-- so that I can use the matched values on the righthand side of the substitution. On the RHS, "\1" is the four-digit year we matched in the first parenthesized expression and "\2" is the month we matched second. So "\1\/\2" is the year and the month with a literal slash in between-- "YYYY/MM". Obvious, right?
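If the backslash-heavy sed syntax gets in the way, the same substitution reads a little more calmly in Python. This is a sketch of the identical pattern, with year_month_dir as a hypothetical helper name. Like sed, it returns the input unchanged when the pattern does not match.

```python
import re

def year_month_dir(filename):
    """Map '<log>.YYYYMMDD[.gz]' to 'YYYY/MM' using the same pattern as the sed command."""
    # Group 1 is the four-digit year, group 2 the two-digit month.
    return re.sub(r".*\.([0-9]{4})([0-9]{2}).*", r"\1/\2", filename)

print(year_month_dir("authpriv.20070808.gz"))
print(year_month_dir("authpriv.20101218"))
```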

But the sed is the hard part. Once that's over, it's a simple task to make the directory and move the file. And now our directory should be nice and skinny:

# ls
2007 2008 2009 2010
# ls -ld .
drwxr-xr-x 6 root root 266240 Dec 18 15:58 .

Wait a minute! We've only got four top-level directories under our logs directory, but the logs directory itself hasn't shrunk at all. Unfortunately, this is normal behavior for Unix-- once a directory gets big, it never loses the weight.

So how do we stop our directory from looking like Jabba the Hutt? In Unix, you make a new directory and wipe out the old one:

# mkdir ../newlogs
# mv * ../newlogs
# cd ..
# rmdir logs
# mv newlogs logs
# ls -ld logs
drwxr-xr-x 6 root root 4096 Dec 18 16:09 logs

It's liposuction via cloning! A miracle of the modern age! OK, really it's a lame mis-feature of the Unix file system. But at least you now know what to do about it.
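The same make-a-new-one-and-swap routine can be scripted. Here is a minimal Python sketch, assuming nothing else is writing to the directory while it runs. The function name rebuild_directory and the ".new" suffix are arbitrary choices for this example, and ownership and permissions of the original directory are not copied.

```python
import os
import shutil
import tempfile

def rebuild_directory(path):
    """Recreate `path` as a fresh directory holding the same entries."""
    path = os.path.normpath(path)
    fresh = path + ".new"              # hypothetical temporary name
    os.mkdir(fresh)
    # Unlike "mv *", os.listdir also returns dot files.
    for entry in os.listdir(path):
        shutil.move(os.path.join(path, entry), os.path.join(fresh, entry))
    os.rmdir(path)                     # only succeeds once the old directory is empty
    os.rename(fresh, path)

# Small demonstration in a throwaway location.
demo_root = tempfile.mkdtemp()
demo_logs = os.path.join(demo_root, "logs")
os.makedirs(os.path.join(demo_logs, "2007", "08"))
rebuild_directory(demo_logs)
print(sorted(os.listdir(demo_logs)))
```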

And now I want to see Tim push his big directories around. Hey Tim, your directory is so fat...

Tim feels bloated from all the Christmas food:

Hal, yo directories is so fat, when they floated around the ocean Spain claimed them as a new world.

Ok, so the joke is terrible, but the problem is real. Directories with a lot of files can really be a pain.

On Windows there isn't one directory that contains all the logs. Each service typically has its own subdirectory under C:\Windows\System32\LogFiles\. For example, the subdirectory W3SVC1 would contain the logs for the first instance of an IIS webserver. Also, with older versions of Windows C:\Windows is replaced with C:\WinNT.

This LogFiles directory is used by Microsoft products and some third-party products, but of course the third-party products can put their log files in all sorts of other weird locations. For the sake of this article, we'll assume we are looking at IIS logs.

By default IIS log files are created daily with the naming convention of exyymmdd.log. Microsoft doesn't put the full four digit year, so we'll assume 20XX. Why assume post 2000? Because if you are running an IIS server from the last millennium it probably isn't your server any more (see pwned).

Let's start off by getting the names for our directories, and then we'll build on that. According to Microsoft's IIS Log File Naming Syntax, no matter what file format or regular rotation interval (month, week, day, hour), the format is always:

<some chars describing format><YY><MM><other numbers as used in date format>.log

We can build a regular expression replace pattern to derive directory names from the file names:

PS C:\Windows\System32\LogFiles\W3SVC1> ls *.log | % { [regex]::Replace($_.name, '[^0-9]*([0-9]{2})([0-9]{2}).*', '20$1\$2') }
2010\01
...
2010\02
...
2010\03
...

We use a ForEach-Object (alias %) loop on the output of our directory listing (Get-ChildItem is aliased as ls). Inside the loop we use .Net to call the static Replace method in the Regex class. The Replace method takes three arguments: the input, the search pattern, and the replacement string. The input is the name of the file. The search pattern is slightly more complicated. Here is how the search pattern maps to the portions of the log created on January 16th of 2009, ex090116.log.

[^0-9]*    = ex (all the non-digits at the beginning of the file name)
([0-9]{2}) = 09 (two digit year)
([0-9]{2}) = 01 (two digit month)
.* = 16.log (the rest of the name)

We then use the replacement string to build the directory name, where $1 represents the first grouping (year) and $2 represents the second grouping (month). Each grouping is designated by parentheses. For more information on .Net and Regular Expression Replacement, see this article.
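To check the mapping for yourself, here is a small Python sketch that captures all four pieces of the file name. Note that it adds capture groups around the prefix and the remainder, which the pattern above does not have, purely so each piece can be printed. The function names are invented for this example.

```python
import re

# Same pattern as above, with extra groups around the prefix and the remainder.
IIS_NAME = re.compile(r"([^0-9]*)([0-9]{2})([0-9]{2})(.*)")

def split_iis_name(name):
    """Return (prefix, yy, mm, rest) for an IIS log file name, or None."""
    match = IIS_NAME.fullmatch(name)
    return match.groups() if match else None

def iis_directory(name):
    """Build the 20YY\\MM directory name, assuming a post-2000 log."""
    parts = split_iis_name(name)
    if parts is None:
        return None
    _prefix, yy, mm, _rest = parts
    return "20" + yy + "\\" + mm

print(split_iis_name("ex090116.log"))
print(iis_directory("ex090116.log"))
```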

Notice that in our command above we used single quotes. That is because PowerShell will expand any strings inside double quotes before our Replace method has a chance to do any replacing. This means that PowerShell would try to convert $1 into a variable and not pass the literal string to the Replace method. Here is what I mean:

PS C:\> echo "Here is my string $1"
Here is my string

PS C:\> echo 'Here is my string $1'
Here is my string $1

We could use double quotes, but we would have to add a backtick (`) before the dollar sign. The resulting command would look like this:

PS C:\Windows\System32\LogFiles\W3SVC1> ls *.log | % {
[regex]::Replace($_.name, '[^0-9]*([0-9]{2})([0-9]{2}).*', "20`$1\`$2") }


So now we have the directory name, let's create the directory structure and move some files! I'm not going to show the full prompt so the command is less cluttered.

> Get-ChildItem *.log | ForEach-Object {
$dir = [regex]::Replace($_.Name, '[^0-9]*([0-9]{2})([0-9]{2}).*', "20`$1\`$2");
mkdir $dir -ErrorAction SilentlyContinue;
Move-Item $_ $dir }

Wow, that is a rather large command, so let's trim it down with aliases and shortened parameter names. We can't have a big ol' fat command with our nice lean directories.

> ls *.log | % {
$dir = [regex]::Replace($_.name, '[^0-9]*([0-9]{2})([0-9]{2}).*', "20`$1\`$2");
mkdir $dir -ea SilentlyContinue;
move $_ $dir }

Inside our ForEach-Object loop we set $dir equal to the new directory name. We then create the directory. The ErrorAction parameter (ea for short), set to SilentlyContinue, tells the shell not to show us an error message or stop processing if there is a problem. In our case, we want to make sure the command continues to run even if the directory already exists. After the directory is created we move the file, which is represented by $_.

PS C:\Windows\System32\LogFiles\W3SVC1> ls

Directory: Microsoft.PowerShell.Core\FileSystem::C:\Windows\System32\LogFiles\W3SVC1

Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 12/19/2010 12:28 AM <DIR> 2008
d---- 12/19/2010 12:28 AM <DIR> 2009
d---- 12/19/2010 12:28 AM <DIR> 2010


So now we can enter the new year with leaner and meaner directories. And yes, they are meaner. Directories get pretty ticked off when you trim their children.

Tuesday, December 14, 2010

Episode #125: Find Yourself

Tim takes credit for someone else's work:

One of our faithful readers, John, wrote in. Well, we presume he is faithful to us, but we've heard he cheats on us with other blogs, and that's the worst kind of cheating. Since we are short of other ideas I guess we'll have to use his email.

Seriously though, John Ahearne has a nice bit of fu. On one particular assignment, John had carved over 1,200,000 files, where there were over 1,000 per directory. The files were named based on a particular file header in a proprietary file format. The client asked him to look for several files and gave him a text file with the file names. He started with this command to search for his files:

C:\> findstr /s /g:filestofind.txt *


He used the /s option to do a recursive search and the /g option to load the search strings from a file. But there was a problem: it was slow. The reason is that this command searches inside each file, and we only want to search the file names. He then tried another command to see if it would work more quickly.

C:\> dir /b /s | findstr /g:filestofind.txt
C:\Windows\System32\cmd.exe
C:\Windows\System32\en-US\cmd.exe.mui
C:\Windows\winsxs\x86_microsoft-windows-c..c87d\cmd.exe.mui
C:\Windows\winsxs\x86_microsoft-windows-c..1ee0\cmd.exe
C:\Windows\winsxs\x86_microsoft-windows-i..d1e2\appcmd.exe.mui
C:\Windows\winsxs\x86_microsoft-windows-i..ecbd\appcmd.exe
C:\Windows\winsxs\x86_microsoft-windows-s..5b54\evntcmd.exe.mui
C:\Windows\winsxs\x86_microsoft-windows-s..b805\evntcmd.exe


This is much quicker, and it searches what we actually want! How would we do the same thing in PowerShell?

PS C:\> ls -r -i (cat .\filestofind.txt)

Directory: C:\Windows\System32

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 7/13/2009 8:14 PM 301568 cmd.exe

Directory: C:\Windows\winsxs\x86_microsoft-windows-commandpro...

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 7/13/2009 8:14 PM 301568 cmd.exe


We use Get-ChildItem (alias ls) with the Recurse parameter (r for short). We also use the Include parameter (i for short) to find items that match our search strings, which are read from the file via Get-Content (alias cat). One other thing: notice a difference between the output of the two commands?

One command shows only files named "cmd.exe", the other looks for files containing "cmd.exe". The difference is due to the way each command expects the search strings to be presented. Here is a little chart describing how to get similar searches from each command:




Search Type              PowerShell   cmd.exe
-----------              ----------   -------
Name is exactly cmd.exe  cmd.exe      ^cmd.exe$
Name contains cmd.exe    *cmd.exe*    cmd.exe
Name ends with cmd.exe   *cmd.exe     cmd.exe$


Note, in the second case our cmd command will return any file with cmd.exe in the path, so one of the other options might be a better choice.

We can get the same results with each command. Obviously, when searching 1,200,000 files we want to use the faster command. Let's do a little test to see which is faster. We'll use search strings that return identical results. More specifically, we'll use the search string that exactly matches a file named cmd.exe. Before each search I modified the file filestofind.txt accordingly. Now how do we measure the duration of each command?

PowerShell has the measure-command cmdlet, but cmd.exe does not have a way to measure time. However, Ed used a cool method in episode #49 that I'll borrow.

PS C:\> measure-command { ls -r C:\Windows -Include (cat .\filestofind.txt) } | Select TotalSeconds

TotalSeconds
------------
55.7072613


C:\> cmd.exe /v:on /c "echo !time! & (dir C:\Windows /s /b | findstr /g:filestofind.txt > NUL) & echo !time!"
23:55:25.16
23:55:41.52


Cmd.exe took just 16.36 seconds, which is 3.4 times faster than PowerShell's 55.7 seconds. Wow! The cmd.exe command is obviously the command we are going to use.

After John found the files, he needed a way to copy the files to a location of his choosing. Here is the cool little command he came up with:

C:\> dir /b /s | findstr /g:filestofind.txt > c:\foundthem.txt &
FOR /F %i in (c:\foundthem.txt) do copy %i d:\neededfiles\


This takes the output from our search and dumps it into foundthem.txt. We then use a For loop to read the contents of the file and copy each file to the neededfiles directory.

Well done John.

I have to say thanks to John, since he came up with the idea and wrote the commands, making my life much easier. I wonder if Hal has found anyone to write his episode for him?

Hal stands alone

Geez, Tim, since John did all your work for you I was sort of hoping that you'd write the Unix bit this week. Some friend and co-author you are!

The Unix equivalent of what John's trying to do would be something like this:

# find /etc -type f | grep -f filestofind.txt
/etc/passwd.bak
/etc/shadow.bak
/etc/passwd
/etc/security/group.conf
/etc/security/opasswd
...

Here I'm using find to output a list of all regular files ("-type f") under /etc. Then I pipe that output into a grep command and use the "-f" option to tell grep to read a list of patterns from a text file. In this case my patterns were things like "passwd", "shadow", "group", and so on, which actually match a surprisingly large number of files under /etc.
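If you'd like to try the pipeline without pointing it at a real /etc, here's a minimal sketch. The scratch directory, the file names, and the contents of filestofind.txt are all made up for illustration; only the find-into-grep pipeline itself comes from the command above.

```shell
# Build a throwaway tree (hypothetical names) so the pipeline can be tried safely.
demo=$(mktemp -d)
mkdir -p "$demo/etc/security"
touch "$demo/etc/passwd" "$demo/etc/passwd.bak" "$demo/etc/hosts" \
      "$demo/etc/security/opasswd"

# One pattern per line, which is the format "grep -f" expects.
printf '%s\n' 'passwd' 'shadow' > "$demo/filestofind.txt"

# List regular files, then keep only the paths that match any pattern.
matches=$(find "$demo/etc" -type f | grep -f "$demo/filestofind.txt" | sort)
echo "$matches"

# Clean up the scratch tree.
rm -rf "$demo"
```

Note that the patterns are matched against the whole path, so "passwd" also picks up security/opasswd, while hosts is left out.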

Since we're talking about performance improvements here, it's worth noting that if you're searching for fixed strings rather than regular expressions, then using fgrep is going to be faster:

# time find /etc -type f | grep -f filestofind.txt >/dev/null

real 0m0.052s
user 0m0.030s
sys 0m0.020s
# time find /etc -type f | fgrep -f filestofind.txt >/dev/null

real 0m0.026s
user 0m0.010s
sys 0m0.010s

Now /etc is a fairly small directory-- we'd probably get better numbers if we tried running this on a larger portion of the file system. And we should probably run multiple trials and average them to get a more representative value. But at least in this case you can see that fgrep is twice as fast as grep.
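Speed isn't the only difference between the two. With fixed strings, the dots in a pattern stop being wildcards, which can change what matches. The sketch below uses two made-up paths to show this. It spells fgrep as "grep -F", which is the same thing and avoids the obsolescence warning that recent GNU grep releases print for fgrep.

```shell
# Two hypothetical paths: only the first really contains "cmd.exe".
paths='/tmp/cmd.exe
/tmp/cmdXexe'

# As a regular expression, "." matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$paths" | grep -c 'cmd.exe')

# As a fixed string (what fgrep does), "." is a literal dot, so one line matches.
fixed_hits=$(printf '%s\n' "$paths" | grep -F -c 'cmd.exe')

echo "regex: $regex_hits  fixed: $fixed_hits"
```

So if your list of names contains dots, the fixed-string search is not just faster, it's also closer to what you probably meant.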

John's actual challenge is to copy the matching files into another directory. We can use the cpio trick from Episode #115 to actually copy the files:

# find /etc -type f | fgrep -f filestofind.txt | cpio -pd /root/saved
39 blocks

"cpio -p" is the "pass through" mode, which reads file/directory names from the standard input and copies them from their current location to the target directory given as the last argument. The "-d" option tells cpio to create any directories it needs along the way. You don't even have to create the target directory-- if it doesn't exist, cpio will create it for you.

So this one really wasn't that difficult. Tim may need our readers to help him, but us Unix folks can get it done on our own.

Tuesday, December 7, 2010

Episode #124: Levelling Up

Tim set himself up to bomb:

So I came up with the idea for this episode, totally my fault. And I knew going into it that I was setting myself up for a significant beating from Hal. My guess is that it will take him all of five minutes to write his portion. So here goes.

One of the nice features of Windows is the extremely granular permissions that can be granted on files and directories. This functionality comes at a price: it makes auditing permissions a big pain, especially when it comes to groups and, even worse, nested groups. A few of my colleagues and I were looking for files that would allow us to elevate our privileges from a limited user account to one with more privileges. These would be files that are run by service accounts, or possibly an administrator, but are also modifiable by a more limited user. In short, we were looking for files owned by an admin but writable by a limited user. Before we get into the fu, we need to look at how file permissions look in PowerShell.

To get file permissions we need to use the Get-Acl cmdlet. The output of the command looks like this (fl is an alias for Format-List and is used to display the results in list form):
PS C:\> get-acl test | fl

Path : Microsoft.PowerShell.Core\FileSystem::C:\test
Owner : MYDOM\tim
Group :
Access : BUILTIN\Administrators Allow FullControl
NT AUTHORITY\SYSTEM Allow FullControl
MYDOM\tim Allow FullControl
CREATOR OWNER Allow 268435456
BUILTIN\Users Allow ReadAndExecute, Synchronize
BUILTIN\Users Allow AppendData
BUILTIN\Users Allow CreateFiles
Audit :
Sddl : O:S-1-5-21-236840484-2123344539-2455687859-23475G:DUD:(A;OICIID;FA;;;BA)
(A;OICIID;FA;;;SY)(A;ID;FA;;;S-1-5-21-236840484-2123344539-2455687859-23475)
(A;OICIIOID;GA;;;CO)(A;OICIID;0x1200a9;;;BU)(A;CIID;LC;;;BU)(A;CIID;DC;;;BU)


If you look at the Access property you can see that I, MYDOM\tim, have full access to the folder test. This means I can do whatever I want to it. Let's take a closer look at this property and expand it using the Select-Object cmdlet with the ExpandProperty option.

PS C:\> get-acl test | select -ExpandProperty Access
FileSystemRights : FullControl
AccessControlType : Allow
IdentityReference : MYDOM\tim
IsInherited : True
InheritanceFlags : None
PropagationFlags : None


In order for me to have write permission, the IdentityReference needs to be my user account or a group of which I am a member. The FileSystemRights must be something that allows me to modify the file. Finally, the AccessControlType needs to be Allow. Ok, great, but what groups am I a member of?

To get a list of all the groups a user is a member of, you can use the Get-ADAccountAuthorizationGroup cmdlet. The problem is that it requires a Windows Server 2008 R2 domain controller or an instance of AD LDS running on a Windows Server 2008 R2 server. It also requires that you have the ability to query the domain controller. We'll assume we don't have permissions to do this, so we'll just look for some known groups on the local computer that I should be a member of:

  • MYDOM\Users

  • MYLAPPY\Users

  • MYLAPPY\Guests

  • Everyone



Now we have the list of groups. All we need to do is add my user account and we have the list of IdentityReference values we need to look for.

We also need to filter for specific permissions which will allow us to modify the file. Here is what we are looking for:

  • FullControl

  • WriteData

  • CreateFiles

  • AppendData

  • ChangePermissions

  • TakeOwnership

  • Write

  • Modify



With all this knowledge of what to look for, we can now search for executable files in the Windows directory that we can modify. MYLAPPY is the name of my computer, and MYDOM is the name of my domain.

PS C:\Windows> ls -r -include *.exe,*.ps1,*.bat,*.com,*.vbs,*.dll | Get-Acl |
? { select -InputObject $_ -ExpandProperty Access |
? { ("MYDOM\tim","MYDOM\Users","MYLAPPY\Users","MYLAPPY\Guests","Everyone" -contains $_.IdentityReference) -and
("FullControl","WriteData","CreateFiles","AppendData","ChangePermissions","TakeOwnership","Write","Modify" -contains
$_.FileSystemRights) -and $_.AccessControlType -eq "Allow" }
} |
select path


We start off with a recursive directory listing that finds executable files. The results are piped into Get-Acl. A giant Where-Object (alias ?) filter is used to find the files we want. In this case we use a nested Where-Object filter. If the inner filter returns an object (an Access object), the outer filter evaluates to true and passes along the parent object (the Acl object).

The outer filter just sets up our inner filter. In the inner filter we check to see if the current Access object matches our username or group. This is done by creating a collection of principals and checking if the IdentityReference property of the Access object is in the collection. We take a similar approach with the FileSystemRights property. Finally, we check that the AccessControlType is Allow, rather than Deny. If all three parts are true, then the Acl object is passed down the pipeline where we just output the path to the file. The only problem is that this command does not check to see if a Deny rule supersedes the Allow rule.

We could also add a filter for files owned by MYLAPPY\Administrators.

PS C:\> ls -r -include *.exe,*.ps1,*.bat,*.com,*.vbs,*.dll | Get-Acl |
? { $_.Owner -eq "MYLAPPY\Administrators" } ...


The problem with this approach is that the file we are looking for may be owned by a Domain Admin or some other service account with elevated permissions, so we might have to build another collection of principals like we did above. The nice thing about MYLAPPY\Administrators is that the group is the default owner of any object created by one of its members. In other words, if John is an Administrator and he creates a file, it will be owned by MYLAPPY\Administrators. Of course there are options in Windows to change this setting, but it is the default.

So there you have it. And by it, I mean a big, confusing, complex command. And now Hal is going to give it to you. And by it I mean a simple, short, easy-to-read command.

Hal says, "Unix is the bomb!"

Here's a reasonable Unix approximation for what Tim is trying to do. It's surprisingly not all that terse:

find / -type f -user root \( -perm -0020 -o -perm -0002 \) \
\( -perm -0100 -o -perm -0010 -o -perm -0001 \)


The basic idea is simple. We want to find executable files that are owned by root but which are group or world writable. "Files owned by root" is no problem: that's just "-type f -user root". The verbosity comes from how you have to specify permissions with find.

If I want to say "group or world writable", I end up having to specify each bit with its own "-perm -...." clause and then gang them together with "or" ("-o") and parens ("\( ... \)"). Similarly, defining "executable" means checking each of the three possible execute bits individually. I've often wanted find to have a terser syntax for doing this kind of thing.
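If you happen to have GNU find, there is a terser spelling: "-perm /mode" is true when any of the listed bits is set. The sketch below assumes GNU findutils (the "/" form is not part of POSIX) and uses a scratch directory with invented file names. It also matches on the current user rather than root so that it can be run unprivileged.

```shell
# Scratch files (hypothetical names) with known permission bits.
demo=$(mktemp -d)
touch "$demo/safe" "$demo/risky" "$demo/data"
chmod 0755 "$demo/safe"    # executable, but only the owner can write
chmod 0775 "$demo/risky"   # executable and group writable
chmod 0666 "$demo/data"    # world writable, but not executable

# /022 means group OR world writable; /111 means any execute bit is set.
found=$(find "$demo" -type f -user "$(id -u)" -perm /022 -perm /111)
echo "$found"

# Clean up the scratch tree.
rm -rf "$demo"
```

Only the "risky" file should be reported, since it is the only one that is both writable by someone other than its owner and executable. On a real system you would swap the "-user" argument back to root and start the search at /.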

But there's a solution for you in any event. Unix's much less granular ownership and permissions model makes things considerably easier on this side of the house than on Windows.