Tuesday, December 28, 2010

Episode #127: Making a Difference

Hal went to school

I recently got the opportunity to sit in on (fellow SANS instructor) Lenny Zeltser's "Reverse Engineering Malware" class. It's a terrific course, and I highly recommend it.

During the material on memory analysis, we were comparing the output of "volatility pslist" and "volatility psscan2". It's relatively straightforward for rootkits to hide themselves from pslist, but psscan2 does a much more thorough job of finding the hidden processes. So the differences in the output are always very interesting to the analyst. Here's an example of what I mean:

$ volatility pslist -f memory.img
Name Pid PPid Thds Hnds Time
System 4 0 55 260 Thu Jan 01 00:00:00 1970
smss.exe 540 4 3 21 Thu Jan 28 16:11:40 2010
csrss.exe 604 540 12 363 Thu Jan 28 16:11:46 2010
lsass.exe 684 628 18 341 Thu Jan 28 16:11:47 2010
vmacthlp.exe 836 672 1 24 Thu Jan 28 16:11:47 2010
svchost.exe 848 672 18 201 Thu Jan 28 16:11:47 2010
svchost.exe 1024 672 51 1178 Thu Jan 28 16:11:47 2010
svchost.exe 1072 672 4 75 Thu Jan 28 16:11:47 2010
svchost.exe 1132 672 15 212 Thu Jan 28 16:11:48 2010
spoolsv.exe 1476 672 10 115 Thu Jan 28 16:11:49 2010
explorer.exe 1592 1572 12 4021 Thu Jan 28 16:11:50 2010
VMwareUser.exe 1656 1592 8 416 Thu Jan 28 16:11:50 2010
VMwareService.e 1996 672 3 1026 Thu Jan 28 16:11:58 2010
wscntfy.exe 1396 1024 1 27 Thu Jan 28 16:12:03 2010
taskmgr.exe 1624 628 3 20201 Tue Feb 02 02:45:05 2010
mike022.exe 1956 672 2 30 Tue Feb 02 03:25:29 2010
wordpad.exe 1992 1260 4 102 Tue Feb 02 22:17:03 2010
calc.exe 828 1592 1 26 Thu Feb 04 00:01:00 2010
cmd.exe 968 1592 1 32 Thu Feb 04 00:01:13 2010
wordpad.exe 2008 1256 5 101 Thu Feb 04 00:02:56 2010
$ volatility psscan2 -f memory.img
PID PPID Time created Time exited Offset PDB Remarks
------ ------ ------------------------ ------------------------ ---------- ---------- ----------------

932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
1132 672 Thu Jan 28 16:11:48 2010 0x01eb4970 0x082c0160 svchost.exe
1956 672 Tue Feb 02 03:25:29 2010 0x020155d8 0x082c02c0 mike022.exe
1072 672 Thu Jan 28 16:11:47 2010 0x02016978 0x082c0140 svchost.exe
1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe
1476 672 Thu Jan 28 16:11:49 2010 0x0209db38 0x082c01a0 spoolsv.exe
1996 672 Thu Jan 28 16:11:58 2010 0x021f0da0 0x082c0180 VMwareService.e
1664 1592 Thu Jan 28 16:11:50 2010 0x021feb88 0x082c0240 msmsgs.exe
1024 672 Thu Jan 28 16:11:47 2010 0x02202880 0x082c0120 svchost.exe
604 540 Thu Jan 28 16:11:46 2010 0x0221f020 0x082c0040 csrss.exe
1624 628 Tue Feb 02 02:45:05 2010 0x02256da0 0x082c02e0 taskmgr.exe
272 1820 Thu Feb 04 00:00:55 2010 0x02293b08 0x082c0300 wordpad.exe
1012 672 Thu Jan 28 16:12:02 2010 0x023a78b0 0x082c0260 alg.exe
1656 1592 Thu Jan 28 16:11:50 2010 0x023a9c28 0x082c0220 VMwareUser.exe
1648 1592 Thu Jan 28 16:11:50 2010 0x023ae980 0x082c0200 VMwareTray.exe
848 672 Thu Jan 28 16:11:47 2010 0x023b3020 0x082c00e0 svchost.exe
1748 1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe
836 672 Thu Jan 28 16:11:47 2010 0x02412b58 0x082c00c0 vmacthlp.exe
672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe
968 1592 Thu Feb 04 00:01:13 2010 0x024707e8 0x082c0340 cmd.exe
684 628 Thu Jan 28 16:11:47 2010 0x02483da0 0x082c00a0 lsass.exe
1992 1260 Tue Feb 02 22:17:03 2010 0x02491130 0x082c0360 wordpad.exe
1396 1024 Thu Jan 28 16:12:03 2010 0x02492d78 0x082c0280 wscntfy.exe
2008 1256 Thu Feb 04 00:02:56 2010 0x02494988 0x082c03e0 wordpad.exe
828 1592 Thu Feb 04 00:01:00 2010 0x024c86b8 0x082c02a0 calc.exe
1592 1572 Thu Jan 28 16:11:50 2010 0x024ddda0 0x082c01e0 explorer.exe
540 4 Thu Jan 28 16:11:40 2010 0x024f8368 0x082c0020 smss.exe
628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe
4 0 0x025c8830 0x00319000 System

Visually you can see that the psscan2 output lists several more processes than pslist, but just using your eyeballs it can be difficult to figure out exactly what the differences are. Seems like a job for command-line kung fu!

My first thought was to simply extract the list of .EXEs from each command and diff them. In order to do the diff properly, I'll need to sort them into canonical order, but that's no problem. Here's how we manage the output from pslist:

$ volatility pslist -f memory.img | tail -n +2 | awk '{print $1}' | sort
calc.exe
cmd.exe
csrss.exe
...

I use tail to chop off the header line, then awk to extract the name of the .EXE from the first column, and finally pipe the whole thing into sort.

Dealing with the psscan2 output is very similar:

$ volatility psscan2 -f memory.img | tail -n +4 | awk '{print $NF}' | sort
alg.exe
calc.exe
cmd.exe
...

In this case, there are three header lines we need to skip. Also the .EXE name is in the last column of output-- "print $NF" is a useful awk idiom for printing the value in the last column.

So now we need to diff the output of these two commands. We could do this by creating temporary files, but why bother when we have the magic bash "<(...)" syntax that lets us substitute command output in a place where a command would normally be looking for a file name:

diff <(volatility psscan2 -f memory.img | tail -n +4 | awk '{print $NF}' | sort) \
<(volatility pslist -f memory.img | tail -n +2 | awk '{print $1}' | sort)

1d0
< alg.exe
4,5d2
< cmd.exe
< cmd.exe
10,11d6
< msmsgs.exe
< services.exe
18d12
< svchost.exe
23d16
< VMwareTray.exe
25,27d17
< winlogon.exe
< wmiprvse.exe
< wordpad.exe

Wicked! There are 10 processes that appear in the psscan2 output that don't show up in the pslist output. Since we don't see any lines starting with ">" there are no processes in the pslist output that don't show up in psscan2-- this is what we'd expect, but it's always nice to get confirmation.
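If it helps to see the same comparison outside the shell, here is a rough Python sketch of what the diff is telling us. The two name lists are made-up, abbreviated stand-ins for the sorted columns (an assumption for illustration, not real volatility output), and Counter subtraction plays the role of diff so that duplicate names like cmd.exe are counted properly:

```python
from collections import Counter

# Hypothetical, abbreviated name lists standing in for the two sorted columns.
psscan2_names = ["alg.exe", "calc.exe", "cmd.exe", "cmd.exe", "cmd.exe", "svchost.exe"]
pslist_names = ["calc.exe", "cmd.exe", "svchost.exe"]

# Counter subtraction respects duplicates: three cmd.exe minus one leaves two.
only_in_psscan2 = Counter(psscan2_names) - Counter(pslist_names)
only_in_pslist = Counter(pslist_names) - Counter(psscan2_names)

# Mimic the "<" lines from diff for names that only psscan2 reported.
for name, count in sorted(only_in_psscan2.items()):
    print(f"< {name} (x{count})")
```

An empty only_in_pslist corresponds to the absence of ">" lines in the diff output.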

The only problem here is that as we got further into the in-class exercises, I realized I really wanted all of the extra detail about each of the hidden processes from the psscan2 output. For example, the hex offset values end up being very useful, and I'd like to know exactly which two of the three cmd.exe processes are the hidden ones. Let me show you the command line I came up with and then explain it to you:

$ join -v 1 -1 1 -2 2 \
<(volatility psscan2 -f memory.img | tail -n +4 | sort -n -k 1,1) \
<(volatility pslist -f memory.img | tail -n +2 | sort -n -k2,2)

272 1820 Thu Feb 04 00:00:55 2010 0x02293b08 0x082c0300 wordpad.exe
628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe
672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe
932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe
join: file 1 is not in sorted order
join: file 2 is not in sorted order
1012 672 Thu Jan 28 16:12:02 2010 0x023a78b0 0x082c0260 alg.exe
1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe
1648 1592 Thu Jan 28 16:11:50 2010 0x023ae980 0x082c0200 VMwareTray.exe
1664 1592 Thu Jan 28 16:11:50 2010 0x021feb88 0x082c0240 msmsgs.exe
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
1748 1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe

In this case I'm using join rather than diff because the output of the two commands is so differently formatted. Essentially I'm doing a join on the PID columns of the psscan2 ("-1 1") and pslist ("-2 2") output and telling join to output the non-matching lines from psscan2 ("-v 1"). The tricky bit is that each command output needs to be sorted by its PID column for join to work. So if you look in the "<(...)" clauses, you'll see that the final element of the pipeline in each case is a numeric sort on the PID column. Easy, right?
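For anyone who finds the join options hard to picture, the "-v 1" behavior boils down to an anti-join: keep the lines from the first input whose key has no partner in the second. Here is a minimal Python sketch of that idea. The function name anti_join and the three sample rows are my own illustrative assumptions, and unlike join this version doesn't care about sort order:

```python
# Hypothetical, abbreviated rows; real input would be the lines printed by volatility.
psscan2_lines = [
    "932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe",
    "1132 672 Thu Jan 28 16:11:48 2010 0x01eb4970 0x082c0160 svchost.exe",
    "1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe",
]
pslist_lines = [
    "svchost.exe 1132 672 15 212 Thu Jan 28 16:11:48 2010",
    "cmd.exe 968 1592 1 32 Thu Feb 04 00:01:13 2010",
]

def anti_join(left, right, left_field, right_field):
    """Return lines of `left` whose key field has no match in `right` (like join -v 1)."""
    # Field numbers are 1-based to mirror the "-1 1" and "-2 2" options.
    right_keys = {line.split()[right_field - 1] for line in right}
    return [line for line in left if line.split()[left_field - 1] not in right_keys]

hidden = anti_join(psscan2_lines, pslist_lines, left_field=1, right_field=2)
for line in hidden:
    print(line)
```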

The only fly in the ointment is the "not in sorted order" error messages from join. The problem is that join only understands alphabetic sorting. So when we go from 9xx PIDs to 1xxx PIDs, join thinks the file has gone all unsorted. There's no "-n" option to join like there is for sort, but in some versions of join we can use the "--nocheck-order" option to suppress the error messages:

$ join -v 1 -1 1 -2 2 --nocheck-order \
<(volatility psscan2 -f memory.img | tail -n +4 | sort -n -k 1,1) \
<(volatility pslist -f memory.img | tail -n +2 | sort -n -k2,2)

272 1820 Thu Feb 04 00:00:55 2010 0x02293b08 0x082c0300 wordpad.exe
628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe
672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe
932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe
1012 672 Thu Jan 28 16:12:02 2010 0x023a78b0 0x082c0260 alg.exe
1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe
1648 1592 Thu Jan 28 16:11:50 2010 0x023ae980 0x082c0200 VMwareTray.exe
1664 1592 Thu Jan 28 16:11:50 2010 0x021feb88 0x082c0240 msmsgs.exe
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
1748 1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe

The other alternative is obviously to sort the PID columns alphabetically, but that offends my sensibilities somehow.
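To see why join grumbles, compare the two orderings directly. This little Python sketch uses a handful of PIDs from the output above; the helper first_out_of_order is a hypothetical stand-in for the check join performs, not its actual implementation:

```python
pids = ["272", "628", "932", "1012", "1172"]

numeric_order = sorted(pids, key=int)   # the order "sort -n" produces
lexical_order = sorted(pids)            # plain string order, which join expects

def first_out_of_order(keys):
    """Return the first key that compares lower than its predecessor as a string."""
    for prev, cur in zip(keys, keys[1:]):
        if cur < prev:
            return cur
    return None

# In numeric order, "1012" follows "932" but is smaller when compared as text.
print(first_out_of_order(numeric_order))
print(first_out_of_order(lexical_order))
```

The first call reports the exact spot where the 9xx PIDs give way to the 1xxx PIDs, which is where the error messages appear in the join output.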

Mmmm, hmmm! That was some tasty fu! Hey Tim, volatility runs on Windows-- what can you do with the output? I double-dog-dare you to try it in CMD.EXE first...

Tim skipped school:

Do cmd.exe, dang Hal. Happy Freaking New Year to me, huh?

Here is what I came up with based on the assumption that pslist returns a subset of psscan2.

C:\> python.exe volatility pslist -f memory.img > pslist.txt
C:\> cmd /v:on /c "for /F "skip=2 tokens=1,5,10,15" %a in ('python.exe volatility psscan2 -f memory.img') do
@(if not "%d"=="" (set name=%d) else (if not "%c"=="" (set name=%c) else (set name=%b))) &
set pid=%a & (type pslist.txt | findstr /B /R /C:"!name! *!pid! " > NUL || echo !name! !pid!)"


svchost.exe 932
wmiprvse.exe 1744
cmd.exe 1172
msmsgs.exe 1664
wordpad.exe 272
alg.exe 1012
VMwareTray.exe 1648
cmd.exe 1748
services.exe 672
winlogon.exe 628


I split this command into two for the sake of readability; however, it could be easily combined into a one-liner. But I'll leave that simple experiment to you. The first line takes the output of pslist and dumps the contents into a file. This file will be read numerous times, so it is significantly faster to just read the file in the second "half" of our command. Now, regarding that second half...

We start off by invoking our shell with /v:on to enable delayed variable expansion and /c to cause our spawned shell to exit upon completion. Inside the shell we use our trusty For loop. The first three lines are skipped as they are headers: "skip=2" drops the first two, and For /F ignores the blank third line on its own. The For loop then splits the line based on white space. We are trying to get the name of the process, and due to spacing, it may be in the 5th, 10th, or 15th token. Yes, it is that confusing. Here is a little diagram of what I mean:

PID    PPID   Time created             Time exited              Offset     PDB        Remarks
------ ------ ------------------------ ------------------------ ---------- ---------- ----------------

Token1 2 3 4 5 6 7 8 9 10
932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe

Token1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe

Token1 2 3 4 5
4 0 0x025c8830 0x00319000 System


Our for loop will give us 4 variables a, b, c, and d which represent the 1st, 5th, 10th, and 15th token. We have to use a little trick to figure out which of the three variables contains the process name by checking each variable from right to left. If %d is not empty, then it contains the process name so we set Name equal to %d. If %d is empty we try %c, and if %c is empty we use %b. For the sake of nice variable names we set !pid! equal to %a. We then have the variable !pid!, which contains the process id, and !name!, which contains the process name.

We then search the pslist.txt file to see if the current process, represented by !name! and !pid!, is in the file. We output the file, using the Type command, and use FindStr to search for the matching name and process id. The /B switch says our search string must be at the beginning of the line, the /R enables regular expression searches. The default FindStr setting is to treat a space in our search string as a logical OR, but the /C switch "uses [the] specified string as a literal search string," meaning it doesn't treat a space as a logical OR. In short, it looks for the process name at the beginning of the line, followed by some number of spaces, then the process id, and then another space.

We then use the logical OR (||) in conjunction with the FindStr command to determine whether FindStr found something or not. This trick has been used repeatedly, but most recently in episode 122. If FindStr doesn't find anything we then output the process name and PID. This effectively gives us a list of processes that are found with psscan2 but not pslist.
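The token juggling is easier to follow when written out longhand. Here is a rough Python sketch of the same right-to-left check, using the three sample lines from the diagram above. The function name name_and_pid is hypothetical, and this only models the token selection, not the FindStr lookup:

```python
def name_and_pid(line):
    """Mimic the For loop: keep tokens 1, 5, 10 and 15, then take the rightmost non-empty one."""
    tokens = line.split()

    def token(n):
        # Tokens are 1-based, and a missing token is empty, just like %b, %c, %d.
        return tokens[n - 1] if len(tokens) >= n else ""

    pid, b, c, d = token(1), token(5), token(10), token(15)
    name = d or c or b   # check %d first, then %c, then fall back to %b
    return name, pid

samples = [
    "932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe",
    "1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe",
    "4 0 0x025c8830 0x00319000 System",
]
for line in samples:
    print(name_and_pid(line))
```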

Now for a more robust solution using...

PowerShell

I'm going to deviate into script land here, only because this mini-script may be very useful for manipulating the output of these commands. It will take the output and objectify it.

Objectifying pslist:

PS C:\> $null, $pslist = python volatility pslist -f memory.img
PS C:\> [regex]$regex = '(?<Name>\S+)\s+(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s+(?<Threads>[0-9]+)\s+(?<Handles>[0-9]+)\s+(?<Time>.*)'
PS C:\> $pslistobjects = foreach ($p in $pslist) {
... $psobj = "" | Select-Object Name, PID, PPID, Threads, Handles, Time
... $p -match $regex | Out-Null
... $psobj.Name = $matches.Name
... $psobj.PID = $matches.PID
... $psobj.PPID = $matches.PPID
... $psobj.Threads = $matches.Threads
... $psobj.Handles = $matches.Handles
... $psobj.Time = [datetime]::ParseExact($matches.Time.Trim(), "ddd MMM dd HH:mm:ss yyyy", $null)
... $psobj
... }

PS C:\> $pslistobjects | Format-Table
Name PID PPID Threads Handles Time
---- --- ---- ------- ------- ----
System 4 0 55 260 1/1/1970 12:00:00 AM
smss.exe 540 4 3 21 1/28/2010 4:11:40 PM
csrss.exe 604 540 12 363 1/28/2010 4:11:46 PM
...


This takes the output from pslist and converts it to PowerShell objects. Let's look at each line, one at a time.

PS C:\> $null, $pslist = python volatility pslist -f memory.img


Here we get the output from pslist, send the first line to null, and the remainder is put into the variable pslist. This effectively skips the first line (header).

PS C:\> [regex]$regex = '(?<Name>\S+)\s+(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s+(?<Threads>[0-9]+)\s+(?<Handles>[0-9]+)\s+(?<Time>.*)'


The next chunk sets up our Regular Expression with named groupings.

PS C:\> $pslistobjects = foreach ($p in $pslist) {
... $psobj = "" | Select-Object Name, PID, PPID, Threads, Handles, Time
... $p -match $regex | Out-Null
... $psobj.Name = $matches.Name
... $psobj.PID = $matches.PID
... $psobj.PPID = $matches.PPID
... $psobj.Threads = $matches.Threads
... $psobj.Handles = $matches.Handles
... $psobj.Time = [datetime]::ParseExact($matches.Time.Trim(), "ddd MMM dd HH:mm:ss yyyy", $null)
... $psobj
... }


Inside the foreach loop is where the heavy lifting is done. First, an empty object is created. Then the Match operator is used to match the string against the regular expression and automatically populate the $matches variable. We then set each property of our object. The Time property is a bit special since the time format used by pslist isn't one of the formats that PowerShell/Windows natively understands, so we hand ParseExact an explicit format string. The variable $pslistobjects then contains PowerShell'ed objects from volatility's pslist. We can then sort, filter, or perform all sorts of tricks once it has been PowerShellized.
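The same named-group approach carries over to other languages. As a point of comparison, here is a minimal Python sketch of the pslist parser. The names PSLIST_ROW and parse_pslist_row are my own, converting the numeric fields to integers is an extra step the PowerShell version doesn't take, and the date parsing assumes an English/C locale for the day and month abbreviations:

```python
import re
from datetime import datetime

# Same groups as the PowerShell regex, using Python's (?P<name>...) syntax.
PSLIST_ROW = re.compile(
    r"(?P<Name>\S+)\s+(?P<PID>[0-9]+)\s+(?P<PPID>[0-9]+)\s+"
    r"(?P<Threads>[0-9]+)\s+(?P<Handles>[0-9]+)\s+(?P<Time>.*)"
)

def parse_pslist_row(line):
    """Turn one pslist row into a dict, or return None if the line does not match."""
    m = PSLIST_ROW.match(line)
    if m is None:
        return None
    row = m.groupdict()
    for field in ("PID", "PPID", "Threads", "Handles"):
        row[field] = int(row[field])
    # Same layout as the "ddd MMM dd HH:mm:ss yyyy" format string above.
    row["Time"] = datetime.strptime(row["Time"].strip(), "%a %b %d %H:%M:%S %Y")
    return row

print(parse_pslist_row("smss.exe 540 4 3 21 Thu Jan 28 16:11:40 2010"))
```

Because the header line has no digits in its second column, it simply fails to match and comes back as None, so there's no need to skip it by hand.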

A similar mini-script will objectify the output from psscan2:

PS C:\> $null, $null, $null, $psscan2 = \python25\python.exe volatility psscan2 -f memory.img
PS C:\> [regex]$regex = '\s*?(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s(?<Created>.{24})\s(?<Exited>.{24})\s(?<Offset>[0-9a-fx]{10})\s(?<PDB>[0-9a-fx]{10})\s(?<Name>.+)'

PS C:\> $psscan2objects = foreach ($p in $psscan2) {
... $psobj = "" | Select-Object Name, PID, PPID, Created, Exited, Offset, PDB
... $p -match $regex | Out-Null
... $psobj.Name = $matches.Name
... $psobj.PID = $matches.PID
... $psobj.PPID = $matches.PPID
... $psobj.Offset = $matches.Offset
... $psobj.PDB = $matches.PDB
... if ($matches.Created.Trim()) {
... $psobj.Created = [datetime]::ParseExact($matches.Created, "ddd MMM dd HH:mm:ss yyyy", $null)
... }
... if ($matches.Exited.Trim()) {
... $psobj.Exited = [datetime]::ParseExact($matches.Exited, "ddd MMM dd HH:mm:ss yyyy", $null)
... }
... $psobj
... }

PS C:\> $psscan2objects | ft

Name PID PPID Created Exited Offset PDB
---- --- ---- ------- ------ ------ ---
svchost.exe 932 672 1/28/2010 4:11:47 PM 0x01ea3558 0x082c0100
wmiprvse.exe 1744 848 2/4/2010 12:02:53 AM 2/4/2010 12:04:23 AM 0x01eaea88 0x082c0380
svchost.exe 1132 672 1/28/2010 4:11:48 PM 0x01eb4970 0x082c0160
mike022.exe 1956 672 2/2/2010 3:25:29 AM 0x020155d8 0x082c02c0
...


If you are going to use these commands often I would highly suggest making these into script files. You could even pass the file name to these scripts and have them wrap the volatility commands.

Ok, so now we have two variables, each contains the output of the respective volatility command.

PS C:\> $pslistobjects | ft

Name PID PPID Threads Handles Time
---- --- ---- ------- ------- ----
System 4 0 55 260 1/1/1970 12:00:00 AM
smss.exe 540 4 3 21 1/28/2010 4:11:40 PM
csrss.exe 604 540 12 363 1/28/2010 4:11:46 PM
lsass.exe 684 628 18 341 1/28/2010 4:11:47 PM
...


PS C:\> $psscan2objects | ft

Name PID PPID Created Exited Offset PDB
---- --- ---- ------- ------ ------ ---
svchost.exe 932 672 1/28/2010 4:11:47 PM 0x01ea3558 0x082c0100
wmiprvse.exe 1744 848 2/4/2010 12:02:53 AM 2/4/2010 12:04:23 AM 0x01eaea88 0x082c0380
svchost.exe 1132 672 1/28/2010 4:11:48 PM 0x01eb4970 0x082c0160
mike022.exe 1956 672 2/2/2010 3:25:29 AM 0x020155d8 0x082c02c0
...


Now we can use the Compare-Object cmdlet to compare the two sets of processes.

PS C:\> Compare-Object $pslistobjects $psscan2objects -Property name,pid

name pid SideIndicator
---- --- -------------
svchost.exe 932 =>
wmiprvse.exe 1744 =>
cmd.exe 1172 =>
msmsgs.exe 1664 =>
wordpad.exe 272 =>
alg.exe 1012 =>
VMwareTray.exe 1648 =>
cmd.exe 1748 =>
services.exe 672 =>
winlogon.exe 628 =>


The Property parameter is used to specify the properties to use for comparison. We can either use a single property or a comma separated list of property names.

From this output it is quickly apparent that there are 10 processes found by psscan2 that were not found by pslist.

Whew, that was a lot of work this week. I hope it gets me on Santa's Nice list...next year.

Davide is too cool for school

Davide Brini has once again punk'd me with this full-on awk attack:

awk 'FNR>1 && NR==FNR {a[$1,$2]; next} 
FNR>3 && !(($NF,$1) in a)' \
<(volatility pslist -f memory.img) \
<(volatility psscan2 -f memory.img)

Obviously, Davide has a PhD in awk, so let me explain what's going on here. FNR is an internal awk variable that tracks the current "input record number"-- usually the line number-- of the current file. NR, on the other hand, tracks the total number of records (lines) seen so far across all files.

If you look at the first awk clause, the "FNR>1" is how Davide is skipping the first header line in the pslist output. The "NR==FNR" expression will only be true if we're processing the first input "file", i.e. the output of "volatility pslist ...". Once awk moves on to the second "file" (the psscan output), NR will keep on accumulating, but FNR will start counting again from 1.

So the first clause is for handling the pslist output. If you look at what's happening in the curly braces, Davide is creating empty array entries indexed by process name ($1) and PID ($2). The "next" just tells awk to read and process the next line of input, skipping the second clause which applies to the psscan output.

So let's look at that second clause. We can only get here if "NR!=FNR", which means we're dealing with the psscan output from the second input "file". Here Davide is using "FNR>3" to skip the header lines. For all the other lines, "!(($NF,$1) in a)" is true if and only if there is no entry in the array "a" for this combination of process name ($NF) and PID ($1). If we don't find an entry then psscan is telling us about a process that's been hidden from pslist and we want to output the information about this process. Davide is relying on the implicit "{print}" behavior of awk to make this happen.
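If the two-file awk idiom is new to you, it may help to see the same logic spelled out as two ordinary loops. This is only a sketch of the idea in Python: the function name hidden_rows and the sample rows are assumptions for illustration, and unlike the awk version it quietly skips blank lines:

```python
def hidden_rows(pslist_lines, psscan2_lines):
    """Return psscan2 rows whose (name, PID) pair is absent from pslist."""
    seen = set()
    for line in pslist_lines[1:]:              # FNR>1: skip the single header line
        fields = line.split()
        if len(fields) >= 2:
            seen.add((fields[0], fields[1]))   # a[$1,$2]
    result = []
    for line in psscan2_lines[3:]:             # FNR>3: skip the three header lines
        fields = line.split()
        if fields and (fields[-1], fields[0]) not in seen:   # !(($NF,$1) in a)
            result.append(line)
    return result

pslist = [
    "Name Pid PPid Thds Hnds Time",
    "System 4 0 55 260 Thu Jan 01 00:00:00 1970",
    "cmd.exe 968 1592 1 32 Thu Feb 04 00:01:13 2010",
]
psscan2 = [
    "PID PPID Time created Time exited Offset PDB Remarks",
    "------ ------ ------------------------ ------------------------ ---------- ---------- ----------------",
    "",
    "1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe",
    "968 1592 Thu Feb 04 00:01:13 2010 0x024707e8 0x082c0340 cmd.exe",
    "4 0 0x025c8830 0x00319000 System",
]
for row in hidden_rows(pslist, psscan2):
    print(row)
```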

Davide points out that the output from the above command will not be sorted, but you can always pipe the results into sort if that's important to you:

awk 'FNR>1 && NR==FNR {a[$1,$2]; next} 
FNR>3 && !(($NF,$1) in a)' \
<(volatility pslist -f memory.img) \
<(volatility psscan2 -f memory.img) | sort -n -k2,2

Nice job, Davide!

Michael has to stay late for passing notes

Wow, this Episode sure provoked a lot of interesting commentary. Michael Hale Ligh gave us a shout out from the volatility camp. He even wrote a small plugin for volatility, psdiff.py, that does the same thing as our command line kung fu:

# For http://volatility.googlecode.com/svn/branches/Volatility-1.4_rc1

import volatility.plugins.psscan as psscan
import volatility.win32.tasks as tasks
import volatility.utils as utils

class PsDiff(psscan.PSScan):
    """Produce a process diff"""

    def calculate(self):
        addr_space = utils.load_as(self._config)

        # Build a dictionary of processes found by scanning. The keys are
        # physical addresses and the values are the objects
        procs_scan = dict((p.obj_offset, p) for p in psscan.PSScan.calculate(self))

        # Build a dictionary of processes found by walking the linked list.
        # The virtual addresses are converted to physical with vtop.
        procs_list = dict((addr_space.vtop(p.obj_offset), p) for p in tasks.pslist(addr_space))

        # Create two sets of addresses so we can easily compute the difference
        scan_addrs = set(procs_scan.keys())
        list_addrs = set(procs_list.keys())

        # Yield any objects that are found by psscan but not pslist
        for addr in (scan_addrs - list_addrs):
            yield procs_scan[addr]

    def render_text(self, outfd, data):
        for p in data:
            outfd.write("{0:<8} {1:<16} {2}\n".format(p.UniqueProcessId, p.ImageFileName, p.ExitTime))

Michael's plugin uses "psscan" instead of "psscan2", so the output will be slightly different, but it shouldn't be that hard to switch things over to use "psscan2" instead if you prefer. Michael also provided a bit more explanation in his original email:

$ python volatility.py psdiff -f memory.dmp

Volatile Systems Volatility Framework 1.4_rc1
0 Idle 1970-01-01 00:00:00
940 cmd.exe 2008-11-26 07:45:49
660 services.exe 1970-01-01 00:00:00
808 taskmgr.exe 2008-11-26 07:45:40
924 svchost.exe 1970-01-01 00:00:00
592 csrss.exe 1970-01-01 00:00:00
992 alg.exe 1970-01-01 00:00:00
1016 svchost.exe 1970-01-01 00:00:00
828 svchost.exe 1970-01-01 00:00:00

The exit time of "1970-01-01 00:00:00" just means the field is empty (process is still active). I am doing the diff based on the address of EPROCESS objects, however it's possible, though not very likely, that an address could get re-used...so for a more robust diff you may check other fields as well.

If you want to see other fields in the output, it's rather easy because the Volatility types are auto-generated from Microsoft's PDB symbol files. For example since Windows defines a structure like this:

typedef struct _EPROCESS {
    ...
    char ImageFileName[16];
    DWORD UniqueProcessId;
    ...
} EPROCESS, *PEPROCESS;

You can print those fields like p.ImageFileName and p.UniqueProcessId in the plugin.

Lastly, the csrpslist plugin discussed in Malware Analyst's Cookbook produces a diff using two alternate sources of process listings (the csrss.exe handle table and an internal linked list found in the memory of csrss.exe). There are many other sources as well...

Tuesday, December 21, 2010

Episode #126: Cleaning Up The Dump

Hal's directories are bloated

It's not politically correct to say, but sometimes in Unix your directories just get fat. And like most of us, as your directories get fat, they also get slow. This is because in standard Unix file systems, directories are implemented as sequential lists of file names. They aren't even sorted, so you can't binary search them.

For example, suppose you'd just been dumping your logs into a single directory for years. You could end up with a big pile of stuff that looks like this:

# ls -ld logs
drwxr-xr-x 2 root root 266240 Dec 18 15:49 logs
# ls logs | wc -l
7188
# ls logs
authpriv.20070808.gz
authpriv.20070809.gz
authpriv.20070810.gz
...

Almost 7200 files-- and as you can see the directory itself has grown to be about a quarter of a megabyte! In our example, the file names are "<log>.YYYYMMDD" with an optional ".gz" extension on the older log files that have been compressed to save space.

Well I want my directories to be fit and lean again, so I decided to move the files into a tree structure based on year and month. So I'll need to move each file to a new location such as "YYYY/MM/<log>.YYYYMMDD". That should prevent any single sub-directory from getting too bloated.

I think there are a lot of ways you could attack this one, but I decided to make some noise with sed:

# cd logs
# for file in *; do
dir=$(echo $file | sed 's/.*\.\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\).*/\1\/\2/');
mkdir -p $dir;
mv $file $dir;
done

Yep, that sed expression sure is noisy-- as in "line noise". What's going on here? Well I'm taking the file name as input and using sed to pull out the YYYY and the MM and reformatting them into a subdirectory name like "YYYY/MM". First I match "anything followed by a literal dot", aka ".*\.". Then I match four digits-- four instances of the set "[0-9]"-- followed by two digits. However, I enclose both groups of digits in parens-- "\( ... \)"-- so that I can use the matched values on the righthand side of the substitution. On the RHS, "\1" is the four-digit year we matched in the first parenthesized expression and "\2" is the month we matched second. So "\1\/\2" is the year and the month with a literal slash in between-- "YYYY/MM". Obvious, right?
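If the backslash blizzard makes your eyes water, the same capture-and-rearrange trick may be easier to read in a language with less noisy regex syntax. Here is a hedged Python sketch; the names DATE_IN_NAME and target_dir are my own. One deliberate difference: sed passes a non-matching name through unchanged, while this version returns None so the caller can skip such files:

```python
import re

# Same idea as the sed expression: capture YYYY and MM after a literal dot.
DATE_IN_NAME = re.compile(r".*\.([0-9]{4})([0-9]{2}).*")

def target_dir(filename):
    """Return 'YYYY/MM' for names like '<log>.YYYYMMDD[.gz]', or None if no date is found."""
    m = DATE_IN_NAME.fullmatch(filename)
    return f"{m.group(1)}/{m.group(2)}" if m else None

for name in ("authpriv.20070808.gz", "messages.20101218", "README"):
    print(name, "->", target_dir(name))
```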

But the sed is the hard part. Once that's over, it's a simple task to make the directory and move the file. And now our directory should be nice and skinny:

# ls
2007 2008 2009 2010
# ls -ld .
drwxr-xr-x 6 root root 266240 Dec 18 15:58 .

Wait a minute! We've only got four top-level directories under our logs directory, but the logs directory itself hasn't shrunk at all. Unfortunately, this is normal behavior for Unix-- once a directory gets big, it never loses the weight.

So how do we stop our directory from looking like Jabba the Hutt? In Unix, you make a new directory and wipe out the old one:

# mkdir ../newlogs
# mv * ../newlogs
# cd ..
# rmdir logs
# mv newlogs logs
# ls -ld logs
drwxr-xr-x 6 root root 4096 Dec 18 16:09 logs

It's liposuction via cloning! A miracle of the modern age! OK, really it's a lame mis-feature of the Unix file system. But at least you now know what to do about it.
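If you need to do this on more than one directory, the swap is easy to wrap in a small helper. This is a sketch only, under a few assumptions: the function name rebuild_directory is made up, nothing else is writing to the directory while it runs, and the temporary directory lives on the same file system so the moves are cheap renames. Note that it also moves dot-files, which "mv *" would leave behind:

```python
import os
import shutil
import tempfile

def rebuild_directory(path):
    """Move everything into a fresh sibling directory, then swap it into place."""
    parent = os.path.dirname(os.path.abspath(path))
    # A sibling directory plays the role of ../newlogs.
    fresh = tempfile.mkdtemp(prefix="newlogs-", dir=parent)
    for entry in os.listdir(path):
        shutil.move(os.path.join(path, entry), os.path.join(fresh, entry))
    # mkdtemp creates the directory with mode 0700, so copy the old mode across.
    shutil.copymode(path, fresh)
    os.rmdir(path)            # only succeeds because the old directory is now empty
    os.rename(fresh, path)
```

Calling rebuild_directory("logs") from the parent directory would be the equivalent of the five commands above.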

And now I want to see Tim push his big directories around. Hey Tim, your directory is so fat...

Tim feels bloated from all the Christmas food:

Hal, yo directories is so fat, when they floated around the ocean Spain claimed them as a new world.

Ok, so the joke is terrible, but the problem is real. Directories with a lot of files can really be a pain.

On Windows there isn't one directory that contains all the logs. Each service typically has its own subdirectory under C:\Windows\System32\LogFiles\. For example, the subdirectory W3SVC1 would contain the logs for the first instance of an IIS webserver. Also, with older versions of Windows C:\Windows is replaced with C:\WinNT.

This LogFiles directory is used by Microsoft products and some third-party products, but of course the third-party products can put their log files in all sorts of other weird locations. For the sake of this article, we'll assume we are looking at IIS logs.

By default IIS log files are created daily with the naming convention of exyymmdd.log. Microsoft doesn't put the full four digit year, so we'll assume 20XX. Why assume post 2000? Because if you are running an IIS server from the last millennium it probably isn't your server any more (see pwned).

Let's start off by getting the names for our directories, and then we'll build on that. According to Microsoft's IIS Log File Naming Syntax, no matter what file format or regular rotation interval (month, week, day, hour), the format is always:

<some chars describing format><YY><MM><other numbers as used in date format>.log

We can build a regular expression replace pattern to derive directory names from the file names:

PS C:\Windows\System32\LogFiles\W3SVC1> ls *.log | % { [regex]::Replace($_.name, '[^0-9]*([0-9]{2})([0-9]{2}).*', '20$1\$2') }
2010\01
...
2010\02
...
2010\03
...

We use a ForEach-Object (alias %) loop on the output of our directory listing (Get-ChildItem is aliased as ls). Inside the loop we use .Net to call the static Replace method in the Regex class. The Replace method takes three arguments: the input, the search pattern, and the replacement string. The input is the name of the file. The search pattern is slightly more complicated. Here is how the search pattern maps to the portions of the log created on January 16th of 2009, ex090116.log.

[^0-9]*    = ex (all the non-digits at the beginning of the file name)
([0-9]{2}) = 09 (two digit year)
([0-9]{2}) = 01 (two digit month)
.*         = 16.log (the rest of the name)

We then use the replacement string to build the directory name, where $1 represents the first grouping (year) and $2 represents the second grouping (month). Each grouping is designated by parentheses. For more information on .Net and Regular Expression Replacement, see this article.
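To double-check the pattern without touching a real log directory, the same search-and-replace can be tried in Python, where the groups are written \1 and \2 instead of $1 and $2. This is just an illustrative sketch; the function name iis_dir is made up, and it keeps the assumption from above that all years are 20XX:

```python
import re

def iis_dir(filename):
    r"""Return '20YY\MM' for an IIS log name such as 'ex090116.log'."""
    # The raw replacement string holds: "20", group 1, a literal backslash, group 2.
    return re.sub(r"[^0-9]*([0-9]{2})([0-9]{2}).*", r"20\1\\\2", filename, count=1)

print(iis_dir("ex090116.log"))
```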

Notice, in our command above we used single quotes. That is because PowerShell will expand any strings inside double quotes before our Replace method had a chance to do any replacing. This means that PowerShell would try to convert $1 into a variable and not pass the literal string to the Replace method. Here is what I mean:

PS C:\> echo "Here is my string $1"
Here is my string

PS C:\> echo 'Here is my string $1'
Here is my string $1

We could use double quotes, but we would have to add a backtick (`) before the dollar sign. The resulting command would look like this:

PS C:\Windows\System32\LogFiles\W3SVC1> ls *.log | % {
[regex]::Replace($_.name, '[^0-9]*([0-9]{2})([0-9]{2}).*', "20`$1\`$2") }


So now we have the directory name, let's create the directory structure and move some files! I'm not going to show the full prompt so the command is less cluttered.

> Get-ChildItem *.log | ForEach-Object {
$dir = [regex]::Replace($_.Name, '[^0-9]*([0-9]{2})([0-9]{2}).*', "20`$1\`$2");
mkdir $dir -ErrorAction SilentlyContinue;
Move-Item $_ $dir }

Wow, that is a rather large command, so let's trim it down with aliases and shortened parameter names. We can't have a big ol' fat command with our nice lean directories.

> ls *.log | % {
$dir = [regex]::Replace($_.name, '[^0-9]*([0-9]{2})([0-9]{2}).*', "20`$1\`$2");
mkdir $dir -ea SilentlyContinue;
move $_ $dir }

Inside our ForEach-Object loop we set $dir equal to the new directory name. We then create the directory. The ErrorAction parameter (ea for short), set to SilentlyContinue, tells the shell not to show us an error message or stop processing if there is a problem. In our case, we want to make sure the command continues to run even if the directory already exists. After the directory is created we move the file, which is represented by $_.

PS C:\Windows\System32\LogFiles\W3SVC1> ls

Directory: Microsoft.PowerShell.Core\FileSystem::C:\Windows\System32\LogFiles\W3SVC1

Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 12/19/2010 12:28 AM <DIR> 2008
d---- 12/19/2010 12:28 AM <DIR> 2009
d---- 12/19/2010 12:28 AM <DIR> 2010
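
For comparison, here is a rough Unix shell sketch of the same idea. It assumes IIS-style names such as ex081231.log (two-digit year, then month, then day). The function name sort_logs and the sample file names are made up for illustration, and sed's \1 and \2 play the role of $1 and $2.

```shell
# Hypothetical helper: move each *.log in the current directory into 20YY/MM,
# taking YY and MM from the first four digits of the file name.
sort_logs() {
    for f in *.log; do
        [ -e "$f" ] || continue    # no matches: the glob stays literal, skip it
        # \1 = two-digit year, \2 = two-digit month (same groups as the regex above)
        dir=$(printf '%s\n' "$f" | sed -E 's|[^0-9]*([0-9]{2})([0-9]{2}).*|20\1/\2|')
        mkdir -p "$dir"            # -p: no error if the directory already exists
        mv "$f" "$dir/"
    done
}

# Demo in a scratch directory with made-up file names
demo=$(mktemp -d)
cd "$demo" || exit 1
touch ex081231.log ex090101.log ex100615.log
sort_logs
find . -type f | sort
```

Here "mkdir -p" does the job of the SilentlyContinue error action: it stays quiet when the directory is already there.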


So now we can enter the new year with leaner and meaner directories. And yes, they are meaner. Directories get pretty ticked off when you trim their children.

Tuesday, December 14, 2010

Episode #125: Find Yourself

Tim takes credit for someone else's work:

One of our faithful readers, John, wrote in. Well, we presume he is faithful to us, but we've heard he cheats on us with other blogs, and that's the worst kind of cheating. Since we are short of other ideas I guess we'll have to use his email.

Seriously though, John Ahearne has a nice bit of fu. On one particular assignment, John had carved over 1,200,000 files, where there were over 1,000 per directory. The files were named based on a particular file header in a proprietary file format. The client asked him to look for several files and gave him a text file with the file names. He started with this command to search for his files:

C:\> findstr /s /g:filestofind.txt


He used the /s option to do a recursive search and the /g option to load the search strings from a file. But there was a problem: slowness. The reason is that this command searches inside each file, and we just want to match on the file name. He then tried another command to see if that would work more quickly.

C:\> dir /b /s | findstr /g:filestofind.txt
C:\Windows\System32\cmd.exe
C:\Windows\System32\en-US\cmd.exe.mui
C:\Windows\winsxs\x86_microsoft-windows-c..c87d\cmd.exe.mui
C:\Windows\winsxs\x86_microsoft-windows-c..1ee0\cmd.exe
C:\Windows\winsxs\x86_microsoft-windows-i..d1e2\appcmd.exe.mui
C:\Windows\winsxs\x86_microsoft-windows-i..ecbd\appcmd.exe
C:\Windows\winsxs\x86_microsoft-windows-s..5b54\evntcmd.exe.mui
C:\Windows\winsxs\x86_microsoft-windows-s..b805\evntcmd.exe


This is much quicker, and it searches what we actually want! How would we do the same thing in PowerShell?

PS C:\> ls -r -i (cat .\filestofind.txt)

Directory: C:\Windows\System32

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 7/13/2009 8:14 PM 301568 cmd.exe

Directory: C:\Windows\winsxs\x86_microsoft-windows-commandpro...

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 7/13/2009 8:14 PM 301568 cmd.exe


We use Get-ChildItem (alias ls) with the Recurse option (r for short). The Include parameter (i for short) finds items that match our search strings, which are read from the file via Get-Content (alias cat). One other thing: notice a difference between the output of the two commands?

One command shows only files named "cmd.exe", the other looks for files containing "cmd.exe". The difference is due to the way each command expects the search strings to be presented. Here is a little chart describing how to get similar searches from each command:




Search Type              PowerShell   cmd.exe
Name is exactly cmd.exe  cmd.exe      ^cmd.exe$
Name contains cmd.exe    *cmd.exe*    cmd.exe
Name ends with cmd.exe   *cmd.exe     cmd.exe$


Note, in the second case our cmd command will return any file with cmd.exe in the path, so one of the other options might be a better choice.
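
Because a recursive listing prints full paths, an "exact name" pattern has to account for the directory separator as well as the end of the line. The same effect is easy to see with grep on Unix-style paths. The sketch below uses an invented path list, so treat it as an illustration of anchoring rather than a drop-in for findstr.

```shell
# Made-up path list standing in for the output of a recursive listing
paths='/win/system32/cmd.exe
/win/system32/en-US/cmd.exe.mui
/win/winsxs/appcmd.exe
/win/winsxs/evntcmd.exe.mui'

# Name is exactly cmd.exe: anchor on the separator and the end of the line
printf '%s\n' "$paths" | grep '/cmd\.exe$'

# Name ends with cmd.exe: anchor only at the end of the line
printf '%s\n' "$paths" | grep 'cmd\.exe$'

# Path contains cmd.exe anywhere
printf '%s\n' "$paths" | grep 'cmd\.exe'
```

The first pattern matches one path, the second also picks up appcmd.exe, and the third matches all four lines.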

We can get the same results with each command. Obviously, when searching 1,200,000 files we want to use the faster command. Let's do a little test to see which is faster. We'll use search strings that return identical results. More specifically, we'll use the search string that exactly matches a file named cmd.exe. Before each search I modified the file filestofind.txt accordingly. Now how do we measure the duration of each command?

PowerShell has the measure-command cmdlet, but cmd.exe does not have a way to measure time. However, Ed used a cool method in episode #49 that I'll borrow.

PS C:\> measure-command { ls -r C:\Windows -Include (cat .\filestofind.txt) } | Select TotalSeconds

TotalSeconds
------------
55.7072613


C:\> cmd.exe /v:on /c "echo !time! & (dir C:\Windows /s /b | findstr /g:filestofind.txt > NUL) & echo !time!"
23:55:25.16
23:55:41.52


Cmd.exe took just 16.36 seconds, which is 3.4 times faster than PowerShell's 55.7 seconds. Wow! The cmd.exe command is obviously the command we are going to use.
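
The arithmetic behind those numbers can be checked with a few lines of shell. This is only a sanity-check sketch: it assumes both timestamps fall on the same day (no midnight rollover) and leans on awk for the floating-point math.

```shell
# Convert HH:MM:SS.ss to seconds and report elapsed time and the speed ratio
elapsed=$(awk -v s="23:55:25.16" -v e="23:55:41.52" 'BEGIN {
    split(s, a, ":"); split(e, b, ":")
    printf "%.2f", (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
}')
ratio=$(awk -v ps=55.7072613 -v cmd="$elapsed" 'BEGIN { printf "%.1f", ps / cmd }')
echo "cmd.exe: ${elapsed}s, PowerShell/cmd.exe ratio: ${ratio}"
```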

After John found the files, he needed a way to copy the files to a location of his choosing. Here is the cool little command he came up with:

C:\> dir /b /s | findstr /g:filestofind.txt > d:\foundthem.txt &
FOR /F %i in (d:\foundthem.txt) do copy %i d:\neededfiles\


This takes the output from our search and dumps it into foundthem.txt. We then use a For loop to read the contents of the file and copy each file to the neededfiles directory.

Well done John.

I have to say thanks to John, since he came up with the idea and wrote the commands, making my life much easier. I wonder if Hal has found anyone to write his episode for him?

Hal stands alone

Geez, Tim, since John did all your work for you I was sort of hoping that you'd write the Unix bit this week. Some friend and co-author you are!

The Unix equivalent of what John's trying to do would be something like this:

# find /etc -type f | grep -f filestofind.txt
/etc/passwd.bak
/etc/shadow.bak
/etc/passwd
/etc/security/group.conf
/etc/security/opasswd
...

Here I'm using find to output a list of all regular files ("-type f") under /etc. Then I pipe that output into a grep command and use the "-f" option to tell grep to read a list of patterns from a text file. In this case my patterns were things like "passwd", "shadow", "group", and so on, which actually match a surprisingly large number of files under /etc.

Since we're talking about performance improvements here, it's worth noting that if you're searching for fixed strings rather than regular expressions, then using fgrep is going to be faster:

# time find /etc -type f | grep -f filestofind.txt >/dev/null

real 0m0.052s
user 0m0.030s
sys 0m0.020s
# time find /etc -type f | fgrep -f filestofind.txt >/dev/null

real 0m0.026s
user 0m0.010s
sys 0m0.010s

Now /etc is a fairly small directory-- we'd probably get better numbers if we tried running this on a larger portion of the file system. And we should probably run multiple trials to get a more reliable average. But at least in this case you can see that fgrep is twice as fast as grep.
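
Speed is not the only difference: the two commands can also return different matches, because a regular expression treats "." as "any character". Here is a small sketch with an invented file list and pattern file. It uses "grep -F", which is the POSIX spelling of fgrep.

```shell
# Made-up file list and pattern file for illustration
list='/etc/security/group.conf
/etc/security/group_conf.bak
/etc/passwd'
patterns=$(mktemp)
printf '%s\n' 'group.conf' > "$patterns"

# Regular expressions: "." matches any character, so both group lines match
printf '%s\n' "$list" | grep -f "$patterns"

# Fixed strings: only the literal text "group.conf" matches
printf '%s\n' "$list" | grep -F -f "$patterns"
```

So if the file names you were handed contain dots or other regex metacharacters, fixed-string matching is likely the safer choice as well as the faster one.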

John's actual challenge is to copy the matching files into another directory. We can use the cpio trick from Episode #115 to actually copy the files:

# find /etc -type f | fgrep -f filestofind.txt | cpio -pd /root/saved
39 blocks

"cpio -p" is the "pass through" mode, which reads file/directory names from the standard input and copies them from their current location into the directory given as the final argument. The "-d" option tells cpio to create directories as needed. With "-d" in place you don't even have to create the target directory: if it doesn't exist, cpio will create it for you.

So this one really wasn't that difficult. Tim may need our readers to help him, but us Unix folks can get it done on our own.

Tuesday, December 7, 2010

Episode #124: Levelling Up

Tim set himself up to bomb:

So I came up with the idea for this episode, totally my fault. And I knew going into it that I was setting myself up for a significant beating from Hal. My guess is that it will take him all of five minutes to write his portion. So here goes.

One of the nice features of Windows is the extremely granular permissions that can be granted on files and directories. This functionality comes at a price: it makes auditing permissions a big pain, especially when it comes to groups and, even worse, nested groups. A few of my colleagues and I were looking for files that would allow us to elevate our privileges from a limited user account to one with more privileges. That means files that are run by service accounts, or possibly an administrator, and that are also modifiable by a more limited user. In short, we were looking for files owned by an admin but writeable by a limited user. Before we get into the fu, we need to look at how file permissions look in PowerShell.

To get file permissions we need to use the Get-Acl cmdlet. The output of the command looks like this (fl is an alias for Format-List and is used to display the results in list form):
PS C:\> get-acl test | fl

Path : Microsoft.PowerShell.Core\FileSystem::C:\test
Owner : MYDOM\tim
Group :
Access : BUILTIN\Administrators Allow FullControl
NT AUTHORITY\SYSTEM Allow FullControl
MYDOM\tim Allow FullControl
CREATOR OWNER Allow 268435456
BUILTIN\Users Allow ReadAndExecute, Synchronize
BUILTIN\Users Allow AppendData
BUILTIN\Users Allow CreateFiles
Audit :
Sddl : O:S-1-5-21-236840484-2123344539-2455687859-23475G:DUD:(A;OICIID;FA;;;BA)
(A;OICIID;FA;;;SY)(A;ID;FA;;;S-1-5-21-236840484-2123344539-2455687859-23475)
(A;OICIIOID;GA;;;CO)(A;OICIID;0x1200a9;;;BU)(A;CIID;LC;;;BU)(A;CIID;DC;;;BU)


If you look at the Access property you can see that I, MYDOM\tim, have full access to the folder test. This means I can do whatever I want to it. Let's take a closer look at this property and expand it using the Select-Object cmdlet with the ExpandProperty option.

PS C:\> get-acl test | select -ExpandProperty Access
FileSystemRights : FullControl
AccessControlType : Allow
IdentityReference : MYDOM\tim
IsInherited : True
InheritanceFlags : None
PropagationFlags : None


In order for me to have write permission, the IdentityReference needs to be my user account or a group of which I am a member. The FileSystemRights must be something that allows me to modify the file. Finally, the AccessControlType needs to be Allow. Ok, great, but what groups am I a member of?

To get a list of all the groups a user is a member of you can use the Get-ADAccountAuthorizationGroup cmdlet. The problem is that it requires a Windows Server 2008 R2 domain controller or an instance of AD LDS running on a Windows Server 2008 R2 server. It also requires that you have the ability to query the domain controller. We'll assume we don't have permissions to do this, so we'll just look for some known groups on the local computer that I should be a member of:

  • MYDOM\Users

  • MYLAPPY\Users

  • MYLAPPY\Guests

  • Everyone



Now we have the list of groups. All we need to do is add my user account and we have the list of IdentityReference values we need to look for.

We also need to filter for specific permissions which will allow us to modify the file. Here is what we are looking for:

  • FullControl

  • WriteData

  • CreateFiles

  • AppendData

  • ChangePermissions

  • TakeOwnership

  • Write

  • Modify



So now, with all this knowledge of what to look for, we can search for executable files in the Windows directory which we can modify. MYLAPPY is the name of my computer, and MYDOM is the name of my domain.

PS C:\Windows> ls -r -include *.exe,*.ps1,*.bat,*.com,*.vbs,*.dll | Get-Acl |
? { select -InputObject $_ -ExpandProperty Access |
? { ("MYDOM\tim","MYDOM\Users","MYLAPPY\Users","MYLAPPY\Guests","Everyone" -contains $_.IdentityReference)
-and ( "FullControl","WriteData","CreateFiles","AppendData","ChangePermissions","TakeOwnership","Write","Modify"
-contains $_.FileSystemRights) -and $_.AccessControlType -eq "Allow" }
} |
select path


We start off with a recursive directory listing that finds executable files. The results are piped into Get-Acl. A giant Where-Object (alias ?) filter is used to find the files we want. In this case we use a nested Where-Object filter. If the inner filter returns an object (an Access object), the outer filter returns true and will return the parent object (the Acl object).

The outer filter just sets up our inner filter. In the inner filter we check to see if the current Access object matches our username or group. This is done by creating a collection of principals and checking if the IdentityReference property of the Access object is in the collection. We take a similar approach with the FileSystemRights property. Finally, we check that the AccessControlType is Allow, rather than Deny. If all three parts are true, then the Acl object is passed down the pipeline where we just output the path to the file. The only problem is that this command does not check to see if a Deny rule supersedes the Allow rule.

We could also add a filter for files owned by MYLAPPY\Administrators.

PS C:\> ls -r -include *.exe,*.ps1,*.bat,*.com,*.vbs,*.dll | Get-Acl |
? { $_.Owner -eq "MYLAPPY\Administrators" } ...


The problem with this approach is that the file we are looking for may be owned by a Domain Admin or some other service account with elevated permissions, so we might have to build another collection of principals like we did above. The nice thing about MYLAPPY\Administrators is that the group is the default owner of any object created by a member of the group. For example, if John is an Administrator and he creates a file, it will be owned by MYLAPPY\Administrators. Of course there are options in Windows to change this setting, but it is the default.

So there you have it. And by it, I mean a big, confusing, complex command. And now Hal is going to give it to you. And by it I mean a simple, short, easy to read command.

Hal says, "Unix is the bomb!"

Here's a reasonable Unix approximation for what Tim is trying to do. It's surprisingly not all that terse:

find / -type f -user root \( -perm -0020 -o -perm -0002 \) \
\( -perm -0100 -o -perm -0010 -o -perm -0001 \)


The basic idea is simple. We want to find executable files that are owned by root but which are group or world writable. "Files owned by root" is no problem: that's just "-type f -user root". The verbosity comes from how you have to specify permissions with find.

If I want to say "group or world writable", I end up having to specify each bit with its own "-perm -...." clause and then gang them together with "or" ("-o") and parens ("\( ... \)"). Similarly, defining "executable" means checking each of the three possible execute bits individually. I've often wanted find to have a terser syntax for doing this kind of thing.
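
For what it's worth, GNU find does offer a shorter spelling: "-perm /mode" is true when any of the listed bits is set. That is a GNU extension, so the sketch below assumes GNU findutils and will not port to every Unix. To keep it runnable without root, it works on made-up scratch files owned by the current user.

```shell
# Scratch files with known permissions (names are made up)
scratch=$(mktemp -d)
touch "$scratch/safe" "$scratch/group_writable" "$scratch/world_writable" "$scratch/writable_data"
chmod 0755 "$scratch/safe"
chmod 0775 "$scratch/group_writable"
chmod 0757 "$scratch/world_writable"
chmod 0666 "$scratch/writable_data"     # writable but not executable

# GNU find only: "-perm /022" is true if ANY group/world write bit is set,
# "-perm /111" if ANY execute bit is set. Swap in "-user root" and "/" for real use.
find "$scratch" -type f -user "$(id -u)" -perm /022 -perm /111 | sort
```

Only group_writable and world_writable should be listed: "safe" has no extra write bits and "writable_data" has no execute bits.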

But there's a solution for you in any event. Unix's much less granular ownership and permissions model makes things considerably easier on this side of the house than on Windows.

Tuesday, November 30, 2010

Episode #123: Bad Connections

Hal rings up another one

Similar to last week, this week's challenge comes from Tim's friend who is mentoring a CCDC team. The mentor was interested in creating some shell fu that lets them monitor all network connections in and out of a system and get information about the executable that's handling the local side of the connection. The kind of information they're looking for is the sort of thing you'd get from the output of "ls -l": permissions, ownership, file size, MAC times, etc.

Truthfully, I got a sinking feeling when I heard this request. We already established back in Episode #93 how nasty it can be to determine the path name of an executable from the output of lsof if you want to do it in a way that's portable across a wide number of Unix-like systems. But let's adapt what we learned in Episode #93 to get the executable names we're interested in:

# for pid in $(lsof -i -t); do 
lsof -a -p $pid -d txt | awk '/txt/ {print $9}' | head -1;
done

/usr/sbin/avahi-daemon
/usr/sbin/sshd
/sbin/dhclient3
/usr/sbin/mysqld
/usr/sbin/cupsd
/opt/cisco/vpn/bin/vpnagentd
/usr/lib/apache2/mpm-worker/apache2
/usr/lib/apache2/mpm-worker/apache2
/usr/lib/apache2/mpm-worker/apache2
/usr/sbin/ntpd
/usr/lib/firefox-3.6.12/firefox-bin
/usr/bin/ssh
/usr/bin/ssh

In the for loop, "lsof -i -t" tells lsof to just print out the PIDs ("-t") of the processes that have active network connections ("lsof -i"). We then use the trick we developed in Episode #93 to get the binary name associated with each process ID.

Of course, you'll notice that there are multiple instances of executables like ssh and apache2, and we probably don't want to dump the same information multiple times. A little "sort -u" action will fix that right up:

# for pid in $(lsof -i -t); do 
lsof -a -p $pid -d txt | awk '/txt/ {print $9}' | head -1;
done | sort -u | xargs ls -l

-rwsr-xr-x 1 root root 1719832 2010-02-17 16:17 /opt/cisco/vpn/bin/vpnagentd
-rwxr-xr-x 1 root root 443472 2010-01-26 20:35 /sbin/dhclient3
-rwxr-xr-x 1 root root 333464 2009-10-22 12:58 /usr/bin/ssh
-rwxr-xr-x 1 root root 478768 2010-08-16 10:42 /usr/lib/apache2/mpm-worker/apache2
-rwxr-xr-x 1 root root 51496 2010-10-27 06:37 /usr/lib/firefox-3.6.12/firefox-bin
-rwxr-xr-x 1 root root 119032 2010-09-22 11:03 /usr/sbin/avahi-daemon
-rwxr-xr-x 1 root root 416304 2010-11-02 11:24 /usr/sbin/cupsd
-rwxr-xr-x 1 root root 9943440 2010-11-09 21:19 /usr/sbin/mysqld
-rwxr-xr-x 1 root root 548976 2009-12-04 11:03 /usr/sbin/ntpd
-rwxr-xr-x 1 root root 441888 2009-10-22 12:58 /usr/sbin/sshd

Once I use "sort -u" to produce the unique list of executable names, I just pop that output into xargs to get a detailed file listing about each executable.

I would say that this meets the terms of the challenge, but the output left me rather unsatisfied. I'd really like to see exactly what network connections are associated with each of the above executables. So I decided to replace xargs with another loop:

# for pid in $(lsof -i -t); do 
lsof -a -p $pid -d txt | awk '/txt/ {print $9}' | head -1;
done | sort -u |
while read exe; do
echo ===========;
ls -l $exe;
lsof -an -i -c $(basename $exe);
done

===========
-rwsr-xr-x 1 root root 1719832 2010-02-17 16:17 /opt/cisco/vpn/bin/vpnagentd
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
vpnagentd 2314 root 12u IPv4 8634 0t0 TCP 127.0.0.1:29754 (LISTEN)
===========
-rwxr-xr-x 1 root root 443472 2010-01-26 20:35 /sbin/dhclient3
===========
...

My new while loop reads each executable path and generates a little report-- first a separator line, then the output of "ls -l", and then some lsof output. In this case we have lsof dump the network information ("-i") related to the given command name ("-c"). However, the "-c" option only wants the "basename" of the command and not the full path name. The "-a" option says to join the "-i" and "-c" requirements with a logical "and" and "-n" suppresses mapping IP addresses to host names.

But what's up with the dhclient3 output? Why are we not seeing anything from lsof?

# lsof -an -i -c /dhcli/
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dhclient 1725 root 5w IPv4 6266 0t0 UDP *:bootpc

Broadening our search a little bit by using the "/dhcli/" syntax to do a substring match, you can now see in the lsof output that the "command name" as far as lsof is concerned is "dhclient", and not "dhclient3". It turns out that on this system, /sbin/dhclient is a symbolic link to /sbin/dhclient3, so there's a disconnect between the executable name and the name that the program was invoked with.

Well that's a bother! But I can make this work:

# for pid in $(lsof -i -t); do      
lsof -a -p $pid -d txt | awk '/txt/ {print $9,$1}' | head -1;
done | sort -u |
while read exe cmd; do
echo ==========;
ls -l $exe;
lsof -an -i -c $cmd;
done

==========
-rwsr-xr-x 1 root root 1719832 2010-02-17 16:17 /opt/cisco/vpn/bin/vpnagentd
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
vpnagentd 2314 root 12u IPv4 8634 0t0 TCP 127.0.0.1:29754 (LISTEN)
==========
-rwxr-xr-x 1 root root 443472 2010-01-26 20:35 /sbin/dhclient3
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dhclient 1725 root 5w IPv4 6266 0t0 UDP *:bootpc
==========
...

If you look carefully, the awk statement in the first loop is now outputting the executable path followed by the command name as reported by lsof ("print $9,$1"). So now all my second loop has to do is read these two values into separate variables and call ls and lsof with the appropriate arguments. This actually saves me calling out to basename, so it's more efficient anyway (and probably what I should have done in the first place).
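
If portability is not a concern, Linux offers a shortcut for the "which binary is this PID running" step. This sketch assumes a Linux-style /proc file system, where /proc/PID/exe is a symbolic link to the executable. The helper name exe_of_pid is made up, and reading the link for other users' processes generally needs root.

```shell
# Linux-only assumption: /proc/PID/exe is a symlink to the running binary.
# exe_of_pid is a hypothetical helper name.
exe_of_pid() {
    readlink "/proc/$1/exe"
}

# Example: resolve the binary behind the current shell, then list it
exe=$(exe_of_pid $$)
ls -l "$exe"
```

In the loops above, a call like "exe_of_pid $pid" could stand in for the "lsof -a -p $pid -d txt | awk ... | head -1" pipeline on Linux systems.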

Whew! That one was all kinds of nasty! I wonder how Tim will fare this week?

Tim set himself up:

I can tell you right now, I didn't fare well on this one. I had initially suggested this topic, but it was such a pain and was borderline scripting. I wanted to nix it, but no, Hal wanted to torture me. Merry friggin' Christmas to you too Hal.

Let's start off with cmd. We have the netstat command and we can use it to see what executable is involved in creating each connection or listening port.

C:\> netstat -bn

Active Connections

Proto Local Address Foreign Address State PID
TCP 192.168.1.10:2870 11.22.33.44:443 ESTABLISHED 172
[ma.exe]

TCP 192.168.1.10:1420 99.88.77.66:80 CLOSE_WAIT 5408
[firefox.exe]
...


This is nice since it gives us the IP Addresses and ports in use, as well as the name of the executable. The problem is, it doesn't give us the full path. Since we don't have the full path we either have to search for the executable (big pain) or just go off of the name. Not a great solution since two executables can have the same name but be in different directories. We need a different approach. What if we use netstat in a For loop with tasklist?

C:\> for /F "tokens=5 skip=4" %i in ('netstat -ano') do @tasklist /V /FI "PID eq %i"

Image Name PID Session Name Session# Mem Usage Status User Name CPU Time Window Title
============ ==== ============ ======== ========= ======= ============================ ======== ============
svchost.exe 1840 Console 0 5,084 K Running NT AUTHORITY\NETWORK SERVICE 0:00:01 N/A

Image Name PID Session Name Session# Mem Usage Status User Name CPU Time Window Title
============ ==== ============ ======== ========= ======= ============================ ======== ============
ma.exe 172 Console 0 3,372 K Running NT AUTHORITY\SYSTEM 0:00:10 MicroAgent

Image Name PID Session Name Session# Mem Usage Status User Name CPU Time Window Title
============ ==== ============ ======== ========= ======= ============================ ======== ============
svchost.exe 1768 Console 0 5,608 K Running NT AUTHORITY\SYSTEM 0:00:00 N/A

Image Name PID Session Name Session# Mem Usage Status User Name CPU Time Window Title
============ ==== ============ ======== ========= ======= ============================ ======== ============
svchost.exe 1840 Console 0 5,084 K Running NT AUTHORITY\NETWORK SERVICE 0:00:01 N/A
...


This is our good ol' For loop we've used a bunch of times. The loop takes the output of "netstat -ano", skips the four header lines and sets %i to the Process ID. We then use the Process ID with the tasklist command's filter to get the information on the process. The /V switch is used to get additional information on the process, but it still doesn't give us the full path. Bah, humbug. Cmd gets a lump of coal. Let's see if PowerShell can do anything better!

First off, PowerShell doesn't have a cmdlet that gives us a nice version of netstat. Off to a bad start already.

I use the Get-Netstat script (yep, I said script) to parse netstat output for use with PowerShell.

PS C:\> Get-Netstat | ft *

Protocol LocalAddress Localport RemoteAddress Remoteport State PID ProcessName
-------- ------------ --------- ------------- ---------- ----- --- -----------
TCP 0.0.0.0 135 0.0.0.0 0 LISTENING 1840 svchost
TCP 0.0.0.0 445 0.0.0.0 0 LISTENING 4 System
TCP 0.0.0.0 912 0.0.0.0 0 LISTENING 2420 vmware-authd
TCP 0.0.0.0 3389 0.0.0.0 0 LISTENING 1768 svchost
...


We can use the Add-Member cmdlet to extend this object to add a Path property.

PS C:\> Get-Netstat | % { Add-Member -InputObject $_ -MemberType NoteProperty -Name Path
-Value (Get-Process -Id $_.PID).Path -Force -PassThru }


Protocol : TCP
LocalAddress : 0.0.0.0
Localport : 912
RemoteAddress : 0.0.0.0
Remoteport : 0
State : LISTENING
PID : 2420
ProcessName : vmware-authd
Path : C:\Program Files\VMware\VMware Player\vmware-authd.exe

Protocol : TCP
LocalAddress : 0.0.0.0
Localport : 3389
RemoteAddress : 0.0.0.0
Remoteport : 0
State : LISTENING
PID : 1768
ProcessName : svchost
Path : C:\WINDOWS\system32\svchost.exe
...


The Add-Member cmdlet takes the current object sent down the pipeline ($_) as its input object. We then use the MemberType switch to specify a NoteProperty (static value) along with the name and value. The Force option has to be used for some silly reason, or else PowerShell complains the property already exists (which it doesn't). Finally, the PassThru switch is used to send our object down the pipeline.

We can use this approach to add as many properties as we would like. Let's add the CreationTime from the executable.

PS C:\> Get-Netstat | % { Add-Member -InputObject $_ -MemberType NoteProperty -Name Path
-Value (Get-Process -Id $_.PID).Path -Force -PassThru } | % { Add-Member -InputObject $_
-MemberType NoteProperty -Name CreationTime -Value (ls $_.Path).CreationTime -Force -PassThru }


Protocol : TCP
LocalAddress : 0.0.0.0
Localport : 912
RemoteAddress : 0.0.0.0
Remoteport : 0
State : LISTENING
PID : 2420
ProcessName : vmware-authd
Path : C:\Program Files\VMware\VMware Player\vmware-authd.exe
CreationTime : 11/11/2010 11:11:11 PM


Adding more properties makes our command significantly longer. But, it does have the added benefit of everything being an object. We can export this or do all sorts of filtering. However, this would obviously be better suited as a...<gulp>...script...

Tuesday, November 23, 2010

Episode #122: More Whacking of Moles

Tim prepares for a fight:

In my home town we have a college with a team who intends to compete in the CCDC Competition. The students are in control of a number of systems that are under attack by professional penetration testers (hackers) and the students need to defend the systems from the attackers.

The mentor of the group asked if I had any nasty little tricks to help defend the systems. I first pointed him to our Advanced Process Whack-a-Mole. I was then asked if there was a good way to baseline the system for running processes, and then kill any that aren't in that group. I said sure, but with two caveats: 1) most exploits aren't going to kick off a separate process and 2) this may have unexpected consequences. But we went ahead and did it anyway to experiment. After all, college is a time to experiment, isn't it?

Let's first use cmd.exe to create our baseline file.

C:\> for /f "skip=3" %i in ('tasklist') do @echo %i >> knowngood.txt


The Tasklist command lists all the running processes. The For loop is used to strip off the column headers and to give us just the name of the executable. The knowngood.txt file now contains a list of all the executables that we trust and looks like this:

winlogon.exe
services.exe
lsass.exe
ibmpmsvc.exe
nvsvc32.exe
svchost.exe
svchost.exe
svchost.exe
S24EvMon.exe
svchost.exe
...


Now, a little while later, we come back and check the running processes. We compare the running processes against our file to find ones we don't approve of.

C:\> for /f "skip=3" %i in ('tasklist') do @type knowngood.txt |
find "%i" > NUL || echo "bad process found: %i"


bad process found: calc.exe


Uh oh, it looks like someone is doing some unauthorized math. We'll stop that, but first let's see how this command works.

The first part of our For loop parses the output of tasklist. The variable %i contains the name of the executable of a currently running process. We then need to search the file to see if %i is a good process, or a bad one.

We write out the contents of knowngood.txt and use the Find command to see if the file contains the process %i. And we don't care to see the output, so the output is redirected to NUL. The next part is a little trick using the Logical Or (||) operator.

As you probably know, if either input to our Logical Or is true, then the result is true.

Input 1 Input 2 Result
true true true
true false true
false true true
false false false


If the first part of the command is successful, meaning we found a string in the file, then it returns true, otherwise the result is false. Also notice, if the first input is true, then we don't need to check the second input since the result is already true. Operators that operate in such a manner are known as Short-Circuit Operators, and we can use the functionality to our advantage.

If our Find command finds our process in the file, then the result is true, so there is no need to do the second portion. Only if our Find does not find a match do we execute the second portion of our command, in this case Echo.
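
The same short-circuit trick works in a Unix shell, where an exit status of zero counts as "true". Here is a minimal sketch with a made-up baseline file. The helper name check_proc is hypothetical.

```shell
# Made-up baseline for illustration
baseline=$(mktemp)
printf '%s\n' winlogon.exe services.exe lsass.exe svchost.exe > "$baseline"

# check_proc is a hypothetical helper: report names missing from the baseline.
# grep -q exits 0 (true) on a match, so echo runs only when there is no match.
check_proc() {
    grep -qxF "$1" "$baseline" || echo "bad process found: $1"
}

check_proc lsass.exe     # in the baseline: prints nothing
check_proc calc.exe      # prints: bad process found: calc.exe
```

The "-x" and "-F" options make grep match whole lines as fixed strings, so "calc.exe" would not be excused by a baseline entry such as "notcalc.exe".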

We can upgrade the command to kill our process too.

C:\> for /f "skip=3" %i in ('tasklist') do @type knowngood.txt | find "%i" > NUL || taskkill /F /IM %i
SUCCESS: The process "calc.exe" with PID 1932 has been terminated.


Cool, we have an automated killing machine, now to do the same thing in PowerShell. We'll start off creating our known good file.

PS C:\> Get-Process | select name | Export-Csv knowngood.csv


The Export-Csv and Import-Csv cmdlets are the best way to export and import objects via PowerShell. Once we export our list we can come back later and look for any "rogue" processes.

PS C:\> Compare-Object (Import-Csv .\knowngood.csv) (Get-Process) -Property Name |
? { $_.SideIndicator -eq "=>" }


Name SideIndicator
---- -------------
calc =>


We use the Compare-Object cmdlet to compare the Known Good processes from our csv file and the results from the Get-Process command. The comparison takes place on the Name property of each. We then filter the results for objects that aren't in our csv but are returned by Get-Process. And as we can see that pesky calc is back. Silly math, let's make it stop by killing it automatically. To do that, all we need to do is pipe it into the Stop-Process cmdlet.

PS C:\> Compare-Object (Import-Csv .\knowngood.csv) (Get-Process) -Property Name |
? { $_.SideIndicator -eq "=>" } | Stop-Process


Those silly mathematician hackers, we have foiled them. No taking over our Windows machines.

Hal joins the throwdown

This challenge turned out to be a lot of fun because what I thought was going to be a straightforward bit of code turned out to have some unexpected wrinkles. I was sure at the outset how I wanted to solve the problem. I would set up an array variable indexed by process ID and containing the command lines of the current processes. Then I would just re-run the ps command and compare the output against the values stored in my array.

So let's first set up our array variable:

$ ps -e -o pid,cmd | tail -n +2 | while read pid cmd; do proc[$pid]=$cmd; done

Here I'm telling ps just to dump out the PID and command line columns. Then I use tail to filter off the header line from the output. Finally, my while loop reads the remaining input and makes the appropriate array variable assignments.

Seems like it should work, right? Well imagine my surprise when I tried to do a little testing by printing out the information about my current shell process:

$ echo ${proc[$$]}

$

Huh? I should be getting some output there. I must admit that I chased my tail on this for quite a while before I realized what was going on. The array assignments in the while loop are happening in a sub-shell! Consequently, the results of the array variable assignments are not available to my parent command shell.

But where there's a will, there's a way:

$ eval $(ps -e -o pid,cmd | tail -n +2 | 
while read pid cmd; do echo "proc[$pid]='$cmd';"; done)

$ echo ${proc[$$]}
bash

Did you figure out the trick here? Rather than doing the array variable assignments in the while loop, the loop is actually outputting the correct shell code to do the assignment statements. Here's the actual loop output:

$ ps -e -o pid,cmd | tail -n +2 | 
while read pid cmd; do echo "proc[$pid]='$cmd';"; done

proc[1]='/sbin/init';
proc[2]='[kthreadd]';
proc[3]='[migration/0]';
[...]

So I then take that output and process it with "eval $(...)" to force the array assignments to happen in the environment of my parent shell. And you can see in the example output above that this sleight of hand actually works because I get meaningful output from my "echo ${proc[$$]}" command. By the way, for those of you who've never seen it before "$$" is a special variable that expands to be the PID of the current process-- my command shell in this case.
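
Another way around the sub-shell problem, assuming bash, is process substitution: the loop reads from "<(...)" instead of sitting on the right-hand side of a pipe, so it runs in the current shell. This is an alternative sketch, not the approach used above. The names load_procs and fake_ps are made up, and fake_ps stands in for the ps pipeline so the example is repeatable.

```shell
# Requires bash (arrays and process substitution are not POSIX sh features).
# load_procs is a hypothetical helper: run the given command and store its
# "pid cmd" output lines in the global array proc, indexed by PID.
load_procs() {
    proc=()
    local pid cmd
    while read -r pid cmd; do
        proc[$pid]=$cmd
    done < <("$@")                # the loop runs in this shell, not a sub-shell
}

# Stand-in for "ps -e -o pid,cmd | tail -n +2"
fake_ps() {
    printf '%s\n' "1 /sbin/init" "42 sshd: hal's session" "77 bash"
}

load_procs fake_ps
echo "${proc[42]}"
```

A side benefit is that no shell code is generated and re-parsed, so a command line containing a single quote (like the made-up entry for PID 42) does not upset the assignment.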

OK, we've got our array variable all loaded up. Now we need to create another loop to check the system for new processes. Actually, since we want these checks to happen over and over again, we end up with two loops:

$ while :; do 
ps -e -o pid,cmd | tail -n +2 | while read pid cmd; do
[[ "${proc[$pid]}" == "$cmd" ]] || echo $pid $cmd;
done;
echo check done;
sleep 5;
done

4607 ps -e -o pid,cmd
4608 tail -n +2
4609 bash
check done
4611 ps -e -o pid,cmd
4612 tail -n +2
4613 bash
check done
4615 ps -e -o pid,cmd
4616 tail -n +2
4617 bash
check done
...

The outermost loop is simply an infinite loop to force the checks to happen over and over again. Inside that loop we have another loop that looks a lot like the one we used to set up our array variable in the first place. In this case however, we're comparing the current ps output against the values stored in our array. Similar to Tim's solution, I'm using short-circuit logical operators to output the information about any processes that don't match up with the stored values in our array. After the loop I throw out a little bit of output and sleep for five seconds before repeating the process all over again.

But take a look at the output. Our comparison loop is catching the ps and tail commands we're running and also the sub-shell we're spawning to process the output of those commands. These processes aren't "suspicious", so we don't want to kill them and we don't want them cluttering our output. But how to filter them out?

Well all of these processes are being spawned by our current shell. So they should have the PID of our current process as their parent process ID. We can filter on that:

$ while :; do 
ps -e -o pid,ppid,cmd | tail -n +2 | while read pid ppid cmd; do
[[ "${proc[$pid]}" == "$cmd" || "$ppid" == "$$" ]] || echo $pid $cmd;
done;
echo check done;
sleep 5;
done

check done
check done
check done
4636 gcalctool
check done
4636 gcalctool
check done
4636 gcalctool
check done
...

So the changes here are that I'm telling the ps command to now output the PPID value in addition to the PID and command line. This means I'm now reading three variables at the top of my while loop. And the comparison operator inside the while loop gets a bit more complicated, since I want to ignore the process if it's already a known process in the proc array variable or if its PPID is that of my command shell.
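To see the filtering rule in isolation, here is a small sketch that runs the same test against canned snapshots instead of live ps output. It assumes bash, and every PID and command in it is made up for illustration, including the pretend shell PID stored in "me".

```shell
#!/usr/bin/env bash
# Canned data in "pid ppid cmd" form; all values are invented for this demo.
me=500                                   # pretend PID of the monitoring shell
baseline=$'1 0 /sbin/init\n500 1 bash'
current=$'1 0 /sbin/init\n500 1 bash\n601 500 ps -e -o pid,ppid,cmd\n4636 1 gcalctool'

# Load the baseline into the array, indexed by PID.
declare -a proc
while read -r pid ppid cmd; do
    proc[$pid]=$cmd
done <<< "$baseline"

# Report anything that is neither in the baseline nor a child of "our" shell.
new=$(while read -r pid ppid cmd; do
    [[ "${proc[$pid]}" == "$cmd" || "$ppid" == "$me" ]] || echo "$pid $cmd"
done <<< "$current")

echo "$new"
```

Only the gcalctool line should survive: PID 601 is unknown to the baseline, but it is skipped because its parent is the monitoring shell.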

From the output you can see that all is well for the first three checks, and then the evil mathematicians fire up their calculator of doom. If you want the calculator of doom to be stopped automatically, then all you have to do is change the echo statement after the "||" in the innermost loop to be "kill -9 $pid" instead. Of course, you'd have to be running as root to be able to kill any process on the system.

Shell trickery! Death to evil mathematicians! What's not to like?

Friend of the blog Jeff Haemer wrote in with an alternate solution that uses intermediate files and some clever join trickery. Check out his blog post for more details.

Tuesday, November 16, 2010

Episode #121: Naughty Characters

Hal has friends in low places:

This week's Episode comes to us courtesy of one of our loyal readers who had a bit of a misadventure with vi. The intended keyboard sequence was ":w^C<Enter>", aka "save the file, oh wait nevermind". Unfortunately, there was a bit of a fumble on the ^C and the command that actually got entered was ":w^X<Enter>", aka "save the file as '^X'". Whoops! My friend Jim always says that "experience is what you get when you don't get what you want." Our loyal reader was about to get a whole bunch of experience.

Even listing a file called ^X can be problematic. On Linux and BSD, non-printable characters are represented as a "?" in the output of ls. But on older, proprietary Unix systems like Solaris these characters will be output as-is, leading to weird output like this:

$ ls -l
total 2
-rw-r--r-- 1 hal staff 7 Nov 12 17:28

Wow, that's spectacularly unhelpful.

The GNU version of ls has the -b command switch that will display non-printable characters in octal:

$ ls -lb
total 4
-rw-r--r-- 1 hal hal 7 2010-11-12 14:18 \030

On other architectures, this trick works well:

$ ls -l | cat -v
total 2
-rw-r--r-- 1 hpomer staff 7 Nov 12 17:28 ^X

"cat -v" causes the control characters to be displayed with the "^X" notation.

Great, we can see the characters now, but how do we remove the file? This works:

$ rm $(echo -e \\030)
$ ls -l
total 0

Here we're using "echo -e" to output the literal control sequence using the octal value. We then use the output of the echo command as the argument to rm. Voila! No more file.
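Since "echo -e" behaves differently from shell to shell, a hedged alternative is printf, whose octal escapes are specified by POSIX. The sketch below does all of its work in a scratch directory created by mktemp, so nothing outside that directory is touched.

```shell
# Work in a throwaway directory so the experiment is harmless.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Build the one-byte name Ctrl-X (octal 030) and create a file with that name.
name=$(printf '\030')
echo "oops" > "$name"

# cat -v makes the control character visible as ^X.
listing=$(ls | cat -v)
echo "$listing"

# Quoting "$name" hands the raw byte to rm unchanged.
rm -- "$name"
left=$(ls -A | wc -l | tr -d ' ')
echo "files left: $left"
```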

Our loyal reader sent in an alternate solution, which is the "classic" way of solving this problem:

$ ls -lbi
total 0
918831 -rw-r--r-- 1 hal hal 0 2010-11-12 14:36 \030
$ find . -inum 918831 -exec rm {} \;
$ ls -l
total 0

The trick is to use "ls -i" to dump out the inode number associated with the file. Then we can use "find . -inum ... -exec rm {} \;" to "find" the file and remove it. Actually, the solution we received was to use "... -exec mv {} temp \;" instead of rm-- that way you can easily review the contents of the file before deciding to remove it. That's probably safer.
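Here is a minimal sketch of the inode approach in its safer "mv" form, again confined to a scratch directory. The directory layout and the recovered.txt name are assumptions made for the demo. The destination sits outside the directory that find is searching, so find cannot stumble over the renamed file.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/work"
cd "$scratch/work" || exit 1

# Create a file whose name is the single byte Ctrl-X.
printf 'mystery contents\n' > "$(printf '\030')"

# ls -i prints "inode name"; the first column is all we need.
inode=$(ls -i | awk '{print $1; exit}')

# Rename rather than delete, so the contents can be reviewed first.
find . -inum "$inode" -exec mv {} ../recovered.txt \;
cat ../recovered.txt
```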

Besides files containing non-printable characters, there are other file names that can ruin your day. For example, having a file whose name starts with a dash can be a problem:

$ ls
-i
$ rm -i
rm: missing operand
Try `rm --help' for more information.

Whoops! The rm command is interpreting the file name as a command-line switch!

There are actually several ways of removing these kinds of files. The "find . -inum ..." trick works here, of course. Another approach is:

$ rm -- -i

For most Unix commands these days the "--" tells the command to stop processing options and treat everything else on the command line as a file name. But there's actually a more terse solution that doesn't require the command to support "--":

$ touch ./-i
$ rm ./-i

"./-i" means "the file called -i in the current directory", and the advantage to specifying the file name this way is that the leading "./" means that the command no longer sees the file name as a command-line switch with a leading dash.

So there you go: a walk on the wild side with some weird Unix file names. I wonder if Tim has any problem files he has to deal with on the Windows side?

Tim works with those who shall not be named:

Oh silly Hal, surely you know the tremendous problems I have...I mean with files.

The problem in Windows isn't so much with characters, as it is with certain names. Windows doesn't have wild characters, but it does have some names that shall not be spoken. The names include: CON, PRN, AUX, NUL, COM1..COM9, and LPT1..LPT9. These names date back to the DOS days and represent devices such as the console, printer, auxiliary device, null bucket, serial ports, and parallel ports. Since Windows recognizes these as devices, you can't easily create files or directories with the same names. Here is what happens if you try:

C:\> mkdir con
The directory name is invalid.


And if you try to redirect output to one of these files you will see no file created.

C:\> echo "stuff" > con
C:\> dir con*
Volume in drive C has no label.
Volume Serial Number is ED15-DEAD

Directory of C:\

File Not Found


See, no file.

To create a file or directory with one of these special names, we have to prefix the path with \\?\. This prefix tells the Windows API to disable string parsing. The "\\.\" prefix is similar and will access the Win32 device namespace instead of the Win32 file namespace. This is how access to physical disks and volumes is accomplished directly, without going through the file system. (reference).

In layman's terms, use one of these options to create a directory.

C:\> mkdir \\.\c:\con
C:\> mkdir \\?\c:\con


Same goes for files:

C:\> echo "some text" > \\.\c:\con
C:\> echo "some text" > \\?\c:\con


Note, you have to use the full file path to create the file. So if you want to create a file in the system32 directory you need to do this:

C:\Windows\System32> echo "some text" > \\.\c:\Windows\System32\con


Just because you can create the file doesn't mean it will work well. Some of the APIs don't support the prefix, so don't be surprised if an app crashes when it tries to access one of these files.

As for PowerShell, well, I can't see a way to create a file or directory. It always returns an error such as this:

PS C:\> mkdir -Path \\.\c:\con


New-Item : The given path's format is not supported.
At line:38 char:24
+ $scriptCmd = {& <<<< $wrappedCmd -Type Directory @PSBoundParameters }
+ CategoryInfo : InvalidOperation: (\\.\c:\con:String) [New-Item], NotSupportedException
+ FullyQualifiedErrorId : ItemExistsNotSupportedError,Microsoft.PowerShell.Commands.NewItemCommand


If you can figure out how to do it in PowerShell (without using .NET), let me know.

Tuesday, November 9, 2010

Episode #120: Sign Me Up, I'm Enlisting in Your Army

Yes, it's your blog authors again, reminding you that you have the power to program this blog. Send us your ideas, your questions, your huddled shell fu yearning to be free. Maybe we'll turn your idea into a future Episode of Command-Line Kung Fu. Please, we're blog^H^H^H^Hbleg^H^H^H^Hbegging you!


Tim creates another army:

Another of our readers, Timothy McColgan, writes in:

Hey Kung Fu Krew,

... I starting working on a very simple batch to automate my user creation process. Here is what I came up with so far:

for /f "tokens=1-2" %%A in (names.txt) do (dsadd user "CN=%%A %%B,DC=commandlinekungfu,DC=com"
-f %%A -ln %%B -display "%%A %%B" -samid %%A,%%B -upn %%A%%B@kungfu.com -pwd P@ssw0rd


Basically names.txt has first and last names of new users, separated by a space. I wanted to add some more functionality to it, specifically the ability to add additional attributes. Say names.txt had more information in it, like first name, last name, description, employee ID, and how about a custom code in extensionAttribute1. And, how about the ability to put the users into an assigned group. So names.txt would look like this:

Tim Tekk,MIS,32159,301555,Managers


Tim started off well; all we need to do is make a few simple modifications.

C:\> for /f "tokens=1-6 delims=, " %a in (names.txt) do dsadd user "CN=%a %b,DC=clkf,DC=com"
-fn %a -ln %b -display "%a %b" -samid "%b, %a" -upn %a.%b@clkf.com -pwd P@ssw0rd
-desc %c -empid %d -memberof %f


We use our For loop to split the text using the space and comma as delimiters. From there we use the parameters of dsadd. Here are the parameters, the variables, and the expanded values.


  • UserDN is a required parameter and doesn't use a switch: "CN=%a %b,DC=clkf,DC=com" --> "CN=Tim Tekk,DC=clkf,DC=com"

  • Firstname: -fn %a --> Tim

  • Lastname: -ln %b --> Tekk

  • Displayname: -display "%a %b" --> "Tim Tekk"
  • Security Accounts Manager (SAM) name: -samid "%b, %a" --> "Tekk, Tim"

  • User Principal Name: -upn %a.%b@clkf.com --> Tim.Tekk@clkf.com

  • Password: -pwd P@ssw0rd

  • Description: -desc %c --> MIS

  • Employee ID: -empid %d --> 32159

  • Group Membership*: -memberof %f --> Managers



*If you run the command as it is, you will get an error. The MemberOf parameter requires a Distinguished Name, so the file would need to look like this:

Tim Tekk,MIS,32159,301555,CN=Managers,DC=clkf,DC=com


This creates a new problem, since we now have extra commas. Fortunately, we can use the tokens option with our For loop to cram "CN=Managers,DC=clkf,DC=com" into variable %f.

C:\> for /f "tokens=1-5* delims=, " %a in (names.txt) do ...


The tokens option takes the first five tokens and puts them in %a, %b, %c, %d, and %e. The * puts the remainder of the line in %f. The only thing we missed is extensionAttribute1, and we can't do that with cmd, so we have to use PowerShell.

PowerShell

To read the original file we use the cmdlet Import-CSV. The Import-CSV cmdlet requires the file have headers, and if we name our headers right we can very easily create the user using New-ADUser.

Name,GivenName,Surname,Description,SamAccountName,UserPrincipalName,EmployeeID
Tim Tekk,Tim,Tekk,MIS,"Tekk, Tim",Tim.Tekk@clkf.com,32159


The secret to this trick is having the correct headers. We need the headers to exactly match the parameters accepted by New-ADUser.

PS C:\> Import-CSV names.txt | New-ADUser


One catch: this approach won't work with extensionAttribute1 since the cmdlet doesn't set this option. So close!

Since that approach doesn't get us 100% of what we want, let's just create shorter header names so our file looks like this:

First,Last,Description,EmployeeID,ExtAtt1,Group
Tim,Tekk,MIS,32159,301555,"CN=Managers,DC=clkf,DC=com"


Now we can use the Import-Csv cmdlet with a ForEach-Object loop to create our users.

PS C:\> Import-CSV names.txt | % { New-ADUser
-Name "$($_.First) $($_.Last)"
-GivenName $_.First
-Surname $_.Last
-DisplayName "$($_.First) $($_.Last)"
-SamAccountName "$($_.Last), $($_.First)"
-UserPrincipalName "$($_.First).$($_.Last)@clkf.com"
-Description $_.Description
-EmployeeID $_.EmployeeID
-OtherAttributes @{ extensionAttribute1=$_.ExtAtt1 } -PassThru |
Add-ADGroupMember $_.Group }


One bit of weirdness you may notice is: "$(blah blah blah)"

We have to do this to manipulate our strings. The quotes tell the shell that a string is coming. The $() says we have something that needs to be evaluated. Inside the parentheses we use the current pipeline object ($_) and the properties we would like to access. We have to use this approach with the Name parameter since we have to combine the first and last name. The Surname parameter doesn't need this weirdness because it only takes one object.

I'll admit, it is a bit weird looking, but it works great and can save a lot of time.

Hal changes the game:

I'm going to change Timothy's input format somewhat to demo a neat little feature of the Unix password file format. Let's suppose that the middle fields are an office location, office phone number, and a home or alternate contact number. So we'd have a file that looks more like:

Tim Tekk,MIS,VM-SB2,541-555-1212,541-555-5678,Managers

I'm going to assume here that "MIS" is going to be the new user's primary group and that we can have one or more additional group memberships after the alternate contact number. Here we're saying that this user is also a member of the "Managers" group.

We can process our input file with the following fearsome fu:

# IFS=,
# while read name group office wphone altphone altgroups; do
username=$(echo $name | tr ' ' .)
useradd -g $group -G "$altgroups" -c "$name,$office,$wphone,$altphone" \
-m -s /bin/bash $username;
done < names.txt

We want to break up the fields from our input file on comma, so we first set "IFS=,". Then we read our file line-by-line with the "while read ...; do ... done < names.txt" idiom, and the lines will be split automatically and the fields assigned to the variables listed after the read command. If more than one alternate group name is listed, the final $altgroups variable just slurps up everything after the alternate phone number field.

Inside the loop, the first step is to convert the "first last" user full name into a "first.last" type username. Then it's just a matter of plugging the variables into the right places in the useradd command. Notice that I'm careful to use double quotes around "$altgroups". This is necessary in case there are multiple alternate groups separated by commas: without the double quotes the value of the variable would be expressed with spaces in place of the commas, a nasty side-effect of setting IFS earlier.
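If you'd like to check the field splitting before creating real accounts, here is a dry-run sketch that only prints the useradd commands, so no root access is needed. The input line is made up and carries two alternate groups to show how the last variable picks up the rest of the line. It also uses "IFS=, read", which changes IFS for that one read command only. That is a variation on the approach above rather than the same thing.

```shell
# Made-up input with two alternate groups at the end of the line.
names=$(mktemp)
printf '%s\n' 'Tim Tekk,MIS,VM-SB2,541-555-1212,541-555-5678,Managers,Staff' > "$names"

# "IFS=, read" scopes the comma separator to the read itself, so later
# unquoted expansions are not split on commas.
plan=$(while IFS=, read -r name group office wphone altphone altgroups; do
    username=$(printf '%s' "$name" | tr ' ' .)
    # Dry run: echo the command instead of executing it.
    echo "useradd -g $group -G $altgroups -c \"$name,$office,$wphone,$altphone\" -m -s /bin/bash $username"
done < "$names")

echo "$plan"
rm -f "$names"
```

Once the printed commands look right, dropping the echo and its surrounding quotes turns the dry run into the real thing.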

Notice that I'm packing the office location and phone numbers into the user full name field. It turns out that various Unix utilities will parse the full name field and do useful things with these extra comma-delimited values:

# finger Tim.Tekk
Login: Tim.Tekk Name: Tim Tekk
Directory: /home/Tim.Tekk Shell: /bin/bash
Office: VM-SB2, 541-555-1212 Home Phone: 541-555-5678
Never logged in.
No mail.
No Plan.

This is just one of those wacky old Unix legacy features that has been mostly forgotten about. But it was all the rage back when we were running Unix time-sharing systems with hundreds or thousands of users, because it gave you a searchable company phone directory from your shell prompt.
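For the curious, here is a rough sketch of how a tool could pull those values back out of a password file entry. The entry below is made up to match the example user; on a live system the same awk could be pointed at /etc/passwd instead.

```shell
# A made-up passwd entry; field 5 holds "name,office,work phone,home phone".
entry='Tim.Tekk:x:1001:1001:Tim Tekk,VM-SB2,541-555-1212,541-555-5678:/home/Tim.Tekk:/bin/bash'

directory=$(printf '%s\n' "$entry" | awk -F: '{
    split($5, g, ",")              # break the full name field on commas
    printf "%s | office %s | work %s | home %s\n", g[1], g[2], g[3], g[4]
}')

echo "$directory"
```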

Tuesday, November 2, 2010

Episode #119: Spam You? Spam Fu!

Tim reads someone else's mail:

Chris Roose wrote in asking about spam forensics. He recently received a phishing email which asked him to reply with his login credentials. He assumed others in his organization had received the message as well. He wanted to do some quick forensics to determine:

  • The users who received the email

  • A count of users who received the email

  • List of delivery attempts

  • The users who replied to the email



The phishing email was sent from admin@mynet.com with a reply address of support@evil.com. (Names changed to protect someone).

Let's assume Chris is using Exchange 2003, and the log file is formatted like this:

# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Date<tab> Time<tab> <More Headers...>
Tab Delimited Data


The best way to import this file is going to be with Import-Csv, but we have extra headers that the cmdlet doesn't like. So let's first "fix" the file so we can use our nice cmdlet. We need to remove the first two lines and the # on the third line. Realistically, this is most easily done by hand, but we can do it from the command line as well.

PS C:\> (Get-Content sample.log -totalcount 3 | select -Skip 2).Substring(2) | Out-File sample2.log
PS C:\> Get-Content sample.log | select -Skip 3 | Out-File sample2.log -Append


The first command gets the first three lines and skips the first two, which leaves us with the header row. We then use the Substring overload that allows us to skip the first two characters of the line. The output is then written to our new file, sample2.log. The next command just takes the remainder of the file (all but the first three lines) and appends it to that file. Our file now looks like this:

Date<tab>  Time<tab>  <More Headers...>
Tab Delimited Data


Now we can use the Import-Csv Command.

PS C:\> Import-Csv sample2.log -Delimiter "`t"

Date : 10/23/2010
Time : 0:0:30 GMT
client-ip : 10.0.3.133
Client-hostname : mail.mynet.com
...


The Import-Csv cmdlet turns each delimited row into an object. The default delimiter is the comma, so we need to specify the tab delimiter using the backtick t. Parameter names, such as Delimiter, can be shortened as long as the short form isn't ambiguous. We need at least three letters to differentiate it from the Debug parameter. We can shorten the name of the Delimiter parameter to Del, but that looks like delete, so I use Deli. Plus, it reminds me of sandwiches, and I like sandwiches.

Now that everything is an object, we can easily do the searching using the Where-Object cmdlet (alias ?).

PS C:\> Import-Csv sample2.log -Deli "`t" | ? { $_."Sender-Address" -eq 'admin@mynet.com' } | Select Recipient-Address

Recipient-Address
-----------------
juser@mynet.com
...


Nothing we haven't done before. The only little tweak is in the Where-Object scriptblock. We need to wrap the property name in quotes if it contains a special character, such as a space or a dash. We have our list of users, now to get a count.

PS C:\> Import-Csv sample2.log -Deli "`t" | ? { $_."Sender-Address" -eq 'admin@mynet.com' } | Measure-Object

Count : 116
Average :
Sum :
Maximum :
Minimum :
Property :


Simply piping the results into the Measure-Object cmdlet (alias measure) gives us a count.

Now let's get a list of all the delivery attempts and export it to a CSV file. Chris specifically asked for a tab-delimited file, so here it goes.

PS C:\> Import-Csv sample2.log -Delimiter "`t" | ? { $_."Sender-Address" -eq 'admin@mynet.com' } |
Export-Csv PhishAttempts.csv -Deli "`t"


Now to see who replied to the message:
PS C:\> Import-Csv sample2.log -Delimiter "`t" | ? { $_."Recipient-Address" -eq 'support@evil.com' } |
Select Sender-Address


Sender-Address
--------------
skodo@mynet.com
...


Now for Exchange 2007 and 2010. These more recent versions of Exchange have built-in PowerShell cmdlets for accessing the transaction log. We can do the same thing as above, but in a much easier fashion.

The users who received the email:
PS C:\> Get-MessageTrackingLog -Sender admin@mynet.com | select -Expand Recipients


A count of users who received the email:
PS C:\> Get-MessageTrackingLog -Sender admin@mynet.com | select -Expand Recipients | measure


A list of delivery attempts:
PS C:\> Get-MessageTrackingLog -Sender admin@mynet.com


The users who replied to the email:
PS C:\> Get-MessageTrackingLog -Recipients support@evil.com | Select sender


In the first two examples we used the Select-Object cmdlet with the Expand option. A message may have multiple recipients, so we want to expand the Recipients object into individual recipients.

In the Exchange 2007 & 2010 world, the servers that send mail are called the Hub Transport servers. All we have to do is add a short bit of PowerShell to search all the Hub Transport servers in our environment.

PS C:\> Get-TransportServer | Get-MessageTrackingLog -Sender admin@mynet.com


The new cmdlets really make an admin's life easier, and they are easy to read and to use. Hal's fu may be terse, but without knowing the data being parsed it isn't easy to know what is going on.

Hal reads someone else's work:

The thing that I loved about Chris' email to us was that he sent us the Unix command lines to parse the Exchange log files that he was dealing with. That's my kind of fu-- when life throws you a Windows problem, you can always make life simpler by transferring the data to a Unix system and processing it there!

As Tim has already discussed, the Exchange logs are tab-delimited. The important fields for our problem are the recipient address in field #8 and the sender address in field #20. With that in mind, here's Chris' solution for finding the users who received the evil spam:

awk -F "\t" '$20 == "admin@example.com" {print $8}' *.log | sort -udf

The '-F "\t"' tells awk to split on tabs only, instead of any whitespace. We look for the malicious sender address in field #20 and print out the recipient addresses from field #8. The sort options are "-d" to sort on alphanumeric characters only, "-f" to ignore case ("-i" was already taken by the "ignore non-printing characters" option), and "-u" to only output the unique addresses.

The awk to figure out who replied to the malicious email is nearly identical:

awk -F "\t" '$8 == "support@xyz.com" {print $20}' *.log | sort -udf

This time we're looking for the malicious recipient address in field #8 and outputting the unsuspecting senders from field #20.

Getting a count of the total number of malicious messages received or responses sent is just a matter of adding "| wc -l" to either of the above command lines. Getting a tab-delimited date-stamped list of delivery attempts is just a matter of including some additional fields in the output, specifically the date and time which are the first and second fields in the logs. For example, here's how to get a list of the inbound emails with time and date stamps:

awk 'BEGIN { FS = "\t"; OFS = "\t" } 
$20 == "admin@example.com" { print $1, $2, $8 }' *.log

Since Chris wants the output to be tab-delimited, he uses a BEGIN block to set OFS (the "Output Field Separator") to be tab before the input gets processed. The "print $1, $2, $8" statement means print the specified fields with the OFS character between them and terminated by ORS (the "Output Record Separator"), which is newline by default. Since Chris has got to use a BEGIN block anyway, he also sets FS (the "Field Separator") to tab, which is the same as the '-F "\t"' in the previous examples.
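To try these awk one-liners without a real Exchange log handy, here is a sketch that fabricates a tiny tab-delimited file. The mkline helper and all of the addresses and timestamps are made up. The only thing borrowed from the real format is that the recipient sits in field 8 and the sender in field 20.

```shell
log=$(mktemp)

# Hypothetical helper: pads a record out to 20 tab-separated fields.
mkline() {  # usage: mkline DATE TIME RECIPIENT SENDER
    printf '%s\t%s\t' "$1" "$2"               # fields 1-2: date and time
    printf '%s\t' - - - - -                   # fields 3-7: filler
    printf '%s\t' "$3"                        # field 8: recipient
    printf '%s\t' - - - - - - - - - - -       # fields 9-19: filler
    printf '%s\n' "$4"                        # field 20: sender
}

mkline 10/23/2010 '0:0:30 GMT' juser@mynet.com admin@example.com  > "$log"
mkline 10/23/2010 '0:0:31 GMT' auser@mynet.com admin@example.com >> "$log"
mkline 10/23/2010 '0:0:32 GMT' juser@mynet.com admin@example.com >> "$log"
mkline 10/23/2010 '0:1:02 GMT' boss@mynet.com friend@example.org >> "$log"

# Unique recipients of the suspicious sender, then a count of them.
recipients=$(awk -F '\t' '$20 == "admin@example.com" {print $8}' "$log" | sort -udf)
count=$(printf '%s\n' "$recipients" | wc -l | tr -d ' ')

# Tab-delimited, date-stamped list of every delivery attempt.
stamped=$(awk 'BEGIN { FS = OFS = "\t" }
    $20 == "admin@example.com" { print $1, $2, $8 }' "$log")

echo "$recipients"
echo "unique recipients: $count"
echo "$stamped"
rm -f "$log"
```

Note that the count here is of unique recipients. Piping the unsorted awk output to "wc -l" would count delivery attempts instead.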

But Chris' question set me to thinking about how I'd pull this information out of mail logs that were generated by a Unix Mail Transfer Agent like Sendmail. Sendmail logs are tricky because each message generates at least two lines of log output-- one with information about the sender and one with information about the recipient. For example, here's the log from a mail message I recently sent to my co-author:

Oct 31 19:27:33 newwinkle sendmail[31202]: oA10RVAv031202: from=<hal@deer-run.com>,
size=1088, class=0, nrcpts=2, msgid=<20101101002733.GB15307@deer-run.com>, proto=ESMTP,
daemon=MTA, relay=newwinkle.deer-run.com [67.18.149.10] (may be forged)
Oct 31 19:27:36 newwinkle sendmail[31207]: oA10RVAv031202: to=<timmedin@gmail.com>,
delay=00:00:04, xdelay=00:00:01, mailer=esmtp, pri=151088, relay=gmail-smtp-in.l.google.com.
[74.125.67.27], dsn=2.0.0, stat=Sent (OK 1288571256 f9si12494674yhc.86)

The way you connect the two lines together is the queue ID value, "oA10RVAv031202" in this case, that appears near the beginning of each line of log output. The tricky part is that on a busy mail server, there may actually be many intervening lines of log messages between the first line with the sender info ("from=...") and the later lines with recipient info ("to=...").

But we can do some funny awk scripting to work around these problems:

# awk '$7 == "from=<hal@deer-run.com>," {q[$6] = 1}; 
$7 ~ /^to=/ && q[$6] == 1 {print $1, $2, $3, $7}' /var/log/maillog

...
Oct 31 19:27:35 to=<suggestions@commandlinekungfu.com>,
Oct 31 19:27:36 to=<timmedin@gmail.com>,
...

Here I'm looking for any emails with my email address as the sender ('$7 == "from=<hal@deer-run.com>,"') and then making an entry in the array q[], which is indexed by the queue ID (field $6) from the log message. If I later find a recipient line ("to=") that refers to a queue ID associated with one of the emails I sent, then I output the time/date stamp (fields $1, $2, $3) and the recipient info (field $7). I could clean up the output a bit, but you could see how this idiom would allow you to find all the people who received email from a particular malicious sender address.
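Here is one way the cleaned-up version might look, run against a small sample log so the result is repeatable. The log lines are shortened versions of the entries above, and the second queue ID and its addresses are made up to stand in for the unrelated traffic you'd see on a busy server.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
Oct 31 19:27:33 newwinkle sendmail[31202]: oA10RVAv031202: from=<hal@deer-run.com>, size=1088, nrcpts=2
Oct 31 19:27:34 newwinkle sendmail[31205]: oA10RWxx031205: from=<other@example.com>, size=512, nrcpts=1
Oct 31 19:27:35 newwinkle sendmail[31207]: oA10RVAv031202: to=<suggestions@commandlinekungfu.com>, stat=Sent
Oct 31 19:27:35 newwinkle sendmail[31208]: oA10RWxx031205: to=<someone@example.net>, stat=Sent
Oct 31 19:27:36 newwinkle sendmail[31207]: oA10RVAv031202: to=<timmedin@gmail.com>, stat=Sent
EOF

result=$(awk '$7 == "from=<hal@deer-run.com>," { q[$6] = 1 }
    $7 ~ /^to=/ && q[$6] == 1 {
        rcpt = $7
        gsub(/^to=<|>,$/, "", rcpt)      # strip the to=< ... >, wrapper
        print $1, $2, $3, rcpt
    }' "$log")

echo "$result"
rm -f "$log"
```

The message from other@example.com shares the log but not the queue ID, so its recipient never shows up in the output.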

If we wanted to catch the people who were sending email to a particular recipient we were worried about, then the awk is a little different:

# awk '$7 ~ "from=" {q[$6] = $7}; 
$7 == "to=<timmedin@gmail.com>," {print $1, $2, $3, q[$6]}' mail.20101031

...
Oct 31 19:27:36 from=<hal@deer-run.com>,
Oct 31 19:27:47 from=<hal@deer-run.com>,
...

In this version, whenever I see a "from=" line, I save the sender address in the q[] array. Then when I match my evil recipient address, I can output the time stamp values and the stored sender information associated with the particular queue ID. Thankfully, I appear to be the only person stupid enough to send email to Tim these days.
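One wrinkle worth sketching: if the "from=" line for a message isn't in the file you're searching (say the log rotated between the two entries), q[$6] comes back empty. The variation below guards against that with awk's "in" operator. The third log line, with its queue ID and timestamp, is made up to trigger that case, and the placeholder text is just an assumption about what you might want to print.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
Oct 31 19:27:33 newwinkle sendmail[31202]: oA10RVAv031202: from=<hal@deer-run.com>, size=1088
Oct 31 19:27:36 newwinkle sendmail[31207]: oA10RVAv031202: to=<timmedin@gmail.com>, stat=Sent
Oct 31 19:30:02 newwinkle sendmail[31300]: oA10QZZz031199: to=<timmedin@gmail.com>, stat=Sent
EOF

result=$(awk '$7 ~ /^from=/ { q[$6] = $7 }
    $7 == "to=<timmedin@gmail.com>," {
        # Fall back to a placeholder when no sender line was seen for this queue ID.
        sender = ($6 in q) ? q[$6] : "(sender not in this log)"
        print $1, $2, $3, sender
    }' "$log")

echo "$result"
rm -f "$log"
```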