Tuesday, November 2, 2010

Episode #119: Spam You? Spam Fu!

Tim reads someone else's mail:

Chris Roose wrote in asking about spam forensics. He recently received a phishing email which asked him to reply with his login credentials. He assumed others in his organization had received the message as well. He wanted to do some quick forensics to determine:

  • The users who received the email

  • A count of users who received the email

  • List of delivery attempts

  • The users who replied to the email



The phishing email was sent from admin@mynet.com with a reply address of support@evil.com. (Names changed to protect someone).

Let's assume Chris is using Exchange 2003, and the log file is formatted like this:

# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Date<tab> Time<tab> <More Headers...>
Tab Delimited Data


The best way to import this file is going to be with Import-Csv, but we have extra headers that the cmdlet doesn't like. So let's first "fix" the file so we can use our nice cmdlet. We need to remove the first two lines and the # on the third line. Realistically, this is most easily done by hand, but we can do it from the command line as well.

PS C:\> (Get-Content sample.log -totalcount 3 | select -Skip 2).Substring(2) | Out-File sample2.log
PS C:\> Get-Content sample.log | select -Skip 3 | Out-File sample2.log -Append


The first command gets these first three lines and skips the first two, which leaves us with the header row. We then use the SubString overload that allows us to skip the first two characters of the line. The output is then piped into our sample file. The next command just takes the remainder of the file (all but the first three lines) and appends it to the file. Our file now looks like this:

Date<tab>  Time<tab>  <More Headers...>
Tab Delimited Data


Now we can use the Import-Csv Command.

PS C:\> Import-Csv sample2.log -Delimiter "`t"

Date : 10/23/2010
Time : 0:0:30 GMT
client-ip : 10.0.3.133
Client-hostname : mail.mynet.com
...


The Import-Csv cmdlet turns each delimited row into an object. The default delimiter is the comma, so we need to specify the tab delimiter using the backtick t. Parameter names, such as Delimiterm can be shorted as long as it isn't ambiguous. We need at least three letters to differentiate it from the Debug parameter. We can shorten the name of the Delimiter parameter to Del, but that looks like delete, so I use Deli. Plus, it reminds me of sandwiches, and I like sandwiches.

Now that everything is an object, we can easily do the searching using the Where-Object cmdlet (alias ?).

PS C:\> Import-Csv sample2.log -Deli "`t" | ? { $_."Sender-Address" -eq 'admin@mynet.com' } | Select Recipient-Address

Recipient-Address
-----------------
juser@mynet.com
...


Nothing we haven't done before. The only little tweek is in the Where-Object scriptblock. We need wrap the property name in quotes if it contains a special character, such as a space or a dash. We have our list of users, now to get a count.

PS C:\> Import-Csv sample2.log -Deli "`t" | ? { $_."Sender-Address" -eq 'admin@mynet.com' } | Measure-Object

Count : 116
Average :
Sum :
Maximum :
Minimum :
Property :


Simply piping the results in into the Measure-Object cmdlet (alias measure) gives us a count.

Now lets get a list of all the delivery attempts export and export it to a csv. Chris specifically asked for a tab delimited file so here it goes.

PS C:\> Import-Csv sample2.log -Delimiter "`t" | ? { $_."Sender-Address" -eq 'admin@mynet.com' } |
Export-Csv PhishAttempts.csv -Deli "`t"


Now to see who replied to the message:
PS C:\> Import-Csv sample2.log -Delimiter "`t" | ? { $_."Recipient-Address" -eq 'support@evil.com' } |
Select Sender-Address


Recipient-Address
-----------------
skodo@mynet.com
...


Now for Exchange 2007 and 1010. These more recent versions of Exchange have built-in PowerShell cmdlets for accessing the transaction log. We can do the same thing as above, but in a much easier fashion.

The users who received the email:
PS C:\> Get-MessageTrackingLog -Sender admin@mynet.com | select -Expand Recipients


A count of users who received the email:
PS C:\> Get-MessageTrackingLog -Sender admin@mynet.com | select -Expand Recipients | measure


A list of delivery attempts:
PS C:\> Get-MessageTrackingLog -Sender admin@mynet.com


The users who replied to the email:
PS C:\> Get-MessageTrackingLog -Recipients support@evil.com | Select sender


In the first two examples we used the Select-Object cmdlet with the Expand option. A message may have multiple recipients, so we want to expand the Recipients object into individual recipients.

In the Exchange 2007 & 2010 world, the servers that send mail are called the Hub Transport servers. All we have to do is at a short bit of PowerShell to search all the Hub Transport servers in our environment.

PS C:\> Get-TransportServer | Get-MessageTrackingLog -Sender admin@mynet.com


The new cmdlets really make an admin's life easier. The new cmdlets are really easy to read and to use. Hal's fu may be terse, but without knowing the data being parsed it isn't easy to know what is going on.

Hal reads someone else's work:

The thing that I loved about Chris' email to us was that he sent us the Unix command lines to parse the Exchange log files that he was dealing with. That's my kind of fu-- when life throws you a Windows problem, you can always make life simpler by transferring the data to a Unix system and processing it there!

As Tim has already discussed, the Exchange logs are tab-delimited. The important fields for our problem are the recipient address in field #8 and the sender address in field #20. With that in mind, here's Chris' solution for finding the users who received the evil spam:

awk -F "\t" '$20 == "admin@example.com" {print $8}' *.log | sort -udf

The '-F "\t"' tells awk to split on tabs only, instead of any whitespace. We look for the malicious sender address in field #20 and print out the recipient addresses from field #8. The sort options are "-d" to sort on alphanumeric characters only, "-f" to ignore case ("-i" was already taken by the "ignore non-printing characters" option), and "-u" to only output the unique addresses.

The awk to figure out who replied to the malicious email is nearly identical:

awk -F "\t" '$8 == "support@xyz.com" {print $20}' *.log | sort -udf

This time we're looking for the malicious recipient address in field #8 and outputting the unsuspecting senders from field #20.

Getting a count of the total number of malicious messages received or responses sent is just a matter of adding "| wc -l" to either of the above command lines. Getting a tab-delimited date-stamped list of delivery attempts is just a matter of including some additional fields in the output, specifically the date and time which are the first and second fields in the logs. For example, here's how to get a list of the inbound emails with time and date stamps:

awk 'BEGIN { FS = "\t"; OFS = "\t" } 
$20 == "admin@example.com" { print $1, $2, $8 }' *.log

Since Chris wants the output to be tab-delimited, he uses a BEGIN block to set OFS (the "Output Field Separator") to be tab before the input gets processed. The "print $1, $2, $8" statement means print the specified fields with the OFS character between them and terminated by ORS (the "Output Record Separator"), which is newline by default. Since Chris has got to use a BEGIN block anyway, he also sets FS (the "Field Separator") to tab, which is the same as the '-F "\t"' in the previous examples.

But Chris' question set me to thinking about how I'd pull this information out of mail logs that were generated by a Unix Mail Transfer Agent like Sendmail. Sendmail logs are tricky because each message generates at least two lines of log output-- one with information about the sender and one with information about the recipient. For example, here's the log from a mail message I recently sent to my co-author:

Oct 31 19:27:33 newwinkle sendmail[31202]: oA10RVAv031202: from=<hal@deer-run.com>,
size=1088, class=0, nrcpts=2, msgid=<20101101002733.GB15307@deer-run.com>, proto=ESMTP,
daemon=MTA, relay=newwinkle.deer-run.com [67.18.149.10] (may be forged)
Oct 31 19:27:36 newwinkle sendmail[31207]: oA10RVAv031202: to=<timmedin@gmail.com>,
delay=00:00:04, xdelay=00:00:01, mailer=esmtp, pri=151088, relay=gmail-smtp-in.l.google.com.
[74.125.67.27], dsn=2.0.0, stat=Sent (OK 1288571256 f9si12494674yhc.86)

The way you connect the two lines together is the queue ID value, "oA10RVAv031202" in this case, that appears near the beginning of each line of log output. The tricky part is that on a busy mail server, there may actually be many intervening lines of log messages between the first line with the sender info ("from=...") and the later lines with recipient info ("to=...").

But we can do some funny awk scripting to work around these problems:

# awk '$7 == "from=<hal@deer-run.com>," {q[$6] = 1}; 
$7 ~ /^to=/ && q[$6] == 1 {print $1, $2, $3, $7}' /var/log/maillog

...
Oct 31 19:27:35 to=<suggestions@commandlinekungfu.com>,
Oct 31 19:27:36 to=<timmedin@gmail.com>,
...

Here I'm looking for any emails with my email address as the sender ('$7 == "from=<hal@deer-run.com>,"') and then making an entry in the array q[], which is indexed by the queue ID (field $6) from the log message. If I later find a recipient line ("to=") that refers to a queue ID associated with one of the emails I sent, then I output the time/date stamp (fields $1, $2, $3) and the recipient info (field $7). I could clean up the output a bit, but you could see how this idiom would allow you to find all the people who received email from a particular malicious sender address.

If we wanted to catch the people who were sending email to a particular recipient we were worried about, then the awk is a little different:

# awk '$7 ~ "from=" {q[$6] = $7}; 
$7 == "to=<timmedin@gmail.com>," {print $1, $2, $3, q[$6]}' mail.20101031

...
Oct 31 19:27:36 from=<hal@deer-run.com>,
Oct 31 19:27:47 from=<hal@deer-run.com>,
...

In this version, whenever I see a "from=" line, I save the sender address in the q[] array. Then when I match my evil recipient address, I can output the time stamp values and the stored sender information associated with the particular queue ID. Thankfully, I appear to be the only person stupid enough to send email to Tim these days.