Tuesday, November 30, 2010

Episode #123: Bad Connections

Hal rings up another one

Similar to last week, this week's challenge comes from Tim's friend who is mentoring a CCDC team. The mentor was interested in creating some shell fu that lets them monitor all network connections in and out of a system and get information about the executable that's handling the local side of the connection. The kind of information they're looking for is the sort of thing you'd get from the output of "ls -l": permissions, ownership, file size, MAC times, etc.

Truthfully, I got a sinking feeling when I heard this request. We already established back in Episode #93 how nasty it can be to determine the path name of an executable from the output of lsof if you want to do it in a way that's portable across a wide number of Unix-like systems. But let's adapt what we learned in Episode #93 to get the executable names we're interested in:

# for pid in $(lsof -i -t); do 
lsof -a -p $pid -d txt | awk '/txt/ {print $9}' | head -1;
done


In the for loop, "lsof -i -t" tells lsof to just print out the PIDs ("-t") of the processes that have active network connections ("lsof -i"). We then use the trick we developed in Episode #93 to get the binary name associated with each process ID.

Of course, you'll notice that there are multiple instances of executables like ssh and apache2, and we probably don't want to dump the same information multiple times. A little "sort -u" action will fix that right up:

# for pid in $(lsof -i -t); do 
lsof -a -p $pid -d txt | awk '/txt/ {print $9}' | head -1;
done | sort -u | xargs ls -l

-rwsr-xr-x 1 root root 1719832 2010-02-17 16:17 /opt/cisco/vpn/bin/vpnagentd
-rwxr-xr-x 1 root root 443472 2010-01-26 20:35 /sbin/dhclient3
-rwxr-xr-x 1 root root 333464 2009-10-22 12:58 /usr/bin/ssh
-rwxr-xr-x 1 root root 478768 2010-08-16 10:42 /usr/lib/apache2/mpm-worker/apache2
-rwxr-xr-x 1 root root 51496 2010-10-27 06:37 /usr/lib/firefox-3.6.12/firefox-bin
-rwxr-xr-x 1 root root 119032 2010-09-22 11:03 /usr/sbin/avahi-daemon
-rwxr-xr-x 1 root root 416304 2010-11-02 11:24 /usr/sbin/cupsd
-rwxr-xr-x 1 root root 9943440 2010-11-09 21:19 /usr/sbin/mysqld
-rwxr-xr-x 1 root root 548976 2009-12-04 11:03 /usr/sbin/ntpd
-rwxr-xr-x 1 root root 441888 2009-10-22 12:58 /usr/sbin/sshd

Once I use "sort -u" to produce the unique list of executable names, I just pop that output into xargs to get a detailed file listing about each executable.
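The dedup-then-inspect pattern is easy to try in isolation. Here's a harmless stand-in (the paths are just example strings, and echo stands in for "ls -l"): duplicates collapse under "sort -u", and xargs then runs the command once per surviving line.

```shell
# Duplicate entries collapse; xargs invokes the command once per unique path.
printf '%s\n' /usr/sbin/sshd /usr/sbin/sshd /usr/bin/ssh |
  sort -u |
  xargs -n1 echo inspecting
# inspecting /usr/bin/ssh
# inspecting /usr/sbin/sshd
```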

I would say that this meets the terms of the challenge, but the output left me rather unsatisfied. I'd really like to see exactly what network connections are associated with each of the above executables. So I decided to replace xargs with another loop:

# for pid in $(lsof -i -t); do 
lsof -a -p $pid -d txt | awk '/txt/ {print $9}' | head -1;
done | sort -u |
while read exe; do
echo ===========;
ls -l $exe;
lsof -an -i -c $(basename $exe);
done

-rwsr-xr-x 1 root root 1719832 2010-02-17 16:17 /opt/cisco/vpn/bin/vpnagentd
vpnagentd 2314 root 12u IPv4 8634 0t0 TCP (LISTEN)
-rwxr-xr-x 1 root root 443472 2010-01-26 20:35 /sbin/dhclient3

My new while loop reads each executable path and generates a little report: first a separator line, then the output of "ls -l", and then some lsof output. In this case we have lsof dump the network information ("-i") related to the given command name ("-c"). However, the "-c" option only wants the "basename" of the command and not the full path name. The "-a" option says to join the "-i" and "-c" requirements with a logical "and", and "-n" suppresses mapping IP addresses to host names.

But what's up with the dhclient3 output? Why are we not seeing anything from lsof?

# lsof -an -i -c /dhcli/
dhclient 1725 root 5w IPv4 6266 0t0 UDP *:bootpc

Broadening our search a little bit by using the "/dhcli/" syntax to do a substring match, you can now see in the lsof output that the "command name" as far as lsof is concerned is "dhclient", and not "dhclient3". It turns out that on this system, /sbin/dhclient is a symbolic link to /sbin/dhclient3, so there's a disconnect between the executable name and the name that the program was invoked with.
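The dhclient/dhclient3 disconnect is easy to reproduce without touching DHCP at all. This sketch (using a scratch directory and a stand-in script) shows that a process is known by the name it was invoked with, not by the file the symlink resolves to:

```shell
# A process sees its invoked name in $0, not the symlink target's name.
tmp=$(mktemp -d)
cat > "$tmp/dhclient3" <<'EOF'
#!/bin/sh
echo "invoked as: $(basename "$0")"
EOF
chmod +x "$tmp/dhclient3"
ln -s dhclient3 "$tmp/dhclient"
"$tmp/dhclient"     # prints "invoked as: dhclient", though the file on disk is dhclient3
rm -rf "$tmp"
```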

Well that's a bother! But I can make this work:

# for pid in $(lsof -i -t); do      
lsof -a -p $pid -d txt | awk '/txt/ {print $9,$1}' | head -1;
done | sort -u |
while read exe cmd; do
echo ==========;
ls -l $exe;
lsof -an -i -c $cmd;
done

-rwsr-xr-x 1 root root 1719832 2010-02-17 16:17 /opt/cisco/vpn/bin/vpnagentd
vpnagentd 2314 root 12u IPv4 8634 0t0 TCP (LISTEN)
-rwxr-xr-x 1 root root 443472 2010-01-26 20:35 /sbin/dhclient3
dhclient 1725 root 5w IPv4 6266 0t0 UDP *:bootpc

If you look carefully, the awk statement in the first loop is now outputting the executable path followed by the command name as reported by lsof ("print $9,$1"). So now all my second loop has to do is read these two values into separate variables and call ls and lsof with the appropriate arguments. This actually saves me calling out to basename, so it's more efficient anyway (and probably what I should have done in the first place).
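The two-variable read is doing the heavy lifting here. With canned input standing in for the first loop's output, the split looks like this:

```shell
# "read exe cmd" splits each line on whitespace: the first field lands in
# exe, and the remainder of the line lands in cmd.
printf '%s\n' '/sbin/dhclient3 dhclient' '/usr/sbin/sshd sshd' |
while read exe cmd; do
  echo "path=$exe command=$cmd"
done
# path=/sbin/dhclient3 command=dhclient
# path=/usr/sbin/sshd command=sshd
```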

Whew! That one was all kinds of nasty! I wonder how Tim will fare this week?

Tim set himself up:

I can tell you right now, I didn't fare well on this one. I had initially suggested this topic, but it was such a pain and was borderline scripting. I wanted to nix it, but no, Hal wanted to torture me. Merry friggin' Christmas to you too Hal.

Let's start off with cmd. We have the netstat command and we can use it to see what executable is involved in creating each connection or listening port.

C:\> netstat -bn

Active Connections

Proto Local Address Foreign Address State PID


This is nice since it gives us the IP addresses and ports in use, as well as the name of the executable. The problem is, it doesn't give us the full path. Since we don't have the full path, we either have to search for the executable (a big pain) or just go by the name. That's not a great solution, since two executables can have the same name but live in different directories. We need a different approach: what if we use netstat in a For loop with tasklist?

C:\> for /F "tokens=5 skip=4" %i in ('netstat -ano') do @tasklist /V /FI "PID eq %i"

Image Name PID Session Name Session# Mem Usage Status User Name CPU Time Window Title
============ ==== ============ ======== ========= ======= ============================ ======== ============
svchost.exe 1840 Console 0 5,084 K Running NT AUTHORITY\NETWORK SERVICE 0:00:01 N/A

Image Name PID Session Name Session# Mem Usage Status User Name CPU Time Window Title
============ ==== ============ ======== ========= ======= ============================ ======== ============
ma.exe 172 Console 0 3,372 K Running NT AUTHORITY\SYSTEM 0:00:10 MicroAgent

Image Name PID Session Name Session# Mem Usage Status User Name CPU Time Window Title
============ ==== ============ ======== ========= ======= ============================ ======== ============
svchost.exe 1768 Console 0 5,608 K Running NT AUTHORITY\SYSTEM 0:00:00 N/A

Image Name PID Session Name Session# Mem Usage Status User Name CPU Time Window Title
============ ==== ============ ======== ========= ======= ============================ ======== ============
svchost.exe 1840 Console 0 5,084 K Running NT AUTHORITY\NETWORK SERVICE 0:00:01 N/A

This is our good ol' For loop we've used a bunch of times. The loop takes the output of "netstat -ano", skips the four header lines and sets %i to the Process ID. We then use the Process ID with the tasklist command's filter to get the information on the process. The /V switch is used to get additional information on the process, but it still doesn't give us the full path. Bah, humbug. Cmd gets a lump of coal. Let's see if PowerShell can do anything better!

First off, PowerShell doesn't have a cmdlet that gives us a nice version of netstat. Off to a bad start already.

I use the Get-Netstat script (yep, I said script) to parse netstat output for use with PowerShell.

PS C:\> Get-Netstat | ft *

Protocol LocalAddress Localport RemoteAddress Remoteport State PID ProcessName
-------- ------------ --------- ------------- ---------- ----- --- -----------
TCP 135 0 LISTENING 1840 svchost
TCP 445 0 LISTENING 4 System
TCP 912 0 LISTENING 2420 vmware-authd
TCP 3389 0 LISTENING 1768 svchost

We can use the Add-Member cmdlet to extend this object to add a Path property.

PS C:\> Get-Netstat | % { Add-Member -InputObject $_ -MemberType NoteProperty -Name Path
-Value (Get-Process -Id $_.PID).Path -Force -PassThru }

Protocol : TCP
LocalAddress :
Localport : 912
RemoteAddress :
Remoteport : 0
PID : 2420
ProcessName : vmware-authd
Path : C:\Program Files\VMware\VMware Player\vmware-authd.exe

Protocol : TCP
LocalAddress :
Localport : 3389
RemoteAddress :
Remoteport : 0
PID : 1768
ProcessName : svchost
Path : C:\WINDOWS\system32\svchost.exe

The Add-Member cmdlet takes the current object sent down the pipeline ($_) as its input object. We then use the MemberType switch to specify a NoteProperty (static value) along with the name and value. The Force option has to be used for some silly reason, or else PowerShell complains the property already exists (which it doesn't). Finally, the PassThru switch is used to send our object down the pipeline.

We can use this approach to add as many properties as we would like. Let's add the CreationTime from the executable.

PS C:\> Get-Netstat | % { Add-Member -InputObject $_ -MemberType NoteProperty -Name Path
-Value (Get-Process -Id $_.PID).Path -Force -PassThru } | % { Add-Member -InputObject $_
-MemberType NoteProperty -Name CreationTime -Value (ls $_.Path).CreationTime -Force -PassThru }

Protocol : TCP
LocalAddress :
Localport : 912
RemoteAddress :
Remoteport : 0
PID : 2420
ProcessName : vmware-authd
Path : C:\Program Files\VMware\VMware Player\vmware-authd.exe
CreationTime : 11/11/2010 11:11:11 PM

Adding more properties makes our command significantly longer. But, it does have the added benefit of everything being an object. We can export this or do all sorts of filtering. However, this would obviously be better suited as a...<gulp>...script...

Tuesday, November 23, 2010

Episode #122: More Whacking of Moles

Tim prepares for a fight:

In my home town we have a college with a team who intends to compete in the CCDC Competition. The students are in control of a number of systems that are under attack by professional penetration testers (hackers) and the students need to defend the systems from the attackers.

The mentor of the group asked if I had any nasty little tricks to help defend the systems. I first pointed him to our Advanced Process Whack-a-Mole. I was then asked if there was a good way to baseline the system for running processes, and then kill any that aren't in that group. I said sure, but with two caveats: 1) most exploits aren't going to kick off a separate process and 2) this may have unexpected consequences. But we went ahead and did it anyway to experiment. After all, college is a time to experiment, isn't it?

Let's first use cmd.exe to create our baseline file.

C:\> for /f "skip=3" %i in ('tasklist') do @echo %i >> knowngood.txt

The Tasklist command lists all the running processes. The For loop is used to strip off the column headers and to give us just the name of the executable. The knowngood.txt file now contains a list of all the executables that we trust and looks like this:


Now, a little while later, we come back and check the running processes. We compare the running processes against our file to find ones we don't approve of.

C:\> for /f "skip=3" %i in ('tasklist') do @type knowngood.txt |
find "%i" > NUL || echo "bad process found: %i"

bad process found: calc.exe

Uh oh, it looks like someone is doing some unauthorized math. We'll stop that, but first let's see how this command works.

The first part of our For loop parses the output of tasklist. The variable %i contains the name of the executable of a currently running process. We then need to search the file to see if %i is a good process, or a bad one.

We write out the contents of knowngood.txt and use the Find command to see if the file contains the process %i. And we don't care to see the output, so the output is redirected to NUL. The next part is a little trick using the Logical Or (||) operator.

As you probably know, if either input to our Logical Or is a true, then the result is true.

Input 1   Input 2   Result
true      true      true
true      false     true
false     true      true
false     false     false

If the first part of the command is successful, meaning we found a string in the file, then it returns true, otherwise the result is false. Also notice, if the first input is true, then we don't need to check the second input since the result is already true. Operators that operate in such a manner are known as Short-Circuit Operators, and we can use the functionality to our advantage.

If our Find command finds our process in the file, then the result is true, so there is no need to do the second portion. Only if our Find does not find a match do we execute the second portion of our command, in this case Echo.
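For readers more at home on the Unix side, the same short-circuit pattern looks like this in a Bourne-type shell (a sketch with a canned known-good file, not Tim's cmd.exe one-liner):

```shell
# grep -qx exits 0 (true) when the exact name is in the known-good list,
# so the command after || only fires for unknown processes.
kg=$(mktemp)
printf 'sshd\nbash\n' > "$kg"
for p in sshd calc; do
  grep -qx "$p" "$kg" || echo "bad process found: $p"
done
rm -f "$kg"
# bad process found: calc
```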

We can upgrade the command to kill our process too.

C:\> for /f "skip=3" %i in ('tasklist') do @type knowngood.txt | find "%i" > NUL || taskkill /F /IM %i
SUCCESS: The process "calc.exe" with PID 1932 has been terminated.

Cool, we have an automated killing machine, now to do the same thing in PowerShell. We'll start off creating our known good file.

PS C:\> Get-Process | select name | Export-Csv knowngood.csv

The Export-Csv cmdlet is the best way to import and export object via PowerShell. Once we export our list we can come back later and look for any "rogue" processes.

PS C:\> Compare-Object (Import-Csv .\knowngood.csv) (Get-Process) -Property Name |
? { $_.SideIndicator -eq "=>" }

Name SideIndicator
---- -------------
calc =>

We use the Compare-Object cmdlet to compare the Known Good processes from our csv file and the results from the Get-Process command. The comparison takes place on the Name property of each. We then filter the results for objects that aren't in our csv but are returned by Get-Process. And as we can see, that pesky calc is back. Silly math, let's make it stop by killing it automatically. To do that, all we need to do is pipe it into the Stop-Process cmdlet.

PS C:\> Compare-Object (Import-Csv .\knowngood.csv) (Get-Process) -Property Name |
? { $_.SideIndicator -eq "=>" } | Stop-Process

Those silly mathematician hackers, we have foiled them. No taking over our Windows machines.

Hal joins the throwdown

This challenge turned out to be a lot of fun because what I thought was going to be a straightforward bit of code turned out to have some unexpected wrinkles. I was sure at the outset how I wanted to solve the problem. I would set up an array variable indexed by process ID and containing the command lines of the current processes. Then I would just re-run the ps command and compare the output against the values stored in my array.

So let's first set up our array variable:

$ ps -e -o pid,cmd | tail -n +2 | while read pid cmd; do proc[$pid]=$cmd; done

Here I'm telling ps just to dump out the PID and command line columns. Then I use tail to filter off the header line from the output. Finally, my while loop reads the remaining input and makes the appropriate array variable assignments.

Seems like it should work, right? Well imagine my surprise when I tried to do a little testing by printing out the information about my current shell process:

$ echo ${proc[$$]}


Huh? I should be getting some output there. I must admit that I chased my tail on this for quite a while before I realized what was going on. The array assignments in the while loop are happening in a sub-shell! Consequently, the results of the array variable assignments are not available to my parent command shell.
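The gotcha boils down to three lines, no ps required:

```shell
# In bash, each stage of a pipeline runs in a subshell, so the assignment
# made inside the piped-in while loop never reaches the parent shell.
val=before
echo anything | while read line; do val=inside; done
echo "val=$val"
# val=before
```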

But where there's a will, there's a way:

$ eval $(ps -e -o pid,cmd | tail -n +2 | 
while read pid cmd; do echo "proc[$pid]='$cmd';"; done)

$ echo ${proc[$$]}

Did you figure out the trick here? Rather than doing the array variable assignments in the while loop, the loop is actually outputting the correct shell code to do the assignment statements. Here's the actual loop output:

$ ps -e -o pid,cmd | tail -n +2 | 
while read pid cmd; do echo "proc[$pid]='$cmd';"; done


So I then take that output and process it with "eval $(...)" to force the array assignments to happen in the environment of my parent shell. And you can see in the example output above that this sleight of hand actually works, because I get meaningful output from my "echo ${proc[$$]}" command. By the way, for those of you who've never seen it before, "$$" is a special variable that expands to the PID of the current process: my command shell in this case.
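Here's the eval maneuver in miniature, with canned PID/command pairs standing in for the ps output (plain variable names instead of a bash array, so the sketch runs in any POSIX shell):

```shell
# The loop emits assignment statements as text; eval executes them in the
# current shell, so the variables survive the pipeline's subshell.
eval $(printf '%s\n' '100 /sbin/init' '200 /usr/sbin/sshd' |
  while read pid cmd; do echo "proc$pid='$cmd';"; done)
echo "$proc200"
# /usr/sbin/sshd
```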

OK, we've got our array variable all loaded up. Now we need to create another loop to check the system for new processes. Actually, since we want these checks to happen over and over again, we end up with two loops:

$ while :; do 
ps -e -o pid,cmd | tail -n +2 | while read pid cmd; do
[[ "${proc[$pid]}" == "$cmd" ]] || echo $pid $cmd;
done;
echo check done;
sleep 5;
done

4607 ps -e -o pid,cmd
4608 tail -n +2
4609 bash
check done
4611 ps -e -o pid,cmd
4612 tail -n +2
4613 bash
check done
4615 ps -e -o pid,cmd
4616 tail -n +2
4617 bash
check done

The outermost loop is simply an infinite loop to force the checks to happen over and over again. Inside that loop we have another loop that looks a lot like the one we used to set up our array variable in the first place. In this case however, we're comparing the current ps output against the values stored in our array. Similar to Tim's solution, I'm using short-circuit logical operators to output the information about any processes that don't match up with the stored values in our array. After the loop I throw out a little bit of output and sleep for five seconds before repeating the process all over again.

But take a look at the output. Our comparison loop is catching the ps and tail commands we're running and also the sub-shell we're spawning to process the output of those commands. These processes aren't "suspicious", so we don't want to kill them and we don't want them cluttering our output. But how to filter them out?

Well all of these processes are being spawned by our current shell. So they should have the PID of our current process as their parent process ID. We can filter on that:

$ while :; do 
ps -e -o pid,ppid,cmd | tail -n +2 | while read pid ppid cmd; do
[[ "${proc[$pid]}" == "$cmd" || "$ppid" == "$$" ]] || echo $pid $cmd;
done;
echo check done;
sleep 5;
done

check done
check done
check done
4636 gcalctool
check done
4636 gcalctool
check done
4636 gcalctool
check done

So the changes here are that I'm telling the ps command to now output the PPID value in addition to the PID and command line. This means I'm now reading three variables at the top of my while loop. And the comparison operator inside the while loop gets a bit more complicated, since I want to ignore the process if it's already a known process in the proc array variable or if its PPID is that of my command shell.

From the output you can see that all is well for the first three checks, and then the evil mathematicians fire up their calculator of doom. If you want the calculator of doom to be stopped automatically, then all you have to do is change the echo statement after the "||" in the innermost loop to be "kill -9 $pid" instead. Of course, you'd have to be running as root to be able to kill any process on the system.

Shell trickery! Death to evil mathematicians! What's not to like?

Friend of the blog Jeff Haemer wrote in with an alternate solution that uses intermediate files and some clever join trickery. Check out his blog post for more details.

Tuesday, November 16, 2010

Episode #121: Naughty Characters

Hal has friends in low places:

This week's Episode comes to us courtesy of one our loyal readers who had a bit of a misadventure with vi. The intended keyboard sequence was ":w^C<Enter>", aka "save the file, oh wait nevermind". Unfortunately, there was a bit of a fumble on the ^C and the command that actually got entered was ":w^X<Enter>", aka "save the file as '^X'". Whoops! My friend Jim always says that "experience is what you get when you don't get what you want." Our loyal reader was about to get a whole bunch of experience.

Even listing a file called ^X can be problematic. On Linux and BSD, non-printable characters are represented as a "?" in the output of ls. But on older, proprietary Unix systems like Solaris these characters will be output as-is, leading to weird output like this:

$ ls -l
total 2
-rw-r--r-- 1 hal staff 7 Nov 12 17:28

Wow, that's spectacularly unhelpful.

The GNU version of ls has the -b command switch that will display non-printable characters in octal:

$ ls -lb
total 4
-rw-r--r-- 1 hal hal 7 2010-11-12 14:18 \030

On other architectures, this trick works well:

$ ls -l | cat -v
total 2
-rw-r--r-- 1 hpomer staff 7 Nov 12 17:28 ^X

"cat -v" causes the control characters to be displayed with the "^X" notation.

Great, we can see the characters now, but how do we remove the file? This works:

$ rm $(echo -e \\030)
$ ls -l
total 0

Here we're using "echo -e" to output the literal control sequence using the octal value. We then use the output of the echo command as the argument to rm. Voila! No more file.
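You can try the whole round trip safely in a scratch directory. This sketch uses printf rather than "echo -e", since printf's escape handling is more portable across shells:

```shell
tmp=$(mktemp -d) && cd "$tmp"
touch "$(printf '\030')"    # create a file literally named Ctrl-X
ls | cat -v                 # the name shows up as ^X
rm "$(printf '\030')"       # the same expansion removes it
ls | wc -l                  # directory is empty again
```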

Our loyal reader sent in an alternate solution, which is the "classic" way of solving this problem:

$ ls -lbi
total 0
918831 -rw-r--r-- 1 hal hal 0 2010-11-12 14:36 \030
$ find . -inum 918831 -exec rm {} \;
$ ls -l
total 0

The trick is to use "ls -i" to dump out the inode number associated with the file. Then we can use "find . -inum ... -exec rm {} \;" to "find" the file and remove it. Actually, the solution we received was to use "... -exec mv {} temp \;" instead of rm; that way you can easily review the contents of the file before deciding to remove it. That's probably safer.
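The inode dance works on any file, so you can rehearse it end-to-end in a scratch directory (awk pulls the inode number out of column one of "ls -i"):

```shell
tmp=$(mktemp -d)
touch "$tmp/victim"
inum=$(ls -i "$tmp/victim" | awk '{print $1}')   # grab the inode number
find "$tmp" -inum "$inum" -exec rm {} \;         # remove by inode, not by name
ls "$tmp" | wc -l                                # directory is empty again
```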

Besides files containing non-printable characters, there are other file names that can ruin your day. For example, having a file whose name starts with a dash can be a problem:

$ ls
$ rm -i
rm: missing operand
Try `rm --help' for more information.

Whoops! The rm command is interpreting the file name as a command-line switch!

There are actually several ways of removing these kinds of files. The "find . -inum ..." trick works here, of course. Another approach is:

$ rm -- -i

For most Unix commands these days the "--" tells commands to stop processing arguments and treat everything else on the command line as a file name. But there's actually a more terse solution that doesn't require the command to support "--":

$ touch ./-i
$ rm ./-i

"./-i" means "the file called -i in the current directory", and the advantage to specifying the file name this way is that the leading "./" means that the command no longer sees the file name as a command-line switch with a leading dash.

So there you go: a walk on the wild side with some weird Unix file names. I wonder if Tim has any problem files he has to deal with on the Windows side?

Tim works with those who shall not be named:

Oh silly Hal, surely you know the tremendous problems I have...I mean with files.

The problem in Windows isn't so much with characters as it is with certain names. Windows doesn't have much trouble with weird characters, but it does have some names that shall not be spoken. The names include: CON, PRN, AUX, NUL, COM1 through COM9, and LPT1 through LPT9. These names date from back in the DOS days, and represented devices like the console, printer, auxiliary device, null bucket, serial port, and parallel port. Since Windows recognizes these as devices, you can't easily create files or directories with the same names. Here is what happens if you try:

C:\> mkdir con
The directory name is invalid.

And if you try to redirect output to one of these files you will see no file created.

C:\> echo "stuff" > con
C:\> dir con*
Volume in drive C has no label.
Volume Serial Number is ED15-DEAD

Directory of C:\

File Not Found

See, no file.

To create a file or directory with one of these special names, we have to prefix the path with \\?\. This prefix tells the Windows API to disable string parsing. The "\\.\" prefix is similar, but accesses the Win32 device namespace instead of the Win32 file namespace; this is how physical disks and volumes are accessed directly, without going through the file system.

In layman's terms, use one of these options to create a directory.

C:\> mkdir \\.\c:\con
C:\> mkdir \\?\c:\con

Same goes for files:

C:\> echo "some text" > \\.\c:\con
C:\> echo "some text" > \\?\c:\con

Note, you have to use the full file path to create the file. So if you want to create a file in the system32 directory you need to do this:

C:\Windows\System32> echo "some text" > \\.\c:\Windows\System32\con

Just because you can create the file, doesn't mean it will work well. Some of the APIs don't support the prefix, so don't be surprised if an app crashes when it tries to access one of these files.

As for PowerShell, well, I can't see a way to create a file or directory. It always returns an error such as this:

PS C:\> mkdir -Path \\.\c:\con

New-Item : The given path's format is not supported.
At line:38 char:24
+ $scriptCmd = {& <<<< $wrappedCmd -Type Directory @PSBoundParameters }
+ CategoryInfo : InvalidOperation: (\\.\c:\con:String) [New-Item], NotSupportedException
+ FullyQualifiedErrorId : ItemExistsNotSupportedError,Microsoft.PowerShell.Commands.NewItemCommand

If you can figure out how to do it in PowerShell (without using .NET), let me know.

Tuesday, November 9, 2010

Episode #120: Sign Me Up, I'm Enlisting in Your Army

Yes, it's your blog authors again, reminding you that you have the power to program this blog. Send us your ideas, your questions, your huddled shell fu yearning to be free. Maybe we'll turn your idea into a future Episode of Command-Line Kung Fu. Please, we're blog^H^H^H^Hbleg^H^H^H^Hbegging you!

Tim creates another army:

Another of our readers, Timothy McColgan, writes in:

Hey Kung Fu Krew,

... I starting working on a very simple batch to automate my user creation process. Here is what I came up with so far:

for /f "tokens=1-2" %%A in (names.txt) do (dsadd user "CN=%%A %%B,DC=commandlinekungfu,DC=com"
-fn %%A -ln %%B -display "%%A %%B" -samid %%A,%%B -upn -pwd P@ssw0rd)

Basically names.txt has first and last names of new users, separated by a space. I wanted to add some more functionality to it, specifically the ability to add additional attributes. Say names.txt had more information in it, like first name, last name, description, employee ID, and how about a custom code in extensionAttribute1. And, how about the ability to put the users into an assigned group. So names.txt would look like this:

Tim Tekk,MIS,32159,301555,Managers

Tim started off well, all we need to do is make a few simple modifications.

C:\> for /f "tokens=1-6 delims=, " %a in (names.txt) do dsadd user "CN=%a %b,DC=clkf,DC=com"
-fn %a -ln %b -display "%a %b" -samid "%b, %a" -upn -pwd P@ssw0rd
-desc %c -empid %d -memberof %f

We use our For loop to split the text using the space and comma as delimiters. From there we use the parameters of dsadd. Here are the parameters, the variables, and the expanded values.

  • UserDN is a required parameter and doesn't use a switch. "CN=%a %b,DC=clkf,DC=com" -> "CN=Tim Tekk,DC=clkf,DC=com"

  • Firstname: -fn %a --> Tim

  • Lastname: -ln %b --> Tekk

  • Displayname: -display "%a %b" --> "Tim Tekk"
  • Security Accounts Manager (SAM) name: -samid "%b, %a" --> "Tekk, Tim"

  • User Principal Name: -upn -->

  • Password: -pwd P@ssw0rd

  • Description: -desc %c --> MIS

  • Employee ID: -empid %d --> 32159

  • Group Membership*: -memberof %f --> Managers

*If you run the command like it is, you will get an error. The MemberOf parameter requires a Distinguished Name, so the file would need to look like this:

Tim Tekk,MIS,32159,301555,CN=Managers,DC=clkf,DC=com

This creates a new problem, since we now have extra commas. Fortunately, we can use the tokens option with our For loop to cram "CN=Managers,DC=clkf,DC=com" into variable %f.

C:\> for /f "tokens=1-5* delims=, " %a in (names.txt) do ...

The tokens option takes the first five tokens and puts them in %a, %b, %c, %d, and %e. The * puts the remainder of the line in %f. The only thing we missed is extensionAttribute1, and we can't do that with cmd, so we have to use PowerShell.


To read the original file we use the Import-CSV cmdlet. Import-CSV requires the file to have headers, and if we name our headers right we can very easily create the user using New-ADUser.

Tim Tekk, Tim, Tekk, MIS, "Tekk, Tim",, 32159

The secret to this trick is having the correct headers. We need the headers to exactly match the parameters accepted by New-ADUser.

PS C:\> Import-CSV names.txt | New-ADUser

One catch: this approach won't work with extensionAttribute1, since the cmdlet doesn't set this option. So close!

Since that approach doesn't get us 100% of what we want, let's just create shorter header names so our file looks like this:


Now we can use the Import-Csv cmdlet with a ForEach-Object loop to create our users.

PS C:\> Import-CSV names.txt | % { New-ADUser
-Name "$($_.First) $($_.Last)"
-GivenName $_.First
-Surname $_.Last
-DisplayName "$($_.First) $($_.Last)"
-SamAccountName "$($_.Last), $($_.First)"
-UserPrincipalName "$($_.First).$($_.Last)"
-Description $_.Description
-EmployeeID $_.EmployeeID
-OtherAttributes @{ extensionAttribute1=$_.ExtAtt1 } -PassThru |
Add-ADGroupMember $_.Group }

One bit of weirdness you may notice is: "$(blah blah blah)"

We have to do this to manipulate our strings. The quotes tell the shell that a string is coming, and the $() says we have something that needs to be evaluated. Inside the parentheses we use the current pipeline object ($_) and the property we would like to access. We have to use this approach with the Name parameter since we have to combine the first and last names. The Surname parameter doesn't need this weirdness because it only takes one object.

I'll admit, it is a bit weird looking, but it works great and can save a lot of time.

Hal changes the game:

I'm going to change Timothy's input format somewhat to demo a neat little feature of the Unix password file format. Let's suppose that the middle fields are an office location, office phone number, and a home or alternate contact number. So we'd have a file that looks more like:

Tim Tekk,MIS,VM-SB2,541-555-1212,541-555-5678,Managers

I'm going to assume here that "MIS" is going to be the new user's primary group and that we can have one or more additional group memberships after the alternate contact number. Here we're saying that this user is also a member of the "Managers" group.

We can process our input file with the following fearsome fu:

# IFS=,
# while read name group office wphone altphone altgroups; do
username=$(echo $name | tr ' ' .)
useradd -g $group -G "$altgroups" -c "$name,$office,$wphone,$altphone" \
-m -s /bin/bash $username;
done < names.txt

We want to break up the fields from our input file on comma, so we first set "IFS=,". Then we read our file line-by-line with the "while read ...; do ... done < names.txt" idiom and the lines will be split automatically and the fields assigned to the variables listed after the read command. If more than one alternate group name is listed, the final $altgroups variable just shlurps up everything after the alternate phone number field.

Inside the loop, the first step is to convert the "first last" user full name into a "first.last" type username. Then it's just a matter of plugging the variables into the right places in the useradd command. Notice that I'm careful to use double quotes around "$altgroups". This is necessary in case there are multiple alternate groups separated by commas: without the double quotes the value of the variable would be expressed with spaces in place of the commas, a nasty side-effect of setting IFS earlier.
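The splitting and name-munging steps can be tried with a single canned input line. In this sketch IFS is set only for the read command, so the comma-splitting doesn't leak into the rest of the shell session:

```shell
# IFS=, makes read split on commas; the final variable slurps the remainder,
# and tr converts "first last" into a "first.last" username.
echo 'Tim Tekk,MIS,VM-SB2,541-555-1212,541-555-5678,Managers,Admins' |
while IFS=, read name group office wphone altphone altgroups; do
  username=$(echo "$name" | tr ' ' .)
  echo "username=$username primary=$group extra=$altgroups"
done
# username=Tim.Tekk primary=MIS extra=Managers,Admins
```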

Notice that I'm packing the office location and phone numbers into the user full name field. It turns out that various Unix utilities will parse the full name field and do useful things with these extra comma-delimited values:

# finger Tim.Tekk
Login: Tim.Tekk Name: Tim Tekk
Directory: /home/Tim.Tekk Shell: /bin/bash
Office: VM-SB2, 541-555-1212 Home Phone: 541-555-5678
Never logged in.
No mail.
No Plan.

This is just one of those wacky old Unix legacy features that has been mostly forgotten about. But it was all the rage back when we were running Unix time-sharing systems with hundreds or thousands of users, because it gave you a searchable company phone directory from your shell prompt.
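Under the hood there's no magic: finger is just reading the fifth colon-delimited field of /etc/passwd (the "GECOS" field) and splitting it on commas. A quick sketch with a hypothetical passwd entry for the user we created above:

```shell
# Hypothetical /etc/passwd line for our new user
line='Tim.Tekk:x:1001:1001:Tim Tekk,VM-SB2,541-555-1212,541-555-5678:/home/Tim.Tekk:/bin/bash'

# Field 5 is the GECOS field; its comma-separated sub-fields are
# full name, office, office phone, and home phone
echo "$line" | awk -F: '{ split($5, g, ","); print "Office:", g[2], "Phone:", g[3] }'
```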

Tuesday, November 2, 2010

Episode #119: Spam You? Spam Fu!

Tim reads someone else's mail:

Chris Roose wrote in asking about spam forensics. He recently received a phishing email which asked him to reply with his login credentials. He assumed others in his organization had received the message as well. He wanted to do some quick forensics to determine:

  • The users who received the email

  • A count of users who received the email

  • List of delivery attempts

  • The users who replied to the email

The phishing email was sent from with a reply address of (Names changed to protect someone).

Let's assume Chris is using Exchange 2003, and the log file is formatted like this:

# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Date<tab> Time<tab> <More Headers...>
Tab Delimited Data

The best way to import this file is going to be with Import-Csv, but we have extra headers that the cmdlet doesn't like. So let's first "fix" the file so we can use our nice cmdlet. We need to remove the first two lines and the # on the third line. Realistically, this is most easily done by hand, but we can do it from the command line as well.

PS C:\> (Get-Content sample.log -totalcount 3 | select -Skip 2).Substring(2) | Out-File sample2.log
PS C:\> Get-Content sample.log | select -Skip 3 | Out-File sample2.log -Append

The first command gets the first three lines and skips the first two, which leaves us with the header row. We then use the Substring overload that allows us to skip the first two characters of the line. The output is then piped into our new file. The next command just takes the remainder of the file (all but the first three lines) and appends it to the file. Our file now looks like this:

Date<tab>  Time<tab>  <More Headers...>
Tab Delimited Data
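As an aside, if you'd rather do that header surgery on a Unix box, the same fix is a one-liner in sed (a sketch, assuming the same three-line header layout):

```shell
# Delete the first two comment lines, then strip the leading "# "
# from line 3, which is the real header row
sed -e '1,2d' -e '3s/^# //' sample.log > sample2.log
```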

Now we can use the Import-Csv command.

PS C:\> Import-Csv sample2.log -Delimiter "`t"

Date : 10/23/2010
Time : 0:0:30 GMT
client-ip :
Client-hostname :

The Import-Csv cmdlet turns each delimited row into an object. The default delimiter is the comma, so we need to specify the tab delimiter using the backtick t. Parameter names, such as Delimiter, can be shortened as long as the abbreviation isn't ambiguous. We need at least three letters to differentiate it from the Debug parameter. We can shorten the name of the Delimiter parameter to Del, but that looks like delete, so I use Deli. Plus, it reminds me of sandwiches, and I like sandwiches.

Now that everything is an object, we can easily do the searching using the Where-Object cmdlet (alias ?).

PS C:\> Import-Csv sample2.log -Deli "`t" | ? { $_."Sender-Address" -eq '' } | Select Recipient-Address


Nothing we haven't done before. The only little tweak is in the Where-Object scriptblock. We need to wrap the property name in quotes if it contains a special character, such as a space or a dash. We have our list of users, now to get a count.

PS C:\> Import-Csv sample2.log -Deli "`t" | ? { $_."Sender-Address" -eq '' } | Measure-Object

Count : 116
Average :
Sum :
Maximum :
Minimum :
Property :

Simply piping the results into the Measure-Object cmdlet (alias measure) gives us a count.

Now let's get a list of all the delivery attempts and export it to a CSV. Chris specifically asked for a tab-delimited file, so here it goes.

PS C:\> Import-Csv sample2.log -Delimiter "`t" | ? { $_."Sender-Address" -eq '' } |
Export-Csv PhishAttempts.csv -Deli "`t"

Now to see who replied to the message:
PS C:\> Import-Csv sample2.log -Delimiter "`t" | ? { $_."Recipient-Address" -eq '' } |
Select Sender-Address


Now for Exchange 2007 and 2010. These more recent versions of Exchange have built-in PowerShell cmdlets for accessing the transaction log. We can do the same thing as above, but in a much easier fashion.

The users who received the email:
PS C:\> Get-MessageTrackingLog -Sender | select -Expand Recipients

A count of users who received the email:
PS C:\> Get-MessageTrackingLog -Sender | select -Expand Recipients | measure

A list of delivery attempts:
PS C:\> Get-MessageTrackingLog -Sender

The users who replied to the email:
PS C:\> Get-MessageTrackingLog -Recipients | Select sender

In the first two examples we used the Select-Object cmdlet with the Expand option. A message may have multiple recipients, so we want to expand the Recipients object into individual recipients.

In the Exchange 2007 & 2010 world, the servers that send mail are called the Hub Transport servers. All we have to do is add a short bit of PowerShell to search all the Hub Transport servers in our environment.

PS C:\> Get-TransportServer | Get-MessageTrackingLog -Sender

The new cmdlets really make an admin's life easier; they are easy to read and easy to use. Hal's fu may be terse, but without knowing the data being parsed it isn't easy to tell what is going on.

Hal reads someone else's work:

The thing that I loved about Chris' email to us was that he sent us the Unix command lines to parse the Exchange log files that he was dealing with. That's my kind of fu-- when life throws you a Windows problem, you can always make life simpler by transferring the data to a Unix system and processing it there!

As Tim has already discussed, the Exchange logs are tab-delimited. The important fields for our problem are the recipient address in field #8 and the sender address in field #20. With that in mind, here's Chris' solution for finding the users who received the evil spam:

awk -F "\t" '$20 == "" {print $8}' *.log | sort -udf

The '-F "\t"' tells awk to split on tabs only, instead of any whitespace. We look for the malicious sender address in field #20 and print out the recipient addresses from field #8. The sort options are "-d" to sort on alphanumeric characters only, "-f" to ignore case ("-i" was already taken by the "ignore non-printing characters" option), and "-u" to only output the unique addresses.
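Here's a toy run of that pipeline with made-up addresses (the real ones were redacted), just to show the case-folding from "-f" in action:

```shell
# Build two 20-field toy log lines that differ only in the case of
# the recipient in field 8 (all other fields are empty filler)
awk 'BEGIN { OFS = "\t"
    $8 = "Bob@example.com"; $20 = "evil@example.com"; print
    $8 = "bob@example.com"; print }' > toy.log

# Same search as above: the two matching recipients collapse to one
awk -F "\t" '$20 == "evil@example.com" {print $8}' toy.log | sort -udf
```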

The awk to figure out who replied to the malicious email is nearly identical:

awk -F "\t" '$8 == "" {print $20}' *.log | sort -udf

This time we're looking for the malicious recipient address in field #8 and outputting the unsuspecting senders from field #20.

Getting a count of the total number of malicious messages received or responses sent is just a matter of adding "| wc -l" to either of the above command lines. Getting a tab-delimited date-stamped list of delivery attempts is just a matter of including some additional fields in the output, specifically the date and time which are the first and second fields in the logs. For example, here's how to get a list of the inbound emails with time and date stamps:

awk 'BEGIN { FS = "\t"; OFS = "\t" } 
$20 == "" { print $1, $2, $8 }' *.log

Since Chris wants the output to be tab-delimited, he uses a BEGIN block to set OFS (the "Output Field Separator") to be tab before the input gets processed. The "print $1, $2, $8" statement means print the specified fields with the OFS character between them and terminated by ORS (the "Output Record Separator"), which is newline by default. Since Chris has got to use a BEGIN block anyway, he also sets FS (the "Field Separator") to tab, which is the same as the '-F "\t"' in the previous examples.
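The FS/OFS pair is worth a quick illustration on its own: the commas in a print statement are replaced by OFS on output, so one awk invocation can translate between delimiters:

```shell
# Read comma-separated input, emit tab-separated output
echo 'a,b,c' | awk 'BEGIN { FS = ","; OFS = "\t" } { print $1, $2, $3 }'
```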

But Chris' question set me to thinking about how I'd pull this information out of mail logs that were generated by a Unix Mail Transfer Agent like Sendmail. Sendmail logs are tricky because each message generates at least two lines of log output-- one with information about the sender and one with information about the recipient. For example, here's the log from a mail message I recently sent to my co-author:

Oct 31 19:27:33 newwinkle sendmail[31202]: oA10RVAv031202: from=<>,
size=1088, class=0, nrcpts=2, msgid=<>, proto=ESMTP,
daemon=MTA, [] (may be forged)
Oct 31 19:27:36 newwinkle sendmail[31207]: oA10RVAv031202: to=<>,
delay=00:00:04, xdelay=00:00:01, mailer=esmtp, pri=151088,
[], dsn=2.0.0, stat=Sent (OK 1288571256 f9si12494674yhc.86)

The way you connect the two lines together is the queue ID value, "oA10RVAv031202" in this case, that appears near the beginning of each line of log output. The tricky part is that on a busy mail server, there may actually be many intervening lines of log messages between the first line with the sender info ("from=...") and the later lines with recipient info ("to=...").

But we can do some funny awk scripting to work around these problems:

# awk '$7 == "from=<>," {q[$6] = 1}; 
$7 ~ /^to=/ && q[$6] == 1 {print $1, $2, $3, $7}' /var/log/maillog

Oct 31 19:27:35 to=<>,
Oct 31 19:27:36 to=<>,

Here I'm looking for any emails with my email address as the sender ('$7 == "from=<>,"') and then making an entry in the array q[], which is indexed by the queue ID (field $6) from the log message. If I later find a recipient line ("to=") that refers to a queue ID associated with one of the emails I sent, then I output the time/date stamp (fields $1, $2, $3) and the recipient info (field $7). I could clean up the output a bit, but you can see how this idiom would allow you to find all the people who received email from a particular malicious sender address.
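To make the idiom concrete, here's the same two-pattern awk run against a tiny fabricated maillog (the addresses and queue IDs are invented, since the real ones were redacted above):

```shell
# Fabricated sendmail log: two messages, only QID001 is from the
# suspect sender
cat > toy.maillog <<'EOF'
Oct 31 19:27:33 host sendmail[111]: QID001: from=<evil@example.com>, size=1088
Oct 31 19:27:34 host sendmail[112]: QID002: from=<ok@example.com>, size=99
Oct 31 19:27:36 host sendmail[113]: QID001: to=<victim@example.com>, stat=Sent
EOF

# Remember the queue IDs used by the suspect sender, then print the
# recipient lines that share those queue IDs
awk '$7 == "from=<evil@example.com>," {q[$6] = 1};
     $7 ~ /^to=/ && q[$6] == 1 {print $1, $2, $3, $7}' toy.maillog
# → Oct 31 19:27:36 to=<victim@example.com>,
```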

If we wanted to catch the people who were sending email to a particular recipient we were worried about, then the awk is a little different:

# awk '$7 ~ "from=" {q[$6] = $7}; 
$7 == "to=<>," {print $1, $2, $3, q[$6]}' mail.20101031

Oct 31 19:27:36 from=<>,
Oct 31 19:27:47 from=<>,

In this version, whenever I see a "from=" line, I save the sender address in the q[] array. Then when I match my evil recipient address, I can output the time stamp values and the stored sender information associated with the particular queue ID. Thankfully, I appear to be the only person stupid enough to send email to Tim these days.