Tuesday, May 4, 2010

Episode #93: Of Ports and Paths

Tim is sweating in Texas:

This week's episode is inspired by another one of our readers, Aaron Goad. He was working on a cool bit of fu to map out all of the executables that are listening for incoming network connections. Based on the information gathered, he hoped to create profiles for the different server types in a given environment. The data would be used to create histograms based on server types, and make it easy to find one-off processes that could be anything from a backup client to a netcat backdoor. He is planning on writing a paper on it for his SANS Gold Certification. Good luck, Aaron. Aaron also sent us his command, which was 99% of the way there, but there was one problem, which I'll explain later.

I'm in Texas this week visiting some family. It is only May, but dang it is hot. Ed is out this week on vacation, so I'm going to work overtime this week in this (literal) sweat shop. I'm hoping I'll get paid overtime too. Let's see $0 times 1.5 times...never mind.

We'll start off with the command in the classic Windows shell since that is what Aaron sent us. Here is my version of the command.

C:\> for /f "tokens=1,2,3,7 delims=: " %a in ('netstat -nao ^| find 
^"LISTENING^" ^| find /v ^"::^"') do @(for /f "tokens=1,*" %n in ('"wmic process
where processId=%d get caption,executablepath | find ".""') do @echo Protocol=%a,
IP=%b, Port=%c, PID=%d, Name=%n, Path=%o)

Protocol=TCP, IP=, Port=135, PID=776, Name=svchost.exe,
Protocol=TCP, IP=, Port=912, PID=2368, Name=vmware-authd.exe,
Path=C:\Program Files\VMware\VMware Player\vmware-authd.exe
Protocol=TCP, IP=, Port=49153, PID=892, Name=svchost.exe,
Protocol=TCP, IP=, Port=49154, PID=952, Name=svchost.exe,
Protocol=TCP, IP=, Port=49155, PID=520, Name=lsass.exe,
Protocol=TCP, IP=, Port=49157, PID=512, Name=services.exe,

I did cheat a little this week by filtering out all IPv6 addresses. All the extra colons really screw up our makeshift parser. IPv6 addresses are filtered out by removing all the lines containing "::" using the /v switch with find.

The cleaned up netstat output, which is just IPv4 listeners, is split using our For loop. Regular readers are well aware that there is no good way to parse text in the classic shell, so we have to use our good ol' For loop for this task (again). We use the delimiters colon and space to get the 1st, 2nd, 3rd, and 7th tokens which represent the protocol, local address, local port, and process id respectively.

Next, we need to use wmic to get the executable name and path. This is the part that caused the problems for Aaron. When wmic returns the properties, it sorts the properties alphabetically. The ExecutablePath property comes before the Name property. So what? Well, the path typically contains spaces which our parser uses as delimiters. There isn't a way to know how many spaces are in the path, so we don't know which variable will contain the Name property. The problem can be fixed by getting the name property first, but how? The Caption property contains the same value as the Name property and C comes before E. Problem solved. We can then use the 1st and *th tokens, where the 1st is the Caption and the *th contains the rest of the line (the Executable Path).

Now we have all the values we want:
%a is the protocol (TCP or UDP)
%b is the local IP
%c is the local port
%d is the PID
%n is the name
%o is the executable path

We can dump these variables to a file or do whatever we want with them.

Tim's second shift, PowerShell

Unfortunately, PowerShell does not include a nice objectified version of netstat, so we will have to parse it ourselves. However, we do have regular expressions to help us parse.

Here is the command in PowerShell.

PS C:\> netstat -ano | 
? { $_ -match [regex]'\s+(?<Protocol>\S+)\s+(?<LocalAddress>(\[.*?\])|([0-9\.]+)):(?<LocalPort>\d+).+LISTENING.+?(?<PID>\d+$)' } |
select @{Name="Protocol";Expression={$matches.Protocol}},
@{Name="LocalAddress";Expression={$matches.LocalAddress}},
@{Name="LocalPort";Expression={$matches.LocalPort}},
@{Name="Name";Expression={(Get-Process -id $matches.PID).Name}},
@{Name="Path";Expression={(Get-Process -id $matches.PID).Path}}

Protocol LocalAddress LocalPort Name Path
-------- ------------ --------- ---- ----
TCP 135 svchost C:\Windows\system32\svchost.exe
TCP 445 System
TCP 49154 svchost C:\Windows\system32\svchost.exe
TCP 49155 lsass C:\Windows\system32\lsass.exe
TCP 49157 services C:\Windows\system32\services.exe
TCP 139 System
TCP [::] 135 svchost C:\Windows\system32\svchost.exe
TCP [::] 445 System
TCP [::] 49152 wininit C:\Windows\system32\wininit.exe
TCP [::] 49157 services C:\Windows\system32\services.exe
TCP [::1] 49159 ccApp C:\Program Files\Common Files\...

This command looks really nasty, but it isn't too bad. It has just three parts.

netstat -ano | [regular expression] | [output cleanup]

The middle section uses a regular expression for filtering and for named groups (also called named captures or named capture groups). It will filter out lines that do not contain LISTENING so we are left with only listeners. The named capture groups will contain the protocol, local address, local port, and process id (pid). The syntax for a named capture group is (?<Name>Expression). The variable $matches contains the information for the named captures, and it can be used later in the command in our output.

Next, we use Select-Object and calculated properties to clean up the output into a nice object. The calculated properties, also called custom columns, are created using a hashtable. A hashtable is specified by @{ key1=value1; key2=value2; ... }. The hashtable for a calculated property uses the Name and Expression keys. Our first three custom columns are just our named captures from the regular expression. The remaining two columns require a bit more work. Inside the property expression we use Get-Process to retrieve the details for a process and then select the property we want, either Name or Path.

It does take a little more work to get the command into a nice object, but it does make it easy to export or pipe into other commands.

So there is all the Windows fu for the week. Hal, whatcha got?

Hal is sweating a bit in Oregon too:

I have to admit at first I was feeling pretty cocky about this one. "Oh gee, I have to parse the output of several commands and produce a nice report? <sarcasm>That's really tough for us Unix folks!</sarcasm>"

The easy part was pulling the basic information together. I'm going to use my little friend "lsof -i" to dump information about network sockets on the system, using the "-n" (show IPs, not hostnames) and "-P" (show port numbers, not port names) options. A little awk fu will get us the PID, protocol, address, and port information for just the processes that are in "LISTEN" state:

# lsof -nP -i | awk '/LISTEN/ {print $2 " " $7 " " $8}'
4107 TCP *:902
4219 TCP *:8903
4219 TCP *:8902
18877 TCP
18877 TCP
18877 TCP
18877 TCP
18877 TCP
18877 TCP [::1]:953

I've edited the output here a bit in the interests of space, but I've left in a few representative entries that will turn out to be interesting in various ways.

Our first issue is splitting the port numbers from the IP addresses. As Tim points out, IPv6 addressing makes this a little more difficult than just splitting on colons. I decided to opt for a sed solution:

# lsof -nP -i | awk '/LISTEN/ {print $2 " " $7 " " $8}' | sed -r 's/:([0-9]+)$/ \1/'
4107 TCP * 902
4219 TCP * 8903
4219 TCP * 8902
18877 TCP 53
18877 TCP 53
18877 TCP 53
18877 TCP 53
18877 TCP 953
18877 TCP [::1] 953

Here my sed expression is matching the last "colon followed by some digits" at the end of the line and replacing that with a space followed by those digits. This effectively removes the colon and inserts a space. A little ugly, but I'm not working up a sweat so far.
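
If you want to poke at that substitution without running lsof, here is a minimal sketch that feeds it a couple of canned lines. The split_port function name is just something I made up for the demo, and I'm spelling the option -E, which both GNU and BSD sed accept for extended regular expressions:

```shell
#!/usr/bin/env bash
# split_port is a hypothetical helper name used only for this demo.
# It turns a trailing ":port" into " port"; -E enables extended regexes.
split_port() {
    sed -E 's/:([0-9]+)$/ \1/'
}

# Sample lines in the "PID protocol address:port" shape produced by the awk step.
printf '%s\n' '4107 TCP *:902' '18877 TCP [::1]:953' | split_port
```

Because the pattern is anchored with $, only the final colon-plus-digits is touched, so the colons inside the bracketed IPv6 address survive.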

The next trick is getting the executable path. Unfortunately, this is where everything goes pear-shaped. My little friend lsof only outputs the base name of the command, and will even truncate the command name if it exceeds 9 characters, so that's no help. But then I recalled that the /proc file system contains the information we need:

# readlink /proc/4219/exe

The /proc file system features /proc/<pid>/exe, a symlink that points to the executable file. But guess what? This is only a feature of the Linux /proc file system. Unfortunately, other Unix operating systems (e.g. Solaris) may not have this link. So I needed to come up with something more portable.

When in doubt, dip back into the lsof bag of tricks:

# lsof -a -p 4219 -d txt
vmware-ho 4219 root txt REG 253,2 49355280 230822 /usr/lib/vmware/bin/vmware-hostd

Here I'm using lsof to dump the files related to the "text segment" ("-d txt") for PID 4219 ("-p 4219"). The "-a" option does a logical "and" of the two conditions rather than "or" which is (rather oddly, IMHO) the default for lsof.

As you can see, on my Linux system the output is a header line plus a line that describes the executable. On other Unix architectures, however, you may also get a bunch of additional lines that describe all of the shared libraries required by the executable. The good news is that the actual executable is always listed first. So the next trick is to extract the last field from the first line after the header:

# lsof -a -p 4219 -d txt | awk '/txt/ {print $NF}' | head -1

Here I'm matching on the string "txt" in the non-header lines and dumping the last field with $NF. I then use head to make sure I only get the first non-header line just in case there are multiple lines of output.

Looking good so far, but check out this interesting example:

# lsof -a -p 4107 -d txt | awk '/txt/ {print $NF}' | head -1
# lsof -a -p 4107 -d txt
vmware-au ... /usr/sbin/vmware-authdlauncher.#prelink#.gvYLje (deleted)

Here I've edited out the middle columns of output from the second command so you can more clearly see what's going on. Our hero VMware is running an executable that was subsequently deleted. Because our first command uses $NF to dump out the last field delimited by whitespace, we just get the "(deleted)" bit. The work-around is to explicitly dump the 9th column (the executable path) and then the 10th column (the "deleted" marker) if it exists:

# lsof -a -p 4107 -d txt | awk '/txt/ {print $9 " " $10}' | head -1
/usr/sbin/vmware-authdlauncher.#prelink#.gvYLje (deleted)
# lsof -a -p 4219 -d txt | awk '/txt/ {print $9 " " $10}' | head -1
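
To see the difference between the two awk programs side by side, here is a small sketch that runs them over canned lsof-style lines instead of live output. The function names are hypothetical, and the size and inode columns in the "deleted" sample are placeholders I invented, since the real values are elided above:

```shell
#!/usr/bin/env bash
# Canned lsof-style lines. The 7th and 8th fields of the second line are
# made-up placeholders; the original output elides the middle columns.
normal='vmware-ho 4219 root txt REG 253,2 49355280 230822 /usr/lib/vmware/bin/vmware-hostd'
deleted='vmware-au 4107 root txt REG 253,2 0 0 /usr/sbin/vmware-authdlauncher.#prelink#.gvYLje (deleted)'

# $NF is the last whitespace-delimited field, so the marker hides the path.
last_field() { awk '/txt/ {print $NF}' | head -1; }

# Fields 9 and 10 give the path plus the marker when it is present.
# With no 10th field, awk prints the path followed by a trailing space.
path_and_marker() { awk '/txt/ {print $9 " " $10}' | head -1; }

echo "$deleted" | last_field
echo "$deleted" | path_and_marker
echo "$normal"  | path_and_marker
```

Note that counting fields this way assumes the executable path itself contains no whitespace.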

Whew! With me so far? We're in the home stretch now. All we have to do is take our initial lsof pipeline that outputs PID, protocol, IP, and port and combine that with our hack to recover the executable names:

# lsof -nP -i | awk '/LISTEN/ {print $2 " " $7 " " $8}' | sed -r 's/:([0-9]+)$/ \1/' | \
while read pid rest; do
echo "$rest" `lsof -a -p $pid -d txt | awk '/txt/ {print $9 " " $10}' | head -1`;
done

TCP * 902 /usr/sbin/vmware-authdlauncher.#prelink#.gvYLje (deleted)
TCP * 8903 /usr/lib/vmware/bin/vmware-hostd
TCP * 8902 /usr/lib/vmware/bin/vmware-hostd
TCP 53 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP 53 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP 53 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP 53 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP 953 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP [::1] 953 /usr/local/depot/bind/9.6.1-P1/sbin/named

This looks pretty fugly, but it's actually quite simple. We're using a while loop to read the output of our first lsof command line-by-line. We pull the PID out of the first field of each line and then save the rest in $rest. The only statement inside the while loop simply echoes $rest followed by the executable path name extracted by our crazy lsof concoction.
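
The "first field into one variable, everything else into another" behavior of read is easy to check in isolation. This is only an illustrative sketch on canned input, and split_pid is a name I invented for it:

```shell
#!/usr/bin/env bash
# split_pid is a hypothetical helper name used only for this demo.
# read assigns the first word to pid and the remainder of the line to rest.
split_pid() {
    while read -r pid rest; do
        echo "pid=$pid rest=$rest"
    done
}

printf '%s\n' '4107 TCP * 902' '18877 TCP [::1] 953' | split_pid
```

When read is given fewer variable names than there are words on the line, the last variable soaks up whatever is left over.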

Alert readers may note that my echo statement includes quotes around $rest. Why did I do that? Well, remember that in many cases in our output the IP address appears as "*". If we just did "echo $rest" without quotes around $rest, then the "*" would actually be expanded as a shell glob and we'd end up echoing the contents of whatever directory we were in when we ran the command. This is definitely not what we want!
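
If you'd like to watch the glob bite, here is a throwaway sketch that runs both forms in a scratch directory. The file names alpha and beta are arbitrary placeholders I picked for the demo:

```shell
#!/usr/bin/env bash
# Work in a scratch directory holding two files with known placeholder names.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch alpha beta

rest='TCP * 902'

# Unquoted: the shell word-splits $rest and expands the bare * against the directory.
unquoted=$(echo $rest)
# Quoted: the value reaches echo as a single untouched argument.
quoted=$(echo "$rest")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

The unquoted form prints the file names where the asterisk used to be, while the quoted form prints the line exactly as lsof gave it to us.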

I can't say that I'm overly happy with the amount of code I needed to sling around to solve this week's puzzle. The Linux-specific solution that uses readlink is much cleaner, but I'll leave that one as an exercise to the reader.
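
For anyone who wants a head start on that exercise, here is one possible sketch of the loop body, assuming a Linux-style /proc file system. The add_exe_path function name is hypothetical, and the demo feeds it the current shell's own PID so it runs without root or lsof:

```shell
#!/usr/bin/env bash
# Linux-only sketch: append the /proc/<pid>/exe target to each "PID rest" line.
# add_exe_path is a hypothetical helper name, not part of lsof or any standard tool.
add_exe_path() {
    while read -r pid rest; do
        # readlink prints nothing if the PID is gone or we lack permission.
        echo "$rest" "$(readlink "/proc/$pid/exe" 2>/dev/null)"
    done
}

# Demo on this shell's own PID; in the full pipeline the input would
# come from the lsof | awk | sed stage instead.
echo "$$ TCP * 902" | add_exe_path
```

On Linux the link target for a removed executable carries its own " (deleted)" suffix, so that case needs no special handling here.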