Sunday, January 6, 2013

An AWK-ward Response

A couple of weeks ago I promised some answers to the exercises I proposed at the end of my last post. What we have here is a case of, "Better late than never!"

1. If you go back and look at the example where I counted the number of processes per user, you'll notice that the "UID" header from the ps command ends up being counted. How would you suppress this?

There are a couple of different ways you could attack this using the material I showed you in the previous post. One way would be to do string comparison on field $1:

$ ps -ef | awk '$1 != "UID" {print $1}' | sort | uniq -c | sort -nr
    178 root
     58 hal
      2 www-data
    ...

An alternative approach would be to use pattern matching to print lines that don't match the string "UID". The "!" operator means "not", so the expression "!/UID/" does what we want:

$ ps -ef | awk '!/UID/ {print $1}' | sort | uniq -c | sort -nr
    178 root
     57 hal
      2 www-data
    ...

You'll notice that the "!/UID/" version counts one fewer process for user "hal" than the string comparison version. That's because the awk process itself appears in the ps output with "UID" in its command line, so the pattern match filters out that process along with the header. So the string comparison version is slightly more accurate.
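A third option, not covered in the previous post, is to skip the header by line number rather than by content. Here's a minimal sketch, assuming the header is always the first line of the ps output. The sample data and the variable names (ps_sample, by_line, by_pattern) are made up for illustration:

```shell
# Hypothetical, trimmed stand-in for "ps -ef" output; not real process data.
ps_sample='UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jan05 ?        00:00:01 /sbin/init
hal       5938  5909  0 13:42 pts/3    00:00:00 bash
hal       5969  5938  0 13:47 pts/3    00:00:00 awk !/UID/ {print $1}
www-data  4112  1964  0 Jan05 ?        00:00:00 apache2'

# NR is awk's current line number, so "NR > 1" skips only the header line.
by_line=$(printf '%s\n' "$ps_sample" | awk 'NR > 1 {print $1}' | sort | uniq -c)

# The pattern match also drops the awk process, whose arguments contain "UID".
by_pattern=$(printf '%s\n' "$ps_sample" | awk '!/UID/ {print $1}' | sort | uniq -c)

printf '%s\n' "NR > 1 version:" "$by_line" "pattern version:" "$by_pattern"
```

On this sample the "NR > 1" version counts two processes for "hal" while the pattern version counts only one, which is the same discrepancy described above.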

2. Print out the usernames of all accounts with superuser privileges (UID is 0 in /etc/passwd).

Remember that the /etc/passwd file is colon-delimited, so we'll use awk's "-F" option to split on colons. UID is field #3 and the username is field #1:

$ awk -F: '$3 == 0 {print $1}' /etc/passwd
root

Normally, a Unix-like OS will only have a single UID 0 account named "root". If you find other UID 0 accounts in your password file, they could be a sign that somebody's doing something naughty.
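If you'd like to see what a hit looks like without touching your real password file, here's a small sketch that runs the same awk against made-up data. The "toor" account and the variable names are hypothetical:

```shell
# Made-up passwd entries; "toor" is a hypothetical second UID 0 account.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
toor:x:0:0:second superuser:/root:/bin/sh
hal:x:1000:1000:Hal:/home/hal:/bin/bash'

# Same test as above: field 3 (UID) is 0, print field 1 (username).
uid0=$(printf '%s\n' "$passwd_sample" | awk -F: '$3 == 0 {print $1}')

printf '%s\n' "$uid0"
```

Both "root" and "toor" should be printed here, and an unexpected name in that list is the thing to investigate.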

3. Print out the usernames of all accounts with null password fields in /etc/shadow.

You'll need to be root to do this one, since /etc/shadow is only readable by the superuser:

# awk -F: '$2 == "" {print $1}' /etc/shadow

Again, we use "-F:" to split the fields in /etc/shadow. We look for lines where the second field (containing the password hash) is the empty string and print the first field (the username) when this condition is true. It's really not much different from the previous /etc/passwd example.

You should get no output. There shouldn't be any entries in /etc/shadow with null password hashes!
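Since a clean system gives you nothing to look at, here's a sketch against invented shadow-style data so you can see the match fire. The account names and hash strings are placeholders, not real entries:

```shell
# Invented shadow-style lines; the hashes are placeholders, not real hashes.
shadow_sample='root:$6$placeholderhash:15711:0:99999:7:::
hal:$6$anotherplaceholder:15711:0:99999:7:::
guest::15711:0:99999:7:::
nobody:*:15711:0:99999:7:::'

# Field 2 is the password hash; an empty string means no password is set.
nullpw=$(printf '%s\n' "$shadow_sample" | awk -F: '$2 == "" {print $1}')

printf '%s\n' "$nullpw"
```

Only "guest" has an empty second field. The "*" entry for "nobody" is not empty, so it is not reported.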

4. Print out process data for all commands being run as root by interactive users on the system (HINT: If the command is interactive, then the "TTY" column will have something other than a "?" in it)

The "TTY" column in the "ps" output is field #6 and the username field is #1:

# ps -ef | awk '$1 == "root" && $6 != "?" {print}'
root      1422     1  0 Jan05 tty4     00:00:00 /sbin/getty -8 38400 tty4
root      1427     1  0 Jan05 tty5     00:00:00 /sbin/getty -8 38400 tty5
root      1434     1  0 Jan05 tty2     00:00:00 /sbin/getty -8 38400 tty2
root      1435     1  0 Jan05 tty3     00:00:00 /sbin/getty -8 38400 tty3
root      1438     1  0 Jan05 tty6     00:00:00 /sbin/getty -8 38400 tty6
root      1614  1523  0 Jan05 tty7     00:09:00 /usr/bin/X :0 -nr -verbose -auth ... 
root      2082     1  0 Jan05 tty1     00:00:00 /sbin/getty -8 38400 tty1
root      5909  5864  0 13:42 pts/3    00:00:00 su -
root      5938  5909  0 13:42 pts/3    00:00:00 -su
root      5968  5938  0 13:47 pts/3    00:00:00 ps -ef
root      5969  5938  0 13:47 pts/3    00:00:00 awk $1 == "root" && $6 != "?" {print}

We look for the keyword "root" in the first field, and anything that's not "?" in the sixth field. If both conditions are true, then we just print out the entire line with "{print}".

Actually, "{print}" is the default action for awk. So we could shorten our code just a bit:

# ps -ef | awk '$1 == "root" && $6 != "?"'
root      1422     1  0 Jan05 tty4     00:00:00 /sbin/getty -8 38400 tty4
root      1427     1  0 Jan05 tty5     00:00:00 /sbin/getty -8 38400 tty5
root      1434     1  0 Jan05 tty2     00:00:00 /sbin/getty -8 38400 tty2
...
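To convince yourself that the two forms really are equivalent, you can run both against the same input and compare. This is just a sketch using a few made-up lines in "ps -ef" format:

```shell
# A few made-up lines in "ps -ef" format, header included.
ps_sample='UID        PID  PPID  C STIME TTY          TIME CMD
root      1422     1  0 Jan05 tty4     00:00:00 /sbin/getty -8 38400 tty4
root      3394     1  0  2012 ?        00:00:00 /usr/sbin/sshd
hal       5864  5863  0 13:40 pts/3    00:00:00 -bash
root      5909  5864  0 13:42 pts/3    00:00:00 su -'

# Explicit action versus relying on awk's default action.
explicit=$(printf '%s\n' "$ps_sample" | awk '$1 == "root" && $6 != "?" {print}')
implicit=$(printf '%s\n' "$ps_sample" | awk '$1 == "root" && $6 != "?"')

if [ "$explicit" = "$implicit" ]; then
    echo "same output"
fi
printf '%s\n' "$implicit"
```

Both versions print the getty and "su -" lines and skip the header, the sshd line (TTY is "?"), and the process owned by "hal".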

5. I mentioned that if you kill all the sshd processes while logged in via SSH, you'll be kicked out of the box (you killed your own sshd process) and unable to log back in (you've killed the master SSH daemon). Fix the awk so that it only prints out the PIDs of SSH daemon processes that (a) don't belong to you, and (b) aren't the master SSH daemon (HINT: The master SSH daemon is the one whose parent process ID is 1).

This one's a little tricky. Take a look at the sshd processes on my system:

# ps -ef | grep sshd
root      3394     1  0  2012 ?        00:00:00 /usr/sbin/sshd
root     13248  3394  0 Jan05 ?        00:00:00 sshd: hal [priv] 
hal      13250 13248  0 Jan05 ?        00:00:02 sshd: hal@pts/0  
root     25189  3394  0 08:27 ?        00:00:00 sshd: hal [priv] 
hal      25191 25189  0 08:27 ?        00:00:00 sshd: hal@pts/1  
root     25835 25807  0 15:33 pts/1    00:00:00 grep sshd

For modern SSH daemons with "Privilege Separation" enabled, there are actually two sshd processes per login. There's a root-owned process marked as "sshd: <user> [priv]" and a process owned by the user marked as "sshd: <user>@<tty>". Life would be a whole lot easier if both processes were identified with the associated pty, but alas things didn't work out that way. So here's what I came up with:

# ps -ef | awk '/sshd/ && !($3 == 1 || /sshd: hal[@ ]/) {print $2}'

First we eliminate all processes except for the sshd processes with "/sshd/". We only want to print the process ID if the process is neither the master SSH daemon ("$3 == 1" tests for a PPID of 1) nor one of my own SSH daemons ("/sshd: hal[@ ]/" means the string "sshd: hal" followed by either "@" or space). The "!" in front of the parenthesized "||" expression negates both tests at once. If everything looks good, then print the process ID of the process ("{print $2}").

Frankly, that's some pretty nasty awk. I'm not sure it's something I'd come up with easily on the spur of the moment.
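If the negated "||" is hard to read, the same condition can be spelled out as a chain of "&&" tests. The sketch below compares the two forms on lines modeled on the ps output above. The "bob" sessions are hypothetical, added so that there is something to print:

```shell
# Lines modeled on the ps output above; the "bob" entries are hypothetical.
ps_sample='root      3394     1  0  2012 ?        00:00:00 /usr/sbin/sshd
root     13248  3394  0 Jan05 ?        00:00:00 sshd: hal [priv]
hal      13250 13248  0 Jan05 ?        00:00:02 sshd: hal@pts/0
root     26001  3394  0 09:10 ?        00:00:00 sshd: bob [priv]
bob      26003 26001  0 09:10 ?        00:00:00 sshd: bob@pts/2'

# Original form: sshd AND NOT (master OR mine).
grouped=$(printf '%s\n' "$ps_sample" |
    awk '/sshd/ && !($3 == 1 || /sshd: hal[@ ]/) {print $2}')

# Expanded form: sshd AND not master AND not mine.
expanded=$(printf '%s\n' "$ps_sample" |
    awk '/sshd/ && $3 != 1 && !/sshd: hal[@ ]/ {print $2}')

printf '%s\n' "$grouped"
```

Both forms print only the two PIDs for the "bob" sessions. Which version you prefer is a matter of taste. The expanded one is longer but reads left to right.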

6. Use awk to parse the output of the ifconfig command and print out the IP address of the local system.

Here's the output from ifconfig on my system:

$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr f0:de:f1:29:c7:18  
          inet addr:192.168.0.14  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::f2de:f1ff:fe29:c718/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7724312 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13553720 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:711630936 (711.6 MB)  TX bytes:17529013051 (17.5 GB)
          Memory:f2500000-f2520000 

So this is a reasonable first approximation:

$ ifconfig eth0 | awk '/inet addr:/ {print $2}'
addr:192.168.0.14

The only problem is the "addr:" bit that's still hanging on. awk has a number of built-in functions, including substr() which can help us in this case:

$ ifconfig eth0 | awk '/inet addr:/ {print substr($2, 6)}'
192.168.0.14

substr() takes as arguments the string we're working on (field $2 in this case) and the place in the string where you want to start (for us, that's the sixth character so we skip over the "addr:"). There's an optional third argument which is the number of characters to grab. If you leave that off, then you just get the rest of the string, which is what we want here.
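Here's a quick sketch of substr() with and without the third argument, plus one possible alternative using awk's split() function. It runs against the "inet addr:" line from the ifconfig output above, and the variable names are just for illustration:

```shell
# The "inet addr:" line from the ifconfig output above.
line='          inet addr:192.168.0.14  Bcast:192.168.0.255  Mask:255.255.255.0'

# Two arguments: everything from the sixth character onward.
rest=$(printf '%s\n' "$line" | awk '/inet addr:/ {print substr($2, 6)}')

# Three arguments: only three characters, starting at the sixth.
first3=$(printf '%s\n' "$line" | awk '/inet addr:/ {print substr($2, 6, 3)}')

# Alternative: split field 2 on ":" and take the second piece.
via_split=$(printf '%s\n' "$line" |
    awk '/inet addr:/ {split($2, parts, ":"); print parts[2]}')

printf '%s\n' "$rest" "$first3" "$via_split"
```

The split() version doesn't depend on counting characters, which may be handy if the label in front of the address ever changes length.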

There are lots of other useful built-in functions in awk. Consult the manual page for further info.

7. Parse the output of "lsof -nPi" and output the unique process name, PID, user ID, and port combinations for all processes that are in "LISTEN" mode on ports on the system.

Let's take a look at the "lsof -nPi" output using awk to match only the lines for "LISTEN" mode:

# lsof -nPi | awk '/LISTEN/'
sshd      1216     root    3u  IPv4   5264      0t0  TCP *:22 (LISTEN)
sshd      1216     root    4u  IPv6   5266      0t0  TCP *:22 (LISTEN)
mysqld    1610    mysql   10u  IPv4   6146      0t0  TCP 127.0.0.1:3306 (LISTEN)
vmware-au 1804     root    8u  IPv4   6440      0t0  TCP *:902 (LISTEN)
cupsd     1879     root    6u  IPv6  73057      0t0  TCP [::1]:631 (LISTEN)
cupsd     1879     root    8u  IPv4  73058      0t0  TCP 127.0.0.1:631 (LISTEN)
apache2   1964     root    4u  IPv4   7412      0t0  TCP *:80 (LISTEN)
apache2   1964     root    5u  IPv4   7414      0t0  TCP *:443 (LISTEN)
apache2   4112 www-data    4u  IPv4   7412      0t0  TCP *:80 (LISTEN)
apache2   4112 www-data    5u  IPv4   7414      0t0  TCP *:443 (LISTEN)
apache2   4113 www-data    4u  IPv4   7412      0t0  TCP *:80 (LISTEN)
apache2   4113 www-data    5u  IPv4   7414      0t0  TCP *:443 (LISTEN)
skype     5133      hal   41u  IPv4 104783      0t0  TCP *:6553 (LISTEN)

Process name, PID, and process owner are fields 1-3 and the protocol and port are in fields 8-9. So that suggests the following awk:

# lsof -nPi | awk '/LISTEN/ {print $1, $2, $3, $8, $9}'
sshd 1216 root TCP *:22
sshd 1216 root TCP *:22
mysqld 1610 mysql TCP 127.0.0.1:3306
vmware-au 1804 root TCP *:902
cupsd 1879 root TCP [::1]:631
cupsd 1879 root TCP 127.0.0.1:631
apache2 1964 root TCP *:80
apache2 1964 root TCP *:443
apache2 4112 www-data TCP *:80
apache2 4112 www-data TCP *:443
apache2 4113 www-data TCP *:80
apache2 4113 www-data TCP *:443
skype 5133 hal TCP *:6553

And if we want the unique entries, then just use "sort -u":

# lsof -nPi | awk '/LISTEN/ {print $1, $2, $3, $8, $9}' | sort -u
apache2 1964 root TCP *:443
apache2 1964 root TCP *:80
apache2 4112 www-data TCP *:443
apache2 4112 www-data TCP *:80
apache2 4113 www-data TCP *:443
apache2 4113 www-data TCP *:80
cupsd 1879 root TCP 127.0.0.1:631
cupsd 1879 root TCP [::1]:631
mysqld 1610 mysql TCP 127.0.0.1:3306
skype 5133 hal TCP *:6553
sshd 1216 root TCP *:22
vmware-au 1804 root TCP *:902

Looking at the output, I'm not sure I care about all of the different apache2 instances. All I really want to know is which program is using ports 80/tcp and 443/tcp. So perhaps we should just drop the PID and process owner:

# lsof -nPi | awk '/LISTEN/ {print $1, $8, $9}' | sort -u
apache2 TCP *:443
apache2 TCP *:80
cupsd TCP 127.0.0.1:631
cupsd TCP [::1]:631
mysqld TCP 127.0.0.1:3306
skype TCP *:6553
sshd TCP *:22
vmware-au TCP *:902

In the above output you see cupsd bound to both the IPv4 and IPv6 loopback addresses. If you just care about the port numbers, we can flash a little sed to clean things up:

# lsof -nPi | awk '/LISTEN/ {print $1, $8, $9}' | \
    sed 's/[^ ]*:\([0-9]*\)/\1/' | sort -u -n -k3
sshd TCP 22
apache2 TCP 80
apache2 TCP 443
cupsd TCP 631
vmware-au TCP 902
mysqld TCP 3306
skype TCP 6553

In the sed expression I'm matching "some non-space characters followed by a colon" ("[^ ]*:") with some digits afterwards ("[0-9]*"). The digits are the port number, so we replace the matching expression with just the port number. Notice I used "\(...\)" around the "[0-9]*" to create a sub-expression that I can substitute on the right-hand side as "\1".

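I've also modified the final "sort" command so that we get a numeric ("-n") sort on the port number ("-k3" for the third column). That makes the output look more natural to me. One thing to be aware of: when "-u" is combined with a sort key, uniqueness is judged on the key alone, so only one line per port number survives. That's fine here, but two different programs listening on the same port number would be collapsed into a single line.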
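If you want to try the whole pipeline without running lsof as root, here's a sketch that feeds it a handful of the sample lines from above. The variable names are arbitrary, and the input is a subset of the lsof output shown earlier:

```shell
# A subset of the "lsof -nPi" sample lines shown earlier.
lsof_sample='sshd      1216     root    3u  IPv4   5264      0t0  TCP *:22 (LISTEN)
sshd      1216     root    4u  IPv6   5266      0t0  TCP *:22 (LISTEN)
mysqld    1610    mysql   10u  IPv4   6146      0t0  TCP 127.0.0.1:3306 (LISTEN)
cupsd     1879     root    6u  IPv6  73057      0t0  TCP [::1]:631 (LISTEN)
cupsd     1879     root    8u  IPv4  73058      0t0  TCP 127.0.0.1:631 (LISTEN)
apache2   1964     root    4u  IPv4   7412      0t0  TCP *:80 (LISTEN)
apache2   4112 www-data    5u  IPv4   7414      0t0  TCP *:443 (LISTEN)'

# awk picks the fields, sed strips the address, sort orders by port number.
ports=$(printf '%s\n' "$lsof_sample" |
    awk '/LISTEN/ {print $1, $8, $9}' |
    sed 's/[^ ]*:\([0-9]*\)/\1/' |
    sort -u -n -k3)

printf '%s\n' "$ports"
```

Each tool does one job here: awk selects columns, sed rewrites the address field, and sort handles ordering and duplicates.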

I guess the moral of the story here is that awk is good for many things, but not necessarily for everything. Don't forget that there are other standard commands like sed and sort that can help produce the output that you're looking for.

Happy awk-ing everyone!