Tuesday, December 31, 2013

Episode #173: Tis the Season

Hal finds some cheer
From somewhere near the borders of Scriptistan, we send you:
function t { 
    for ((i=0; $i < $1; i++)); do
        s=$((8-$i)); e=$((8+$i));
        for ((j=0; j <= $e; j++)); do [ $j -ge $s ] && echo -n '^' || echo -n ' '; done;
        echo;
    done
}
function T {
    for ((i=0; $i < $1; i++)); do
        for ((j=0; j < 10; j++)); do [ $j -ge 7 ] && echo -n '|' || echo -n ' '; done;
        echo;
    done
    echo
}
t 3; t 5; t 7; T 2; echo -e "Season's Greetings\n    from CLKF"


Ed comes in out of the cold:

Gosh, I missed you guys.  It's nice to be home with my CLKF family for the holidays.  I brought you a present:

c:\>cmd.exe /v:on /c "echo. & echo A Christmas present for you: & color 24 & 
echo. & echo     0x0& for /L %a in (1,1,11) do @(for /L %b in (1,1,10) do @ set /a
%b%2) & echo 1"& echo. & echo Merry Christmas!

Tim awaits the new year:

Happy New Year from within the borders of Scriptistan!

Function Draw-Circle {
    Param( $Radius, $XCenter, $YCenter )
    
    for ($x = -$Radius; $x -le $Radius ; $x++) {
        $y = [int]([math]::sqrt($Radius * $Radius - $x * $x))
        Set-CursorLocation -X ($XCenter + $x) -Y ($YCenter + $y)
        Write-Host "*" -ForegroundColor Blue -NoNewline
        Set-CursorLocation -X ($XCenter + $x) -Y ($YCenter - $y)
        Write-Host "*" -ForegroundColor Blue -NoNewline
    }
}

Function Draw-Hat {
    Param( $XCenter, $YTop, $Height, $Width, $BrimWidth )
    
    $left = Round($XCenter - ($Width / 2))
    $row = "#" * $Width
    for ($y = $YTop; $y -lt $YTop + $Height - 1; $y++) {
        Set-CursorLocation -X $left -Y $y
        Write-Host $row -ForegroundColor Black -NoNewline
    }
    
    Set-CursorLocation -X ($left - $BrimWidth) -Y ($YTop + $Height - 1)
    $row = "#" * ($Width + 2 * $BrimWidth)
    Write-Host $row -ForegroundColor Black -NoNewline
}

Function Set-CursorLocation {
    Param ( $x, $y )

    $pos = $Host.UI.RawUI.CursorPosition
    $pos.X = $x
    $pos.Y = $y
    $Host.UI.RawUI.CursorPosition = $pos
}

Function Round {
    Param ( $int )
    # Stupid banker's rounding
    return [Math]::Round( $int, [MidpointRounding]'AwayFromZero' )
}

Clear-Host
Write-Host "Happy New Year!"
Draw-Circle -Radius 4 -XCenter 10 -YCenter 8
Draw-Circle -Radius 5 -XCenter 10 -YCenter 17
Draw-Circle -Radius 7 -XCenter 10 -YCenter 29
Draw-Hat -XCenter 10 -YTop 2 -Height 5 -Width 7 -BrimWidth 2
Set-CursorLocation -X 0 -Y 38

Tuesday, November 26, 2013

Episode #172: Who said bigger is better?

Tim sweats the small stuff

Ted S. writes in:

"I have a number of batch scripts which turn a given input file into a configurable amount of versions, all of which will contain identical data content, but none of which, ideally, contain the same byte content. My problem is, how do I, using *only* XP+ cmd (no other scripting - PowerShell, jsh, wsh, &c), replace the original (optionally backed up) with the smallest of the myriad versions produced by the previous batch runs?"

This is pretty straightforward, but it depends on what we want to do with the files. I assumed that the larger files should be deleted since they are redundant. This will leave us with only the smallest file in the directory. Let's start off by listing all the files in the current directory and sorting them by size.

C:\> dir /A-D /OS /b
file3.txt
file2.txt
file1.txt
file4.txt

Sorting the files, and only files, in the current directory by size is pretty easy. The "/A" option filters on the object's properties and directories are filtered out with "-D". Next, the "/O" option is used to sort and the "S" tells the command to sort putting the smallest files first. Finally, the "/b" is used to show the bare format.

At this point we have the files in the proper order and in a nice light format. We can now use a For loop to delete everything while skipping the first file.

C:\> for /F "tokens=* skip=1" %i in ('dir /A-D /OS /b') do @del "%i"

Here is the same functionality in PowerShell:

PS C:\> Get-ChildItem | Where-Object { -not $_.PSIsContainer } | Sort-Object -Property Length | Select-Object -Skip 1 | Remove-Item

This is mostly readable. The only exception is the "PSIsContainer". Directories are container objects but files are not, so we filter out the containers (directories). Here is the same command shortened using aliases and positional parameters:

PS C:\> ls | ? { !$_.PSIsContainer } | sort Length | select -skip 1 | rm

There you go, Ted, and in PowerShell even though you didn't want it. Here comes Hal bringing something even smaller you don't want.

Hal's is smaller than Tim's... but less sweaty

Tim, how many times do I have to tell you, smaller is better when it comes to command lines:

ls -Sr | tail -n +2 | xargs rm

It's actually not that different from Tim's PowerShell solution, except that my "ls" command has "-S" to sort by size as a built-in. We use the "-r" flag to reverse the sort, putting the smallest file first and skipping it with "tail -n +2".

If you're worried about spaces in the file names, we could tart this one up a bit more:

ls -Sr | tail -n +2 | tr \\n \\000 | xargs -0 rm

After I use "tail" to get rid of the first, smallest file, I use "tr" to convert the newlines to nulls. That allows me to use the "-0" flag to "xargs" to split the input on nulls, and preserves the spaces in the input file names.
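
Both pipelines above still parse the output of "ls", so a file name containing a newline would trip them up. Here is a minimal sketch of the same idea without "ls", assuming bash. The function name keep_smallest and the scratch directory are made up for illustration and are not part of the original solution.

```shell
# keep_smallest DIR: delete every regular file in DIR except the smallest.
# (Hypothetical helper; ties go to whichever file the glob lists first.)
keep_smallest() {
    local dir=$1 smallest="" smallest_size=0 f size
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        size=$(( $(wc -c < "$f") ))      # byte count, normalized to a bare integer
        if [ -z "$smallest" ] || [ "$size" -lt "$smallest_size" ]; then
            smallest=$f
            smallest_size=$size
        fi
    done
    for f in "$dir"/*; do
        if [ -f "$f" ] && [ "$f" != "$smallest" ]; then
            rm -- "$f"
        fi
    done
    return 0
}

# Try it in a scratch directory, with spaces in the names on purpose.
scratch=$(mktemp -d)
printf 'aaa'  > "$scratch/file 1"
printf 'aaaa' > "$scratch/file 2"
printf 'a'    > "$scratch/file 3"
keep_smallest "$scratch"
remaining=$(ls "$scratch")
echo "$remaining"
rm -rf "$scratch"
```

It is a lot longer than the one-liner, which is the usual price of being paranoid about file names.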

What may be more interesting about this Episode is the command line I used to create and re-create my files for testing. First I made a text file with lines like this:

1 3
2 4
3 1
4 2

And then I whipped up a little loop action around the "dd" command:

$ while read file size; do 
      dd if=/dev/zero bs=4K count=$size of=file$file; 
  done <../input.txt 
3+0 records in
3+0 records out
12288 bytes (12 kB) copied, 6.1259e-05 s, 201 MB/s
4+0 records in
4+0 records out
16384 bytes (16 kB) copied, 0.000144856 s, 113 MB/s
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 3.4961e-05 s, 117 MB/s
2+0 records in
2+0 records out
8192 bytes (8.2 kB) copied, 4.3726e-05 s, 187 MB/s

Then I just had to re-run the loop whenever I wanted to re-create my test files after deleting them.

Tuesday, October 8, 2013

Episode #171: Flexibly Finding Firewall Phrases

Old Tim answers an old email

Patrick Hoerter writes in:
I have a large firewall configuration file that I am working with. It comes from that vendor that likes to prepend each product they sell with the same "well defended" name. Each configuration item inside it is multiple lines starting with "edit" and ending with "next". I'm trying to extract only the configuration items that are in some way tied to a specific port, in this case "port10".

Sample Data:

edit "port10"
        set vdom "root"
        set ip 192.168.1.54 255.255.255.248
        set allowaccess ping
        set type physical
        set sample-rate 400
        set description "Other Firewall"
        set alias "fw-outside"
        set sflow-sampler enable
   next
edit "192.168.0.0"
        set subnet 192.168.0.0 255.255.0.0
    next
    edit "10.0.0.0"
        set subnet 10.0.0.0 255.0.0.0
    next
    edit "172.16.0.0"
        set subnet 172.16.0.0 255.240.0.0
    next
  edit "vpn-CandC-1"
        set associated-interface "port10"
        set subnet 10.254.153.0 255.255.255.0
    next
    edit "vpn-CandC-2"
        set associated-interface "port10"
        set subnet 10.254.154.0 255.255.255.0
    next
    edit "vpn-CandC-3"
        set associated-interface "port10"
        set subnet 10.254.155.0 255.255.255.0
    next
   edit 92
        set srcintf "port10"
        set dstintf "port1"
            set srcaddr "vpn-CandC-1" "vpn-CandC-2" "vpn-CandC-3"            
            set dstaddr "all"            
        set action accept
        set schedule "always"
            set service "ANY"            
        set logtraffic enable
    next
 

Sample Results:

edit "port10"
        set vdom "root"
        set ip 192.168.1.54 255.255.255.248
        set allowaccess ping
        set type physical
        set sample-rate 400
        set description "Other Firewall"
        set alias "fw-outside"
        set sflow-sampler enable
   next
  edit "vpn-CandC-1"
        set associated-interface "port10"
        set subnet 10.254.153.0 255.255.255.0
    next
    edit "vpn-CandC-2"
        set associated-interface "port10"
        set subnet 10.254.154.0 255.255.255.0
    next
    edit "vpn-CandC-3"
        set associated-interface "port10"
        set subnet 10.254.155.0 255.255.255.0
    next
   edit 92
        set srcintf "port10"
        set dstintf "port1"
            set srcaddr "vpn-CandC-1" "vpn-CandC-2" "vpn-CandC-3"            
            set dstaddr "all"            
        set action accept
        set schedule "always"
            set service "ANY"            
        set logtraffic enable
    next

Patrick gave us the full text and the expected output. In short, he wants the text between "edit" and "next" if it contains the text "port10". To begin this task, we first need to get each of the edit/next chunks.

PS C:\> ((cat fw.txt) -join "`n") | select-string "(?s)edit.*?next" -AllMatches | 
 select -ExpandProperty matches

This command will read the entire file fw.txt and combine it into one string. Normally, each line is treated as a separate object, but we are going to join them into a big string using the newline (`n) to join each line. Now that the text is one big string we can use Select-String with a regular expression to find all the matches. The regular expression will find text across line breaks and allows for very flexible searches so we can find our edit/next chunks. Here is a breakdown of the pieces of the regular expression:

  • (?s) - Use single line mode where the dot (.) will match any character, including a newline character. This allows us to match text across multiple lines.
  • edit - the literal text "edit"
  • .*? - find any text, but be lazy, not greedy. This means it should match the smallest chunks that will match the criteria.
  • next - literal text next

Now that we have the chunks we use a Where-Object filter (alias ?) to find matching objects to pass down the pipeline.

PS C:\> ((cat .\fw.txt) -join "`n") | select-string "(?s)edit.*?next" -AllMatches | 
 select -ExpandProperty matches | ? { $_.Value | Select-String "port10" }

Inside the Where-Object filter we can check the Value property to see if it contains the text "port10". The Value property is piped into Select-String to look for the text "port10", and if it contains "port10" it continues down the pipeline, if not, it is dropped.

At this point, we have the objects we want, so all we need to do is display the results by expanding the Value and displaying it again. The expansion means that it just displays the text and no data or metadata associated with the parent object. Here is what the final command looks like.

PS C:\> ((cat .\fw.txt) -join "`n") | select-string "(?s)edit.*?next" -AllMatches | 
 select -ExpandProperty matches | ? { $_.Value | Select-String "port10" } | 
 select -ExpandProperty Value

Not so bad, but I have a feeling it is going to be worse for my friend Hal.

Old Hal uses some old tricks

Oh sure, I know what Tim's thinking here. "It's multi-line matching, and the Unix shell is lousy at that. Hal's in trouble now. Mwhahaha. The Command-Line Kung Fu title will finally be mine! Mine! Do you hear me?!? MINE!"

Uh-huh. Well how about this, old friend:

awk -v RS=next -v ORS=next '/port10/' fw.txt

While we're doing multi-line matching here, the blocks of text have nice regular delimiters. That means I can change the awk "record separator" ("RS") from newline to the string "next" and gobble up entire chunks at a time.

After that, it's smooth sailing. I just use awk's pattern-matching operator to match the "port10" strings. Since I don't have an action defined, "{print}" is assumed and we output the matching blocks of text.

The only tricky part is that I have to remember to change the "output record separator" ("ORS") to be "next". Otherwise, awk will use its default ORS value, which is newline. That would give me output like:

$ awk -v RS=next '/port10/' fw.txt
edit "port10"
        set vdom "root"
        set ip 192.168.1.54 255.255.255.248
        set allowaccess ping
        set type physical
        set sample-rate 400
        set description "Other Firewall"
        set alias "fw-outside"
        set sflow-sampler enable
   

  edit "vpn-CandC-1"
        set associated-interface "port10"
        set subnet 10.254.153.0 255.255.255.0
    

    edit "vpn-CandC-2"
        set associated-interface "port10"
...

The "next" terminators get left out and we get extra lines in the output. But when ORS is set properly, we get exactly what we were after:

$ awk -v RS=next -v ORS=next '/port10/' fw.txt
edit "port10"
        set vdom "root"
        set ip 192.168.1.54 255.255.255.248
        set allowaccess ping
        set type physical
        set sample-rate 400
        set description "Other Firewall"
        set alias "fw-outside"
        set sflow-sampler enable
   next
  edit "vpn-CandC-1"
        set associated-interface "port10"
        set subnet 10.254.153.0 255.255.255.0
    next
    edit "vpn-CandC-2"
        set associated-interface "port10"
...
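
To see the RS/ORS trick in isolation, here is a small sketch run against a made-up two-record sample instead of the real firewall file. The sample contents and the variable names are assumptions for illustration. One caveat: POSIX only promises that awk honors a single-character RS, so a multi-character separator like "next" relies on gawk or mawk behavior.

```shell
# Hypothetical two-record sample, written to a temp file.
sample=$(mktemp)
printf 'edit "a"\n    set intf "port10"\nnext\nedit "b"\n    set intf "port1"\nnext\n' > "$sample"

# RS=next splits the input into records at each "next";
# ORS=next puts the terminator back on every record that gets printed.
matches=$(awk -v RS=next -v ORS=next '/port10/' "$sample")
printf '%s\n' "$matches"
rm -f "$sample"
```

Only the first record mentions "port10" (the second says "port1"), so only the first edit/next block should come back.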

So that wasn't bad at all. Sorry about that Tim. Maybe next time, old buddy.

Friday, September 27, 2013

Episode #170: Fearless Forensic File Fu

Hal receives a cry for help

Fellow forensicator Craig was in a bit of a quandary. He had a forensic image in "split raw" format-- a complete forensic image broken up into small pieces. Unfortunately for him, the pieces were named "fileaa", "fileab", "fileac", and so on while his preferred tool wanted the files to be named "file.001", "file.002", "file.003", etc. Craig wanted to know if there was an easy way to rename the files, using either Linux or the Windows shell.

This one's not too hard in Linux, and in fact it's a lot like something we did way back in Episode #26:

c=1; 
for f in file*; do 
    printf -v ext %03d $(( c++ )); 
    mv $f ${f/%[a-z][a-z]/.$ext}; 
done

You could remove the newlines and make that one big long line, but I think it's a bit easier to read this way. First we initialize a counter variable $c to 1. Then we loop over each of the files in our split raw image.

The printf statement inside the loop formats $c as three digits, with however many leading zeroes are necessary ("%03d"). There are a couple of tricky bits in the printf though. First is we're assigning the output of printf to a variable $ext ("-v ext"). Second, we're doing a little arithmetic on $c at the same time and using the "++" operator to increment the value of $c each time through the loop-- that's the "$(( c++ ))" part.

Then we use mv to rename our file. I'm using the variable substitution operator like we did in Episode #26. The format again is "${var/pattern/substitution}" and here the "%" after the first slash means "match at the end of the string". So I'm replacing the last two letters in the file name with a dot followed by our $ext value. And that's exactly what Craig wanted!
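
If you want to try the loop without touching a real image, here is a sketch that runs it against empty stand-in files in a scratch directory. It assumes bash (for "printf -v" and the "${var/%pattern/substitution}" operator); the scratch directory and file names are made up for the demo.

```shell
# Stand-in "split raw" pieces in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir"/fileaa "$workdir"/fileab "$workdir"/fileac

c=1
for f in "$workdir"/file[a-z][a-z]; do
    printf -v ext %03d "$c"                  # zero-padded counter into $ext
    c=$(( c + 1 ))
    mv -- "$f" "${f/%[a-z][a-z]/.$ext}"      # swap the trailing two letters for .NNN
done

renamed=$(cd "$workdir" && echo file.*)
echo "$renamed"
rm -rf "$workdir"
```

The glob expands in sorted order, so "fileaa" becomes "file.001", "fileab" becomes "file.002", and so on.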

All of the symbols in this solution make it appear like a little chunk of line noise, but it's nowhere near as ugly as Ed's CMD.EXE solution in Episode #26. Here's hoping Tim's PowerShell solution is a bit more elegant.

Tim finishes before September ends!

Elegance, here we come!

Long Version:
PS C:\> $i=1; Get-ChildItem file?? | Sort-Object -Property Name | 
  ForEach-Object { Move-Item -Path $_ -Destination ("file.{0:D3}" -f $i++) }
Shortened Version:
PS C:\> $i=1; ls file?? | sort name | % { move $_ -dest ("file.{0:D3}" -f $i++) }

We start off by initializing our counter variable ($i) to 1 just like Hal did. Next, we list all the files that start with "file" and are followed by exactly two characters (each ? matches exactly 1 character of any kind). The results are then sorted by the file name to ensure that the files are renamed in the correct order. The results are then fed into the ForEach-Object cmdlet (alias %).

The ForEach-Object loop will operate on each object (file) as it moves down the pipeline. One at a time, each file will be represented by the current pipeline object ($_). The Move-Item cmdlet (alias move) is used to rename a file; to move it to its new name. The source path is provided by the current object and the destination is determined using the format operator (-f) and our counter ($i). The format operator will print $i as a three digit number prefixed with leading zeros and "file.". The ++ after $i will increment the counter after it has been used.

That is much cleaner than Ed's example...and even cleaner than Hal's to boot!

Update:

Reader m_cnd writes in with a solution for CMD.

C:\> for /F "tokens=1,2 delims=:" %d in ('dir /on /b file* ^| 
findstr /n "file"') do for /F %x in ('set ext^=00%d^&^& 
cmd /v:on /c "echo !ext:~-3!"') do rename %e file.%x
Nice work!

Tuesday, August 6, 2013

Episode #169: Move Me Maybe

Tim checks the mailbag

Carlos IHaveNoLastName writes in asking for a way to move a directory to a new destination. That's easy, but the directory should only be moved if the directory (at any depth) does NOT contain a file with a specific extension.

Here is an example of a sample directory structure:

SomeTopDir1
|-OtherDir1
|  |-File1
|  |-File2
|  |-File2
|-OtherDir2
   |-File1
   |-File.inprogress

SomeTopDir2
|-OtherDir1
|  |-File1
|  |-File2
|  |-File2
|-OtherDir2
   |-File1
   |-File2

In this example we should NOT move SomeTopDir1 because it contains a file with the string "inprogress". We should however move SomeTopDir2 because it contains no such file. In short, "inprogress" means leave it alone.

Executing this in PowerShell is quite easy. CMD is a pain, and I'll skip that crazy long command because it is a circus trick. Here is the command to do exactly what Carlos asked:

PS C:\jobsdir> Get-ChildItem | ? { $_.PSIsContainer } | 
    ? { -not ( Get-ChildItem $_ -Recurse -Filter *.inprogress ) } | 
    Move-Item -Destination \archive
 

This command will use Get-ChildItem to list the contents of the current directory. We first filter for Directories (Container objects) just in case there are files in the root of the directory that we don't want to move. Next, another Where-Object cmdlet (alias ?) is used to check all the sub-directories and look for a file matching "*.inprogress". The -Not operator inverts the match so that only directories without a "*.inprogress" file will be passed down the pipeline.

At this point we have the directories that do not contain this file. The results are then piped into Move-Item and the directories are moved to the \archive directory.

One of the other criteria that Mr. IHaveNoLastName requested is that the command must work on XP. Well it does, but only if you install PowerShell. Sadly, XP does not support PowerShell v3. With PowerShell v3's simplified syntax (and some additional aliases) we can shorten the command to this:

PS C:\jobsdir> ls | ? PSIsContainer | ? { -not ( ls $_ -r -fi *.inprogress ) } | 
    mv -d \archive
 

Thanks for an easy one, Carlos! Hal, your turn. I suspect this will be almost as easy for you (even though it won't work on XP).

Hal takes it easy

This one's quite do-able in the shell. But unlike Tim's solution, the most straightforward approach in Linux is a loop:

for i in *; do [ "$(find $i -type f -name \*.inprogress)" ] || mv $i /some/dest; done

The loop is over all of the directories in the current directory. Inside the loop we run a find command looking for "*.inprogress" files. If we find any, then the test operator ("[ ... ]") returns true and we don't do the mv command on the other side of the "||". If we find nothing, then the directory gets moved. Easy peasy.
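
For directory names with spaces, the same loop wants a bit more quoting. Here is a sketch of a more defensive version, assuming bash. The function name move_finished and the scratch tree are made up for the demo; the logic is the same find-then-mv test as above.

```shell
# move_finished SRC DEST: move each top-level directory of SRC into DEST
# unless it contains a *.inprogress file at any depth. (Hypothetical helper.)
move_finished() {
    local src=$1 dest=$2 d
    for d in "$src"/*/; do
        [ -d "$d" ] || continue
        d=${d%/}                                  # drop the trailing slash from the glob
        if [ -z "$(find "$d" -type f -name '*.inprogress')" ]; then
            mv -- "$d" "$dest"/
        fi
    done
    return 0
}

# Scratch tree modeled loosely on Carlos' example, with a space thrown in.
scratch=$(mktemp -d)
mkdir -p "$scratch/jobs/Some Top Dir1/OtherDir2" \
         "$scratch/jobs/SomeTopDir2/OtherDir2" \
         "$scratch/archive"
touch "$scratch/jobs/Some Top Dir1/OtherDir2/File.inprogress" \
      "$scratch/jobs/SomeTopDir2/OtherDir2/File2"

move_finished "$scratch/jobs" "$scratch/archive"
left=$(ls "$scratch/jobs")
moved=$(ls "$scratch/archive")
echo "left: $left; moved: $moved"
rm -rf "$scratch"
```

The "*/" glob only matches directories, so stray files at the top level are left alone.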

"But wait!", I hear you cry, "That was too easy. And besides, you're running a mv command for each individual directory!"

OK, fine. You want a single mv command? Here you go:

mv $(ls | grep -vf <(find * -type f -name \*.inprogress | cut -f1 -d/)) /some/dest

Happy now?

The best way to puzzle this one out is to start with the command in the innermost parentheses:

find * -type f -name \*.inprogress | cut -f1 -d/

The find command returns the pathnames of all of the *.inprogress files, and the cut command pulls off the top-level directory name. If there are multiple *.inprogress files in a single directory, we'll get multiple instances of the top-level directory name, but that doesn't really matter.

The "<( ... )" syntax takes the output of our find pipeline and lets it be treated as an input file for another command:

ls | grep -vf <( ... )

We take the output of ls and use "grep -v" to filter out directories we don't want. Normally "grep -f" takes a list of patterns from an input file, but in this case we use the "<( ... )" syntax to substitute our find output instead of a normal input file. So we suppress any directories that have a *.inprogress file in them. Anything left over is a directory without a *.inprogress file, which is precisely the set of directories we want to move.

So we wrap the complicated ls pipeline up in "$(...)" so that the output-- the list of directories we want to move-- is substituted into the "mv $(...) /some/dest" command. And that gets us to where we want to be.
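
One thing to watch with "grep -vf": the directory names are treated as regular expressions and matched anywhere in the line, so a busy "dir1" would also suppress an idle "dir10". Here is a sketch of the difference, assuming bash (for the "<( ... )" syntax) and a grep that supports "-x" and "-F". The scratch directories and the busy helper are made up for the demo.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/dir1/sub" "$scratch/dir10" "$scratch/dir2"
touch "$scratch/dir1/sub/x.inprogress" "$scratch/dir10/done" "$scratch/dir2/done"

# Top-level directories that hold a *.inprogress file somewhere below them.
busy() { find * -type f -name '*.inprogress' | cut -f1 -d/; }

loose=$(cd "$scratch" && ls | grep -vf <(busy))      # regex, substring match
strict=$(cd "$scratch" && ls | grep -vxFf <(busy))   # fixed string, whole line

echo "loose:  $loose"
echo "strict: $strict"
rm -rf "$scratch"
```

With "-x" and "-F" only the exact name "dir1" is filtered out, so "dir10" survives alongside "dir2".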

Or you could use the same idea, but with xargs:

ls | grep -vf <(find * -type f -name \*.inprogress | cut -f1 -d/) | xargs mv -t /some/dest

This looks a lot more like Tim's approach in PowerShell. However, this makes use of the "mv -t /some/dest ..." syntax that's supported in the GNU version of the command, but not widely supported in other more traditional Unix distros.

Oh, and by the way, Tim, this all works fine under Windows XP if you'd just install Cygwin like I've been telling you to...

Update:

m_cnd wrote in again with a shortcut for CMD.EXE:

dir SomeTopDir /s /b | findstr /i /e ".extension" > nul || move SomeTopDir Destination

I use this trick all the time, so I feel bad that I missed it here. With his shortcut I can put together a For loop (our favorite, and only, text parser) to do the work.

C:\> for /F "tokens=*" %i in ('dir /b /AD') do dir "%i" /s /b /a-d "%i\*.extension" 1>nul 2>nul || move "%i" Destination

The Dir command with /AD will list directories and not files. We can then use the output to search and move if necessary. The Tokens and quotes are used in case the directory names contain spaces.

Thanks again m_cnd!

Tuesday, July 2, 2013

Episode #168: Scan On, You Crazy Command Line

Hal gets back to our roots

With one ear carefully tuned to cries of desperation from the Internet, it's no wonder I picked up on this plea from David Nides on Twitter:

Whenever I see a request to scan for files based on a certain criteria and then copy them someplace else, I immediately think of the "find ... | cpio -pd ..." trick I've used in several other Episodes.

Happily, "find" has "-mtime", "-atime", and "-ctime" options we can use for identifying the files. But they all want their arguments to be in terms of number of days. So I need to calculate the number of days between today and the end of 2012. Let's do that via a little command-line kung fu, shall we? That will make this more fun.

$ days=$(( ($(date +%Y) - 2012)*365 + $(date +%j | sed 's/^0*//') ))
$ echo $days
447

Whoa nelly! What just happened there? Well, I'm doing math with the bash "$(( ... ))" operator and assigning the result to a variable called "days" so I can use it later. But what's all that line noise in the middle?

  • "date +%Y" returns the current year. That's inside "$( ... )" so I can use the value in my calculations.
  • I subtract 2012 from the current year to get the number of years since 2012 and multiply that by 365. Screw you, leap years!
  • "date +%j" returns the current day of the year, a value from 001-365 (or 001-366 in a leap year).
  • Unfortunately the shell interprets values with leading zeroes as octal and errors out on values like "008" and "097". So I use a little sed to strip the leading zeroes.

Hey, I said it would be fun, not that it would necessarily be a good idea!

But now that I've got my "$days" value, the answer to David's original request couldn't be easier:

$ find /some/dir -mtime +$days -atime +$days -ctime +$days | cpio -pd /new/dir

The "find" command locates files whose MAC times are all greater than our "$days" value-- that's what the "+$days" syntax means. After that, it's just a matter of passing the found files off to "cpio". Calculating "$days" was the hard part.
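
If ignoring leap years bothers you, the day count can also come straight from epoch seconds. This is only a sketch and assumes GNU date (for "-d") and bash; the days_between helper is a made-up name, not something from the original tweet.

```shell
# days_between START END: whole days from START to END (YYYY-MM-DD),
# computed in UTC so daylight saving shifts cannot skew the result.
# Assumes GNU date for the -d option. (Hypothetical helper.)
days_between() {
    echo $(( ( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ) / 86400 ))
}

days=$(days_between 2012-12-31 "$(date -u +%F)")
echo "$days"

# The octal problem can also be sidestepped without sed: bash's base prefix
# forces decimal, so 10#008 evaluates to 8.
doy=$(( 10#$(date +%j) ))
echo "$doy"
```

The resulting "$days" drops into the same "find ... -mtime +$days" command as before.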

My final solution was short enough that I tweeted it back to David. Which took me all the way back to the early days of Command-Line Kung Fu, when Ed Skoudis had hair and would tweet cute little CMD.EXE hacks that he could barely fit into 140 characters. And I would respond with bash code that would barely line wrap. Ah, those were the days!

Of course, Tim was still in diapers then. But he's come so far, that precocious little rascal! Let's see what he has for us this time!

Tim gets an easy one!

Holy Guacamole! This is FINALLY an easy one! Robocopy makes this super easy *and* it plays well with leap years. I feel like it is my birthday and I can finally get out of these diapers.

PS C:\> robocopy \some\dir \new\dir /MINLAD (Get-Date).DayOfYear /MINAGE (Get-Date).DayOfYear /MOV

Simply specify the source and destination directories and use /MOV to move the files. MINLAD will ignore files that have been accessed in the past X days (LAD = Last Access Date), and MINAGE does the same based on the creation date. All we need is the number of days since the beginning of the year. Fortunately, getting that number is super easy in PowerShell (I have no pity for Hal).

All Date objects have the property DayOfYear which is (surprise, surprise) the number of days since the beginning of the year (Get-Member will show all the available properties and methods of an object). All we need is the current date, which we get from Get-Date.

DONE! That's all folks! You can go home now. I know you expected a long complicated command, but we don't have one here. However, if you feel that you need to read more you can go back and read the episodes where we cover some other options available with robocopy.

This command is so easy, simple, and short I could even fit it into a tweet!

Tuesday, June 18, 2013

Episode #167: Big MAC

Hal checks into Twitter:

So there I was, browsing my Twitter timeline and a friend forwarded a link to Jeremy Ashkenas' github site. Jeremy created an alias for changing your MAC address to a random value. This is useful when you're on a public WiFi network that only gives you a limited number of free minutes. Since most of these services keep track by noting your MAC address, as long as you keep cycling your MAC, you can keep using the network for free.

Here's the core of Jeremy's alias:

sudo ifconfig en0 ether `openssl rand -hex 6 | sed "s/\(..\)/\1:/g; s/.$//"`

Note that the syntax of the ifconfig command varies a great deal between various OS versions. On my Linux machine, the syntax would be "sudo ifconfig wlan0 hw ether..."-- you need "hw ether" after the interface name and not just "ether".

Anyway, this seemed like a lot of code just to generate a random MAC address. Besides, what if you didn't have the openssl command installed on your Linux box? So I decided to try and figure out how to generate a random MAC address in fewer characters and using commonly built-in tools.

What does a MAC address look like? It's six pairs of digits with colons between. "Pairs of digits with colons between" immediately made me think of time values. And this works:

$ date +00:11:22:%T
00:11:22:11:23:08

Just print three pairs of fixed digits followed by "hh:mm:ss". I originally tried "date +%T:%T". But in my testing, the ifconfig command didn't always like the fake MAC addresses that were generated this way. So specifying the first few octets was the way to go.

The only problem is that this address really isn't all that random. If there were a lot of people on the same WiFi network all using this trick, MAC address collisions could happen pretty easily. Though if everybody chose their own personal sequence for the first three octets, you could make this a lot less likely.

The Linux date command lets you output a nine-digit nanoseconds value with "%N". I could combine that with a few leading digits to generate a pseudo-random sequence of 12 digits:

$ date +000%N
000801073504

But now we need to use the sed expression in Jeremy's original alias to put the colons in. Or do we?

$ sudo ifconfig wlan0 hw ether $(date +000%N)
$ ifconfig wlan0
wlan0     Link encap:Ethernet  HWaddr 00:02:80:12:43:53  
...

I admit that I was a little shocked when I tried this and it actually worked! I can't guarantee that it will work across all Unix-like operating systems, but it allows me to come up with a much shorter bit of fu compared to Jeremy's solution.

What if you were on a system that didn't have openssl installed and didn't have a date command that had nanosecond resolution? If your system has a /dev/urandom device (and most do) you could use the trick we used way back in Episode #85:

$ sudo ifconfig wlan0 hw ether 00$(head /dev/urandom | tr -dc a-f0-9 | cut -c1-10)
$ ifconfig wlan0
wlan0     Link encap:Ethernet  HWaddr 00:7a:5f:be:a2:ca
...

Again I'm using two literal zeroes at the front of the MAC address, so that I create addresses that don't cause ifconfig to error out on me.

The expression above is not very short, but at least it uses basic commands that will be available on pretty much any Unix-like OS. If your ifconfig needs colons between the octets, then you'll have to add a little sed like Jeremy did:

$ sudo ifconfig wlan0 hw ether \
    00$(head /dev/urandom | tr -dc a-f0-9 | sed 's/\(..\)/:\1/g;' | cut -c1-15)
$ ifconfig wlan0
wlan0     Link encap:Ethernet  HWaddr 00:d9:3e:0d:80:57  
...

Jeremy's sed is more complicated because he takes 12 digits and adds colons after each octet, but leaves a trailing colon at the end of the address. So he has a second substitution to drop the trailing colon. I'm using cut to trim off the extra output anyway, so I don't really need the extra sed substitution. Also, since I'm specifying the first octet outside of the "$(...)", my sed expression puts the colons in front of each octet.
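
A likely reason ifconfig balked at some of the "date +%T:%T" addresses: when the first octet is an odd number, the address is a multicast address, and those can't be assigned to an interface. Here is a sketch that only builds the address (it does not touch any interface) and pins the first octet to 02, which marks it as a locally administered unicast address. The function name random_mac is made up, and it assumes "od" and /dev/urandom are available.

```shell
# random_mac: print a random MAC address with a fixed first octet of 02.
# (Hypothetical helper; 02 = locally administered, unicast.)
random_mac() {
    # Five random bytes as hex digits, glued onto the fixed first octet,
    # then a colon after every pair and the trailing colon dropped.
    printf '02%s\n' "$(od -An -N5 -tx1 /dev/urandom | tr -d ' \n')" |
        sed 's/\(..\)/\1:/g; s/:$//'
}

random_mac
```

The output can then be handed to "sudo ifconfig wlan0 hw ether ..." the same way as the examples above.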

So there you have it. There's a very short solution for my Linux box that has a date command with nanosecond resolution and a very forgiving ifconfig command. And a longer solution that should work on pretty much any Unix-like OS. But even my longest solution is surely going to look great compared to what Tim's going to have to deal with.

Tim wishes he hadn't checked into Twitter:

I'm so jealous of Hal. I think his entire command is shorter than the name of my interface. This command is painful, quite painful. I would very much suggest something like Technitium's Mac Address Changer, but since Hal set me up here we go...

To start off, we need to get the name of our target interface. Sadly, the interfaces aren't named as simply as they are on a *nix box. Not only is the name 11 times longer, but it is not easy to type. If you run "ipconfig /all" you can find the name and copy/paste it. (By the way, I'm only going to use PowerShell here, the CMD.EXE version would be ugly^2).

PS C:\> $ifname = "Intel(R) 82574L Gigabit Network Connection"

The MAC address for each interface is stored somewhere in the registry under this even-less-easy-to-type Key:
HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002bE10318}\[Some 4 digit number]\

First, a bit of clarification. Many people (erroneously) refer to Keys as the name/value pairs, but those pairs are actually called Values. A key is the container object (similar to a directory). How about that for a little piece of trivia?

With PowerShell we can use Get-ChildItem (alias dir, ls, gci) to list all the keys and then Get-ItemProperty (alias gp) to list the DriverDesc values. A simple Where-Object filter (alias where, ?) will find the key we need.

PS C:\> Get-ChildItem HKLM:\SYSTEM\CurrentControlSet\Control\Class\`{4D36E972-E325-11CE-BFC1-08002bE10318`}\[0-9]*\ |
 Get-ItemProperty -Name DriverDesc |
 ? DriverDesc -eq "Intel(R) 82574L Gigabit Network Connection"
DriverDesc   : Intel(R) 82574L Gigabit Network Connection
PSPath       : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SY...0318}\0010
PSParentPath : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SY...0318}
PSChildName  : 0010
PSProvider   : Microsoft.PowerShell.Core\Registry

Note: the curly braces ({}) need to be prefixed with a back tick (`) so they are not interpreted as a script block.

So now we have the Key for our target network interface. Next, we need to generate a random MAC address. Fortunately, Windows does not require the use of colons (or dots) in the MAC address. This is nice as it makes our command a little easier to read (a very very little, but we'll take any win we can). The acceptable values are between 000000000000 and fffffffffffe (ffffffffffff is the broadcast address and should be avoided). This is the range between 0 and 2^48-2 ([Math]::Pow(2,8*6)-2 = 281474976710654). Get-Random's -Maximum value is exclusive, so we pass 281474976710655 to include the top of that range. The random number is then formatted as a 12 digit hex number.

PS C:\> [String]::Format("{0:x12}", (Get-Random -Minimum 0 -Maximum 281474976710655))
16db434bed4e
PS C:\> [String]::Format("{0:x12}", (Get-Random -Minimum 0 -Maximum 281474976710655))
a31bfae1296d
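
The range arithmetic is easy to sanity-check outside PowerShell, too. As a minimal sketch (assuming a Unix-like shell with 64-bit integer arithmetic, which is not part of the PowerShell solution here), the upper bound and the 12-digit zero padding look like this:

```shell
# Highest usable MAC value: 2^48 - 2 (2^48 - 1 is the broadcast address).
max_mac=$(( (1 << 48) - 2 ))
echo "$max_mac"

# Format as 12 zero-padded hex digits, like "{0:x12}" does in PowerShell.
printf '%012x\n' "$max_mac"
printf '%012x\n' 255
```

The first two commands should print 281474976710654 and fffffffffffe; the last one shows the zero padding at work and prints 0000000000ff.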

We have a random MAC address value and we know the Key, now we need to put those two pieces together to actually change the MAC address. The New-ItemProperty cmdlet will create the value if it doesn't exist and the -Force option will overwrite it if it already exists. This results in the final version of our ugly command. We could shorten the command a little (very little) bit, but this is the way its mother loves it, so we'll leave it alone.

PS C:\> ls HKLM:\SYSTEM\CurrentControlSet\Control\Class\`{4D36E972-E325-11CE-BFC1-08002bE10318`}\0*\ |
 Get-ItemProperty -Name DriverDesc |
 ? DriverDesc -eq "Intel(R) 82574L Gigabit Network Connection" |
 New-ItemProperty -Name NetworkAddress -Value ([String]::Format("{0:x12}", (Get-Random -Minimum 0 -Maximum 281474976710655))) -PropertyType String -Force

You would think that after all of this mess we would be good to go, but you would be wrong. As with most things Windows, you could reboot the system to have this take effect, but that's no fun. We can accomplish the same goal by disabling and enabling the connection. This syntax isn't too bad, but we need to use a different long name here.

PS C:\> netsh interface set interface name="Wired Ethernet Connection" admin=DISABLED
PS C:\> netsh interface set interface name="Wired Ethernet Connection" admin=ENABLED

At this point you should be running with the new MAC address.

And now you can see why I recommend a better tool to do this...and why I envy Hal.

EDIT:
Andres Elliku wrote in and reminded me of the new NetAdapter cmdlets in version 3. Here is his response.

This is directed mainly to Tim as a suggestion to decrease his pain. :) (Tim's comment: for this I'm thankful!)

Powershell has included at least since version 2.0 the NetAdapter module. This means that in Powershell you could set the mac address with something like:

PS C:\> Set-NetAdapter -Name "Wi-Fi" -MacAddress ([String]::Format("{0:x12}", 
(Get-Random -Minimum 0 -Maximum 281474976710655))) | Restart-NetAdapter

NB! The adapter name might vary, but usually they are still pretty short.

The shorter interface names are one of my favorite features of Windows 8 and Windows 2012. Also, with these cmdlets we don't need the name of the device (Intel blah blah blah) but the newly shortened interface name. Great stuff Andres. Thanks for writing in! -Tim

EDIT 2:

@PowerShellGuy tweeted an even shorter version using the format operator and built-in byte conversion:

PS C:\> Set-NetAdapter "wi-fi" -mac ("{0:x12}" -f (get-random -max (256tb-1))) | 
Restart-NetAdapter

Well done for really shortening the command -Tim

Tuesday, March 12, 2013

Episode #166: Ping A Little Log For Me

We've been away for a while because, frankly, we ran out of material. In the meantime we tried to come up with some new ideas and we have had a few requests, but sadly they were all redundant, became scripts, or both. We've been looking long and hard for Fu that works in this format, and we've finally found it!

Nathan Sweaney wrote in with a great idea! It isn't a script, it isn't redundant, and it is quite useful. That's three of the four criteria that make a great episode (the fourth being beer fetching or beer opening). To top it off, Nathan wrote the CMD.EXE portion himself. Thanks Nathan!

--Tim

Nathan Sweaney writes in:

Ping Network Monitor

Occasionally we have issues in the field where we think a customer's device is intermittently losing its connection, but we're not sure if, or when, or for how long. We need a log of when the connection is dropping so that we can compare to the customer's reports of issues. Sure there are fancy network monitoring tools that can help, but we're in a hurry with no budget.

In Linux this would be easy, but these are Windows boxen. So I hacked together the following one-liner for our techs to use in the field.

This command will ping an IP address once every second and when it doesn't get a response, it will log the time-stamp in a text file. Then we can compare those time-stamps to failure reports from the customer.

To use it, simply change the IP address 8.8.8.8 near the beginning to whatever IP we need to monitor. Then open a command prompt, CD into the directory you want the log file created, and run the command.

C:\> cmd.exe /v:on /c "FOR /L %i in (1,0,2) do @ping -n 1 8.8.8.8 |
find "Request timed out">NUL && (echo !date! !time! >> PingFail.txt) &
ping -n 2 127.0.0.1>NUL"

So let's dissect this. It's mostly just a combination of examples Ed has mentioned in the past.

First, we're using the "cmd.exe /v:on /c" command to allow for delayed environment variable expansion. Ed has explained in the past why that lets us do flexible variable parsing. This command wraps everything else.

The next layer of our onion is an infinite "FOR /L" loop that Ed mentioned WAY back. We're counting from 1 to 2 in steps of 0 so that our command will continue running until we manually stop it.

Inside of our FOR loop is where we really get to the meat. We've basically got 4 steps:

1) First we see @ping -n 1 8.8.8.8. The @ symbol says to hide the echo of the command to the screen. The switch (-n 1) says to only ping the IP once. And of course 8.8.8.8 is the address we want to ping.

2) Next we pipe the results of our ping into the FIND command and search for "Request timed out" to see if the ping failed. The last part of that >NUL says to dump the output from this command into NUL, because we don't really need to see it.

3) Now we get fancy. The && says to only run this command if the previous command succeeded. In other words, if our FIND command finds the text, which means our ping failed, then we run this command. And we've enclosed this command in parentheses to contain it as a single command. We need to use the "cmd.exe /v:on /c" command at the beginning to allow for delayed environment variable expansion; that way our time & date change each iteration. So %date% and %time% become !date! and !time!.

And finally we're redirecting our output to a file called PingFail.txt. We use the >> operator to append each new entry rather than overwrite with just >.

4) And finally we're on to the last step. As mentioned before, the & says to run the next command no matter what has already happened. This command simply pings localhost with (-n 2) which will give us a one-second delay. The first ping happens immediately, and the second ping happens after one second. This slows down our original ping back in step 1 which would otherwise fire off like a machine gun as fast as the FOR loop can go. Lastly, we're redirecting the output with >NUL because we don't care to see it.

WOW. I said it was convoluted. But it works, and it's rather simple to use.

Tim finds a letter in the mail slot:

Wow, it has been a while since we've dusted off the ol' kung fu for a blog post. I've missed it and I know Hal has too. In fact, he hasn't showered since our last episode. True story. This was his silent (but deadly) protest against our lack of ideas and usable suggestions. The Northwest can breathe a sigh of relief (in the now fresher air) now that we are back for this episode. I, for one, missed the blog. Back to the Fu...

Nathan wrote in with his idea to log Ping failures. What a great idea for a quick and dirty network monitor. Thanks to CMD.EXE he's got a bit of funkiness to his command. Fortunately, we can be a little smoother with our approach.

PS C:\> for (;;) { Start-Sleep 2; ping -n 1 8.8.8.8 >$null; if(-not $?) { Get-Date -Format s |
 Tee-Object mylog.txt -Append }}
2013-03-12T12:34:56

We start off with an infinite loop using the For loop but without any loop control structures. Without these structures there is nothing to limit the loop, and it will run forever...it will be UNSTOPPABLE! MWAAAAAHAHAHAHAHA! <cough> <cough> Sorry about that, it's been a while.

Inside our infinite loop we sleep for a few seconds. We could do it at the end, but for some reason I get inconsistent results when I do that. I have no idea why, and I've tried troubleshooting it for hours. That's OK, a pre-command nap never killed anyone.

After our brief nap, we do the ping. The results are sent into the garbage can that is the $NULL variable. Following this command we check the error state of the previous command by checking the value of $?. This variable is True if the previous command ran without error; if there was an error, it is False. The If statement is used to branch our logic based on this value. If it is False, the ping failed, and we need to log the error.

Inside our branch we get the current date with Get-Date (duh!) and change the format to the sortable format. We could use any format, but the OCD part of me likes this format. The formatted date is piped into the Tee-Object command which will append the date to a file as well as output to our console.

Notice we used the For loop here instead of a While loop. I did this to save a single character. We can save a few more characters by using aliases, shortened parameter names, and a little magic in our For loop.

PS C:\> for (;;sleep 2) {ping -n 1 8.8.8.8 >$null; if(-not $?) { date -f s | tee mylog.txt -a }}

I moved the Start-Sleep (alias sleep) cmdlet inside the For loop control. The For loop looks like this:

for ( variable initialization; condition; variable update ) {
  Code to execute while the condition is true
}

The variable initialization is run once before our loop starts. The condition is checked every time through the loop to see if we should continue the loop. We have no variable we care to initialize, and we want the loop to run forever so we don't use a condition. The variable update piece is executed after each time through the loop, and this we can use. Instead of modifying a variable used in the loop, we take a lovely two second nap. This gives us our nice delay between each ping.

There you have the long awaited PowerShell version of this command. It is better than CMD.EXE, but there is no nice way to use the short-circuit && or || operators to make this command more efficient. Don't tell Hal, but I'm really jealous of the way his shell can be used to complete this task. I'm jealous of his terseness...and his full head of hair.

Hal washes clean

Let's be clear. The only thing that smells around here is the Windows shells. Have to use ping to put a sleep in your loop? Sleep that works at the start of the loop but not the end? What kind of Mickey Mouse operating system is that?

The Linux solution doesn't look a lot different from the Windows solutions:

while :; do ping -c 1 -W 1 8.8.8.8 >/dev/null || date; sleep 1; done

"while :; do ... done" is the most convenient way of doing an infinite loop in the shell. The ping command uses the "-c 1" option to only send a single ping and "-W 1" to only wait one second for the response. We send the ping output to /dev/null so that it doesn't clutter the output of our loop. Whenever the ping fails, it returns false and we end up running the date command on the right-hand side to output a timestamp. The last thing in the loop is a sleep for one second. And yes, Tim, it actually works at the end of the loop in my shell.

Well that was easy. Hmmm, I don't want to embarrass Nathan and Tim by making my part of the Episode too short. How about we make the output of the date command a little nicer:

while :; do ping -c 1 -W 1 8.8.8.8 >/dev/null || date '+%F %T'; sleep 1; done

"%F" is the "full" ANSI-style date format "2013-03-12" and "%T" is the time in 24-hour notation. So we get "2013-03-12 04:56:22" instead of the default "Tue Mar 12 04:56:22 EST 2013".

Oh, you want to save the output in a file as well as having it show up in your terminal window? No problemo:

while :; do ping -c 1 -W 1 8.8.8.8 >/dev/null || date; sleep 1; done | tee mylog.txt

Hooray for tee!
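
If you want to watch the "||" logic and tee at work without waiting for a real outage, here's a rough sketch. The probe function is a made-up stand-in for ping that fails on every third call, and the loop is bounded so that it actually finishes. I've also used "tee -a" here, which appends to the log instead of overwriting it on each run:

```shell
# Hypothetical stand-in for "ping -c 1 -W 1 host": fails on every third call.
probe() { [ $(( $1 % 3 )) -ne 0 ]; }

# Bounded version of the monitoring loop; it logs only when the probe fails.
monitor() {
    for i in 1 2 3 4 5 6; do
        probe "$i" >/dev/null || echo "probe $i failed at $(date '+%F %T')"
    done
}

logfile=$(mktemp)
monitor | tee -a "$logfile"    # first run writes two failure lines
monitor | tee -a "$logfile"    # second run appends instead of truncating
wc -l < "$logfile"
rm -f "$logfile"
```

The final wc should report four lines: two failures from each run. With plain "tee" instead of "tee -a" the second run would replace the first and you'd only see two.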

Well I can't tart this up any more to save Tim and Nathan's fragile egos. So I'm outta here to go find a shower.

Sunday, January 6, 2013

An AWK-ward Response

A couple of weeks ago I promised some answers to the exercises I proposed at the end of my last post. What we have here is a case of, "Better late than never!"

1. If you go back and look at the example where I counted the number of processes per user, you'll notice that the "UID" header from the ps command ends up being counted. How would you suppress this?

There are a couple of different ways you could attack this using the material I showed you in the previous post. One way would be to do string comparison on field $1:

$ ps -ef | awk '$1 != "UID" {print $1}' | sort | uniq -c | sort -nr
    178 root
     58 hal
      2 www-data
    ...

An alternative approach would be to use pattern matching to print lines that don't match the string "UID". The "!" operator means "not", so the expression "!/UID/" does what we want:

$ ps -ef | awk '!/UID/ {print $1}' | sort | uniq -c | sort -nr
    178 root
     57 hal
      2 www-data
    ...

You'll notice that the "!/UID/" version counts one less process for user "hal" than the string comparison version. That's because the pattern match is matching the "UID" in the awk code and not showing you that process. So the string comparison version is slightly more accurate.
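
Here's a small sketch that reproduces that difference with canned input, so the counts don't depend on what happens to be running. The sample_ps_header function and its process list are invented for the example. I've also thrown in a third variant using "NR > 1", which wasn't covered in the previous post: NR is awk's current line number, so this simply skips the first line of input.

```shell
# Canned "ps -ef"-style output (sample data, not from a real system).
sample_ps_header() {
    cat <<'EOF'
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jan05 ?        00:00:01 /sbin/init
hal       2101  2000  0 10:15 pts/0    00:00:00 bash
hal       2150  2101  0 10:16 pts/0    00:00:00 awk !/UID/ {print $1}
EOF
}

# String comparison on field 1: only the header line is dropped.
sample_ps_header | awk '$1 != "UID" {print $1}' | sort | uniq -c | sort -nr

# Pattern match: also drops the awk process, whose command line contains "UID".
sample_ps_header | awk '!/UID/ {print $1}' | sort | uniq -c | sort -nr

# Skipping by line number: drops the first line no matter what it contains.
sample_ps_header | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -nr
```

The first and third commands count two processes for "hal", while the pattern match version counts only one.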

2. Print out the usernames of all accounts with superuser privileges (UID is 0 in /etc/passwd).

Remember that the /etc/passwd file is colon-delimited, so we'll use awk's "-F" option to split on colons. UID is field #3 and the username is field #1:

$ awk -F: '$3 == 0 {print $1}' /etc/passwd
root

Normally, a Unix-like OS will only have a single UID 0 account named "root". If you find other UID 0 accounts in your password file, they could be a sign that somebody's doing something naughty.
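
To see what a suspicious result would look like without touching a real password file, here's a sketch against made-up data. The sample_passwd function and the "toor" account are hypothetical:

```shell
# Sample passwd-style data; "toor" is an invented second UID 0 account.
sample_passwd() {
    cat <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
toor:x:0:0:backdoor:/root:/bin/sh
hal:x:1000:1000:Hal:/home/hal:/bin/bash
EOF
}

# Print the username (field 1) of every account whose UID (field 3) is 0.
sample_passwd | awk -F: '$3 == 0 {print $1}'
```

This prints "root" and "toor", which is your cue to go find out who created that second account.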

3. Print out the usernames of all accounts with null password fields in /etc/shadow.

You'll need to be root to do this one, since /etc/shadow is only readable by the superuser:

# awk -F: '$2 == "" {print $1}' /etc/shadow

Again, we use "-F:" to split the fields in /etc/shadow. We look for lines where the second field (containing the password hash) is the empty string and print the first field (the username) when this condition is true. It's really not much different from the previous /etc/passwd example.

You should get no output. There shouldn't be any entries in /etc/shadow with null password hashes!
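
If you'd like to confirm the test actually fires when there is something to find, here's a sketch using an invented shadow-style file. The sample_shadow function, the "guest" account, and the hash string are all made up:

```shell
# Sample shadow-style data; "guest" has an empty password field.
sample_shadow() {
    cat <<'EOF'
root:$6$examplesalt$examplehash:15700:0:99999:7:::
guest::15700:0:99999:7:::
daemon:*:15700:0:99999:7:::
EOF
}

# Print the username when the password hash (field 2) is the empty string.
sample_shadow | awk -F: '$2 == "" {print $1}'
```

Only "guest" is printed. The "daemon" entry has "*" in the password field, which is not the same thing as an empty field, so it is not reported.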

4. Print out process data for all commands being run as root by interactive users on the system (HINT: If the command is interactive, then the "TTY" column will have something other than a "?" in it)

The "TTY" column in the "ps" output is field #6 and the username field is #1:

# ps -ef | awk '$1 == "root" && $6 != "?" {print}'
root      1422     1  0 Jan05 tty4     00:00:00 /sbin/getty -8 38400 tty4
root      1427     1  0 Jan05 tty5     00:00:00 /sbin/getty -8 38400 tty5
root      1434     1  0 Jan05 tty2     00:00:00 /sbin/getty -8 38400 tty2
root      1435     1  0 Jan05 tty3     00:00:00 /sbin/getty -8 38400 tty3
root      1438     1  0 Jan05 tty6     00:00:00 /sbin/getty -8 38400 tty6
root      1614  1523  0 Jan05 tty7     00:09:00 /usr/bin/X :0 -nr -verbose -auth ... 
root      2082     1  0 Jan05 tty1     00:00:00 /sbin/getty -8 38400 tty1
root      5909  5864  0 13:42 pts/3    00:00:00 su -
root      5938  5909  0 13:42 pts/3    00:00:00 -su
root      5968  5938  0 13:47 pts/3    00:00:00 ps -ef
root      5969  5938  0 13:47 pts/3    00:00:00 awk $1 == "root" && $6 != "?" {print}

We look for the keyword "root" in the first field, and anything that's not "?" in the sixth field. If both conditions are true, then we just print out the entire line with "{print}".

Actually, "{print}" is the default action for awk. So we could shorten our code just a bit:

# ps -ef | awk '$1 == "root" && $6 != "?"'
root      1422     1  0 Jan05 tty4     00:00:00 /sbin/getty -8 38400 tty4
root      1427     1  0 Jan05 tty5     00:00:00 /sbin/getty -8 38400 tty5
root      1434     1  0 Jan05 tty2     00:00:00 /sbin/getty -8 38400 tty2
...
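
A quick way to convince yourself that the two forms are equivalent is to run both over the same canned input and compare. The sample_ps_tty function below is made-up test data:

```shell
# Canned "ps -ef"-style lines (sample data).
sample_ps_tty() {
    cat <<'EOF'
root      1422     1  0 Jan05 tty4     00:00:00 /sbin/getty -8 38400 tty4
root      3394     1  0  2012 ?        00:00:00 /usr/sbin/sshd
hal       2101  2000  0 10:15 pts/0    00:00:00 bash
root      5909  5864  0 13:42 pts/3    00:00:00 su -
EOF
}

with_print=$(sample_ps_tty | awk '$1 == "root" && $6 != "?" {print}')
without_print=$(sample_ps_tty | awk '$1 == "root" && $6 != "?"')

# Both variables hold the same two lines (the tty4 getty and the su).
[ "$with_print" = "$without_print" ] && echo "identical output"
echo "$without_print"
```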

5. I mentioned that if you kill all the sshd processes while logged in via SSH, you'll be kicked out of the box (you killed your own sshd process) and unable to log back in (you've killed the master SSH daemon). Fix the awk so that it only prints out the PIDs of SSH daemon processes that (a) don't belong to you, and (b) aren't the master SSH daemon (HINT: The master SSH daemon is the one whose parent process ID is 1).

This one's a little tricky. Take a look at the sshd processes on my system:

# ps -ef | grep sshd
root      3394     1  0  2012 ?        00:00:00 /usr/sbin/sshd
root     13248  3394  0 Jan05 ?        00:00:00 sshd: hal [priv] 
hal      13250 13248  0 Jan05 ?        00:00:02 sshd: hal@pts/0  
root     25189  3394  0 08:27 ?        00:00:00 sshd: hal [priv] 
hal      25191 25189  0 08:27 ?        00:00:00 sshd: hal@pts/1  
root     25835 25807  0 15:33 pts/1    00:00:00 grep sshd

For modern SSH daemons with "Privilege Separation" enabled, there are actually two sshd processes per login. There's a root-owned process marked as "sshd: <user> [priv]" and a process owned by the user marked as "sshd: <user>@<tty>". Life would be a whole lot easier if both processes were identified with the associated pty, but alas things didn't work out that way. So here's what I came up with:

# ps -ef | awk '/sshd/ && !($3 == 1 || /sshd: hal[@ ]/) {print $2}'

First we eliminate all processes except for the sshd processes with "/sshd/". Then we only print the process ID if the line is neither the master SSH daemon ("$3 == 1" matches a PPID of 1) nor one of my SSH daemons ("/sshd: hal[@ ]/" means the string "sshd: hal" followed by either "@" or space); the "!(... || ...)" wrapper negates both tests at once. If everything looks good, then print the process ID of the process ("{print $2}").

Frankly, that's some pretty nasty awk. I'm not sure it's something I'd come up with easily on the spur of the moment.
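
Since killing sshd processes is not something you want to test live, here's a sketch that runs the expression over canned input. The sample_sshd function is test data based on the listing above, with an invented second user "bob" added so there is something left to print. The second awk is the same test with the negation distributed over both conditions, which some people find easier to read:

```shell
# Canned sshd process list; the "bob" session is hypothetical.
sample_sshd() {
    cat <<'EOF'
root      3394     1  0  2012 ?        00:00:00 /usr/sbin/sshd
root     13248  3394  0 Jan05 ?        00:00:00 sshd: hal [priv]
hal      13250 13248  0 Jan05 ?        00:00:02 sshd: hal@pts/0
root     14001  3394  0 Jan05 ?        00:00:00 sshd: bob [priv]
bob      14003 14001  0 Jan05 ?        00:00:00 sshd: bob@pts/2
EOF
}

# Original form: not (master daemon OR one of hal's daemons).
sample_sshd | awk '/sshd/ && !($3 == 1 || /sshd: hal[@ ]/) {print $2}'

# Equivalent form: not the master daemon AND not one of hal's daemons.
sample_sshd | awk '/sshd/ && $3 != 1 && !/sshd: hal[@ ]/ {print $2}'
```

Both commands print 14001 and 14003, the two processes belonging to bob's session.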

6. Use awk to parse the output of the ifconfig command and print out the IP address of the local system.

Here's the output from ifconfig on my system:

$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr f0:de:f1:29:c7:18  
          inet addr:192.168.0.14  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::f2de:f1ff:fe29:c718/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7724312 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13553720 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:711630936 (711.6 MB)  TX bytes:17529013051 (17.5 GB)
          Memory:f2500000-f2520000 

So this is a reasonable first approximation:

$ ifconfig eth0 | awk '/inet addr:/ {print $2}'
addr:192.168.0.14

The only problem is the "addr:" bit that's still hanging on. awk has a number of built-in functions, including substr() which can help us in this case:

$ ifconfig eth0 | awk '/inet addr:/ {print substr($2, 6)}'
192.168.0.14

substr() takes as arguments the string we're working on (field $2 in this case) and the place in the string where you want to start (for us, that's the sixth character so we skip over the "addr:"). There's an optional third argument which is the number of characters to grab. If you leave that off, then you just get the rest of the string, which is what we want here.
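
Here's a short sketch of both the two-argument and three-argument forms, run against a canned copy of the first few lines of the ifconfig output above (the sample_ifconfig function just replays that text):

```shell
# First three lines of the ifconfig output shown earlier, as test data.
sample_ifconfig() {
    cat <<'EOF'
eth0      Link encap:Ethernet  HWaddr f0:de:f1:29:c7:18
          inet addr:192.168.0.14  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::f2de:f1ff:fe29:c718/64 Scope:Link
EOF
}

# Two arguments: everything from character 6 to the end of the field.
sample_ifconfig | awk '/inet addr:/ {print substr($2, 6)}'

# Three arguments: start at character 6 and take only 3 characters.
sample_ifconfig | awk '/inet addr:/ {print substr($2, 6, 3)}'

# The five characters being skipped.
sample_ifconfig | awk '/inet addr:/ {print substr($2, 1, 5)}'
```

These print 192.168.0.14, then 192, then addr: respectively. Note that awk counts characters starting from 1, not 0.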

There are lots of other useful built-in functions in awk. Consult the manual page for further info.

7. Parse the output of "lsof -nPi" and output the unique process name, PID, user ID, and port combinations for all processes that are in "LISTEN" mode on ports on the system.

Let's take a look at the "lsof -nPi" output using awk to match only the lines for "LISTEN" mode:

# lsof -nPi | awk '/LISTEN/'
sshd      1216     root    3u  IPv4   5264      0t0  TCP *:22 (LISTEN)
sshd      1216     root    4u  IPv6   5266      0t0  TCP *:22 (LISTEN)
mysqld    1610    mysql   10u  IPv4   6146      0t0  TCP 127.0.0.1:3306 (LISTEN)
vmware-au 1804     root    8u  IPv4   6440      0t0  TCP *:902 (LISTEN)
cupsd     1879     root    6u  IPv6  73057      0t0  TCP [::1]:631 (LISTEN)
cupsd     1879     root    8u  IPv4  73058      0t0  TCP 127.0.0.1:631 (LISTEN)
apache2   1964     root    4u  IPv4   7412      0t0  TCP *:80 (LISTEN)
apache2   1964     root    5u  IPv4   7414      0t0  TCP *:443 (LISTEN)
apache2   4112 www-data    4u  IPv4   7412      0t0  TCP *:80 (LISTEN)
apache2   4112 www-data    5u  IPv4   7414      0t0  TCP *:443 (LISTEN)
apache2   4113 www-data    4u  IPv4   7412      0t0  TCP *:80 (LISTEN)
apache2   4113 www-data    5u  IPv4   7414      0t0  TCP *:443 (LISTEN)
skype     5133      hal   41u  IPv4 104783      0t0  TCP *:6553 (LISTEN)

Process name, PID, and process owner are fields 1-3 and the protocol and port are in fields 8-9. So that suggests the following awk:

# lsof -nPi | awk '/LISTEN/ {print $1, $2, $3, $8, $9}'
sshd 1216 root TCP *:22
sshd 1216 root TCP *:22
mysqld 1610 mysql TCP 127.0.0.1:3306
vmware-au 1804 root TCP *:902
cupsd 1879 root TCP [::1]:631
cupsd 1879 root TCP 127.0.0.1:631
apache2 1964 root TCP *:80
apache2 1964 root TCP *:443
apache2 4112 www-data TCP *:80
apache2 4112 www-data TCP *:443
apache2 4113 www-data TCP *:80
apache2 4113 www-data TCP *:443
skype 5133 hal TCP *:6553

And if we want the unique entries, then just use "sort -u":

# lsof -nPi | awk '/LISTEN/ {print $1, $2, $3, $8, $9}' | sort -u
apache2 1964 root TCP *:443
apache2 1964 root TCP *:80
apache2 4112 www-data TCP *:443
apache2 4112 www-data TCP *:80
apache2 4113 www-data TCP *:443
apache2 4113 www-data TCP *:80
cupsd 1879 root TCP 127.0.0.1:631
cupsd 1879 root TCP [::1]:631
mysqld 1610 mysql TCP 127.0.0.1:3306
skype 5133 hal TCP *:6553
sshd 1216 root TCP *:22
vmware-au 1804 root TCP *:902

Looking at the output, I'm not sure I care about all of the different apache2 instances. All I really want to know is which program is using port 80/tcp and 443/tcp. So perhaps we should just drop the PID and process owner:

# lsof -nPi | awk '/LISTEN/ {print $1, $8, $9}' | sort -u
apache2 TCP *:443
apache2 TCP *:80
cupsd TCP 127.0.0.1:631
cupsd TCP [::1]:631
mysqld TCP 127.0.0.1:3306
skype TCP *:6553
sshd TCP *:22
vmware-au TCP *:902

In the above output you see cupsd bound to both the IPv4 and IPv6 loopback addresses. If you just care about the port numbers, we can flash a little sed to clean things up:

# lsof -nPi | awk '/LISTEN/ {print $1, $8, $9}' | \
    sed 's/[^ ]*:\([0-9]*\)/\1/' | sort -u -n -k3
sshd TCP 22
apache2 TCP 80
apache2 TCP 443
cupsd TCP 631
vmware-au TCP 902
mysqld TCP 3306
skype TCP 6553

In the sed expression I'm matching "some non-space characters followed by a colon" ("[^ ]*:") with some digits afterwards ("[0-9]*"). The digits are the port number, so we replace the matching expression with just the port number. Notice I used "\(...\)" around the "[0-9]*" to create a sub-expression that I can substitute on the right-hand side as "\1".

I've also modified the final "sort" command so that we get a numeric ("-n") sort on the port number ("-k3" for the third column). That makes the output look more natural to me.
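
If you don't have lsof handy, here's a sketch of just the sed and sort stages run over canned input. The sample_listeners function replays a subset of the awk output from above:

```shell
# A subset of the "awk '/LISTEN/ {print $1, $8, $9}'" output, as test data.
sample_listeners() {
    cat <<'EOF'
apache2 TCP *:443
apache2 TCP *:80
cupsd TCP 127.0.0.1:631
cupsd TCP [::1]:631
mysqld TCP 127.0.0.1:3306
sshd TCP *:22
EOF
}

# Strip everything up to the last colon before the port, then sort
# numerically on the third column and drop duplicates.
sample_listeners | sed 's/[^ ]*:\([0-9]*\)/\1/' | sort -u -n -k3
```

The two cupsd lines collapse into a single "cupsd TCP 631" and the ports come out in ascending order, from 22 up to 3306. One thing to keep in mind: when a sort key is given, "sort -u" treats lines as duplicates if their keys compare equal, so two different programs listening on the same port number would be reduced to one line. That's not an issue with this data, but if it matters on your system, run a plain "sort -u" first and then do the numeric sort as a separate step.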

I guess the moral of the story here is that awk is good for many things, but not necessarily for everything. Don't forget that there are other standard commands like sed and sort that can help produce the output that you're looking for.

Happy awk-ing everyone!