Tuesday, March 29, 2011

Episode #140: Folder Foulups

Tim checks the mailbag this time

Tom He-Who-Shall-Not-Be-Named "Doe" writes in:

We have a network share and while users do not have rights to create folders at the root of the drive, they have the ability to accidentally start dragging a copy of a root folder to some sub-location in a directory where they have modify rights, thus creating incomplete (and unnecessary) copies of root directories. I never get word of this (surprise!).

Can you put together a script that I can run periodically which will dynamically inventory the top-level directories (currently 28) on this network share and looking for sub-folders with matching names *** all the way down *** underneath those top-level dirs? Yes, I realize this could run for hours, but I will run it against a DR box that has DFSR on it, so I won't impact performance for the users.

He also included a sample directory structure:

-----> FolderA
-----> FolderB
-----> FolderC
----------> FolderA
----------> FolderB
----------> FolderC

Per our conversation with Tom I'm-on-the-run-from-the-mob Doe, all we need to do is match a subdirectory name with the name of one of the root directories. Here is the rather simple command to do what we need:

PS C:\MyShare> $a = ls | % {$_.Name}; ls -r * | 
? { $_.PSIsContainer -and $a -contains $_.Name } | select fullname


There are two parts to our command. First, we load a variable ($a) with the names of each of our root directories. We have to use the ForEach-Object cmdlet to extract each Name in a scalar, string format.

The second portion is our search. We start off with a recursive directory listing and send the objects into a Where-Object (alias ?) filter. Our filter ensures that we are only looking at directory objects (Containers). It also checks to see if the Name of the current object is contained in our collection of directory names. If both criteria are satisfied the results are passed down the pipeline. Finally, we output just the full path (FullName) of our object.

There is one tiny little nugget of information that I didn't mention, but if you looked really closely you might have wondered why I added the asterisk with ls -r. Why? Well, let's see what happens when we don't use it:

PS C:\MyShare> $a = ls | % {$_.Name}; ls -r | 
? { $_.PSIsContainer -and $a -contains $_.Name } | select fullname


For some reason the command "ls -r" includes the directory listing for the current directory but "ls -r *" does not. It is a cool little trick, but I have no idea why it works. Here is a better demonstration using the Compare-Object cmdlet:

PS C:\MyShare> Compare-Object (ls -r) (ls -r *)

InputObject SideIndicator
----------- -------------
DIR1 <=
DIR2 <=
DIR3 <=

Weird, but it works.


While we can't take the same approach with CMD.EXE, we can accomplish this task with the following command:

C:\MyShare> for /F "tokens=*" %a in ('dir /B /S /A:D') do
@for /F "tokens=*" %r in ('dir /B /A:D') do @echo %a| findstr "\\%r$"


Our command wraps a For loop around a bare-format (/B), recursive (/S) directory listing that only looks for directories (/A:D). The variable %a contains the full path of each directory. We then use another For loop to get the names (not full paths) of the directories in our root folder. It should be noted that the bare format (/B) output differs depending on whether the recursive option (/S) is used: with /S it prints full paths, without /S just the names.

We then use Echo to output %a (full directory path), and use Findstr to search for %r (a root folder name). The regular expression ensures the name is a full match, and that the match is at the end of the string. To do this we make sure that our name is between a backslash and the end of line ($). The backslash is the regular expression escape character so it must be escaped with another backslash.
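The same anchored match can be sketched with grep on a Unix box (the paths and the folder name FolderA are just illustrations):

```shell
# Keep only paths whose final component is exactly "FolderA":
# a literal backslash (escaped as \\) followed by the name and end-of-line ($)
printf '%s\n' 'C:\MyShare\FolderC\FolderA' 'C:\MyShare\FolderAx' |
  grep '\\FolderA$'
```

Only the first path survives, because the second has extra characters between the name and the end of the line.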

Unfortunately, this command is going to be slooooooow. For each new directory we traverse, we will have to do a directory listing of our root directory. This means we will probably do thousands of extra directory listings that we don't need to do with PowerShell.

We can filter out our initial directories by adding an extra Findstr command to the end to filter out anything that is two levels deep. This additional filter looks like this:

 ... | findstr /V "C:\\MyShare\\[^\\]*$"

The /V only prints lines that do not contain a match. Our search string matches the first portion of our path ("C:\MyShare\"), then any run of characters that aren't backslashes, and finally the end of the line. This means that C:\MyShare\Blah will be filtered out, but C:\MyShare\Blah\Blah will be displayed.
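The equivalent depth filter can be demonstrated with grep's -v option (which, like findstr's /V, inverts the match; the sample paths are illustrative):

```shell
# Paths exactly one level under C:\MyShare (no further backslash before
# end-of-line) are suppressed; deeper paths pass through
printf '%s\n' 'C:\MyShare\Blah' 'C:\MyShare\Blah\Blah' |
  grep -v 'C:\\MyShare\\[^\\]*$'
```

Only C:\MyShare\Blah\Blah is printed, just as described above.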

It may not be fast, but at least it's [mostly] easy. Now, let's see if Hal is fast and easy.

Hal wonders if Tim is projecting a bit

Personally I prefer to think of it as "efficient and morally flexible", but have it your own way. As far as the challenge goes, Unix makes this both fast and, well... easy is in the eye of the beholder I guess.

There is a simple find command for solving this challenge:

$ find */* -type d \( -name DIR1 -o -name DIR2 -o -name DIR3 \)

Notice that I'm doing "find */* ..." so that I start my search at the level below the top. Otherwise I'd get matches on the top-level directories themselves, and that wouldn't be helpful.

This command works great as long as there are only three directories and their names don't change. But Mystery Tom's challenge specifies that there are actually 28 directories, which is way more than I'd want to type by hand. So it would be cool if there was a way to create the list of "-name DIR1 -o -name DIR2 ..." automatically.

It's actually pretty straightforward to do this:

$ ls | tr \\n ' ' | sed 's/ $//; s/ / -o -name /g'
DIR1 -o -name DIR2 -o -name DIR3

I start with the list of top-level directories, which the ls command will output with newlines after each name. So I use tr to transform the newlines to spaces, which leaves me with a trailing space in place of the last newline. But since I'm feeding the whole thing into sed anyway, I first have sed remove the trailing space and then replace all of the remaining spaces with " -o -name ".

The only problem is that I have no "-name" at the beginning of my output, but I can fix that up when I do the output substitution into the original find command:

$ find */* -type d \( -name $(ls | tr \\n ' ' | sed 's/ $//; s/ / -o -name /g') \)

All I did here was take our ls pipeline and throw it inside "$(...)"-- the command output substitution operation-- in place of the "DIR1 -o -name DIR2 -o -name DIR3" that I hand-entered in the original find command. Since we needed the extra "-name" at the front, I just put that in manually before the "$(...)". So now we have a command that will work for any number of directories and even work when the directory names change.

And it's much sexier than either of Tim's solutions.

Tuesday, March 22, 2011

Episode #139: Text or Video... or Both?

Hal's in the mailbag again

Recently we got a request from Seth Feldman. Seth's trying to organize his directory of conference videos, which is structured like:


In each "leaf" directory there will be one or more video files. Each video file should have a corresponding *.txt file that describes the video. Usually the text file shares the same base file name as the video, but not always (as you can see above). Seth wants to find all of the videos in his (extensive) collection that don't yet have a descriptive text file and/or cases where he's created the text file but hasn't had a chance to download the video.

If everything were like the SANS2011 directory in our example-- where the video file and the text description share the same base file name-- then we could go with a simpler solution:

$ find SANS2011 -type f | sed 's/\.[^.]*$//' | sort | uniq -u

Just find all the files, strip off the file extension, and then use "uniq -u" to find all the base file names that only appear once in the output. Unfortunately this solution fails on directories like the BHDC2011 and Shmoo2011 dirs where the files have different names, giving you a false-positive.
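As a toy illustration of the "uniq -u" trick (the file names are made up, in the spirit of the example): base names that appear twice are matched pairs and vanish, while singletons survive:

```shell
# After stripping extensions, a matched txt/video pair yields the same
# base name twice; "uniq -u" keeps only lines appearing exactly once
printf '%s\n' talk.mp4 talk.txt orphan.txt |
  sed 's/\.[^.]*$//' | sort | uniq -u
```

Here only "orphan" is printed, flagging the file with no partner.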

I could make a small mod to help the situation in this case:

$ find . -type f | sed -r 's/\/(info|notes).txt/\/talk.txt/; s/\.[^.]*$//' | 
sort | uniq -u


If you can count on the text file being named "info.txt" or "notes.txt" and the corresponding video to be called "talk.*", then just tweaking the sed to "rename" the *.txt file as I'm doing above will work. But I'm not sure we can count on this pattern being repeated throughout the entire directory structure.

So I went with an uglier approach:

$ find . -type d -links 2 | 
while read dir; do
a=$(ls "$dir" | wc -l);
t=$(ls "$dir"/*.txt | wc -l);
o=$(($a - $t));
[[ $o == $t ]] || echo $dir - $t txt, $o vid;
done 2>/dev/null

./SANS2011/Pomeranz - 0 txt, 1 vid
./SANS2011/Medin - 1 txt, 0 vid

Here I'm using "find . -type d -links 2" to find all of the "leaf" directories. Why does this work? First, the minimum link count on any directory is two because there's the pointer to the directory from its parent plus the "." link in the directory which points back to itself. Any time you make a subdirectory, however, that subdirectory contains a ".." link that points back to its parent, increasing the parent's link count by one. So directories with link count 2 must have no subdirs, and thus they are "leaf" directories.
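You can check the link-count arithmetic yourself on a traditional Unix filesystem (GNU stat assumed; some filesystems, such as btrfs, report directory link counts differently):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/parent/child"
stat -c %h "$tmp/parent"        # 3: entry in $tmp, its own ".", child's ".."
stat -c %h "$tmp/parent/child"  # 2: entry in parent plus its own "." -- a leaf
find "$tmp" -type d -links 2    # prints only the leaf directory
rm -rf "$tmp"
```

The find at the end returns just the child directory, exactly the "leaf" behavior used above.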

I next do a loop over all of the leaf directories I find. Inside the loop I calculate the total number of files in each directory and the number of *.txt files. Then I subtract the number of text files from the total number of files, giving me the number of non-text (or "other") files. If the number of text files doesn't equal the number of other files, then output some information about the directory.

Unfortunately, while our original solution gave us false-positives, this version ends up giving us false-negatives. The ./SANS2011/Skodo directory has an orphan *.txt file and an orphan video file. But since there are the same number of orphans of each type, our "count the file types" solution doesn't flag this directory as a problem.

So which do you prefer, false-positives or false-negatives? In this case, I'm going to go with getting some false-positives, because frankly the first version of the command is a lot easier to type. But your mileage, as always, may vary.

Now if Tim can leave off his Jitterbugging for a moment, we'll see what he's got up his sleeve.

Tim jitterbugs the night away

Hal took two approaches to this episode, so I will too. We'll start off with the false-negative approach, which just compares the number of txt files and the number of non-txt files. Here is the command:

PS C:\videos> Get-ChildItem -Recurse | ? {
$_.PSIsContainer -and
($_.GetFiles("*.txt").Count * 2) -ne $_.GetFiles().Count
} | Select-Object FullName


I broke the command into multiple lines for readability, but of course this could all be on one line. We start off with the basic recursive directory listing, followed by a Where-Object filter which has two parts:
1) Directories only
2) The number of files in the directory does NOT equal double the number of text files.

The last portion of the filter may not make sense at first glance, so let me explain it a bit further. The Pomeranz directory should have 1 text file, and 1 video file. The number of txt files in the directory is 0, and doubled is still 0. This is compared to the total number of files (1). The result is not equal so the object is passed down the pipeline.

Similarly, my (Medin) directory has 1 txt file, and doubled is 2. There should be two total files in the directory, but there aren't. The values are Not Equal (ne) and the object is passed down the pipeline.

Now let's look at the SANS2011\Skodo directory. There are 2 txt files, and double that is 4. This is compared to the total number of files in the directory (4) and the results are equal. Since we want Not Equal (ne) results, this object is not passed down the pipeline. Note that non-leaf directories will have 0 txt files and 0 non-txt files and will therefore not make it through our filter.
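Tim's doubled-count test translates to the shell almost directly; here is a sketch for a single directory, built around the Medin example from above (the demo directory and file are constructed on the spot):

```shell
# Build a tiny example: Medin has a txt file but no video
mkdir -p demo/Medin && touch demo/Medin/Jitterbug.txt
dir="demo/Medin"
total=$(ls "$dir" | wc -l)                    # all files in the directory
txt=$(ls "$dir"/*.txt 2>/dev/null | wc -l)    # just the *.txt files
# a fully paired directory has exactly twice as many files as txt files
[ $((txt * 2)) -ne "$total" ] && echo "$dir has unpaired files"
rm -rf demo
```

With one txt file and one total file, doubled txt (2) differs from the total (1), so the directory is flagged.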

Of course, upon closer inspection of the SANS2011\Skodo directory we see that while Ed's directory does have the right number of files, it does not have matching pairs. If we try to match the video and text file names, then we can use this command:

PS C:\videos> Get-ChildItem -Recurse | ? { !$_.PSISContainer } |
Group-Object -Property BaseName,PSParentPath -NoElement | ? { $_.Count -ne 2 }

Count Name
----- ----
1 info, Microsoft.PowerShell.Core\FileSystem::C:\videos\BHDC2011\Larimer
1 talk, Microsoft.PowerShell.Core\FileSystem::C:\videos\BHDC2011\Larimer
1 Jitterbug, Microsoft.PowerShell.Core\FileSystem::C:\videos\SANS2011\Medin
1 Rumba, Microsoft.PowerShell.Core\FileSystem::C:\videos\SANS2011\Pomeranz
1 AchyBreaky, Microsoft.PowerShell.Core\FileSystem::C:\videos\SANS2011\Skodo
1 Lambada, Microsoft.PowerShell.Core\FileSystem::C:\videos\SANS2011\Skodo
1 notes, Microsoft.PowerShell.Core\FileSystem::C:\videos\Shmoo2011\Coyne
1 talk, Microsoft.PowerShell.Core\FileSystem::C:\videos\Shmoo2011\Coyne

The output isn't great, but it does get our results. The "Name" property contains the full name of the path, including the provider (FileSystem). We'll clean up the results in a bit, but let's go over the command first.

We start off getting a recursive directory listing and we filter out the directories, so we are left with a collection of all the files. We then group the objects based on their path and the base name (filename without the extension). Any group that has exactly two members (a matching pair) is filtered out, leaving only the mismatches.

We have two minor problems. First, the output is messy. Second, we have duplicate directories, since each mismatched file produces its own line of output. If we first strip out the provider portion of the path (Microsoft.PowerShell.Core\FileSystem::), we'll have nicer output.

 ... | Select-Object @{Name="Directory";Expression={$_.Name -replace '.*::', ''}}

We then remove duplicates like this:

... | Get-Unique -AsString


Honestly, I understand that you need the -AsString switch, but I don't understand why it isn't smarter. Here is the relevant section from the help page (Get-Help Get-Unique):
"[-AsString] Treats the data as a string. Without this parameter, data is treated as an object, so when you submit a collection of objects of the same type to Get-Unique, such as a collection of files, it returns just one (the first). You can use this parameter to find the unique values of object properties, such as the file names." Ok, whatever.

We can shorten our last command by using positional parameters, aliases, and short parameter names:

C:\videos> ls -r | ? { !$_.PSISContainer } | group BaseName,PSParentPath |
? { $_.Count -ne 2 } | select @{n="Directory";e={$_.Name -replace '.*::', ''}} |
unique -a

Now back to my jitterbug.

Tuesday, March 15, 2011

Episode #138: Flux This!

A Vaguely Familiar-Looking Stranger Begins:
As the scene opens, an unkempt man in tattered, dusty clothes walks forward. His wild eyes shift to and fro, barely concealed by his matted hair (what’s left of it) as he surveys the audience. He begins to speak and his creaky voice bears an uncanny resemblance to that of a long-lost friend…

Greetings, fellow Shell-Fu aficionados! Ed Skoudis here, a blast from the past, briefly back from my hiatus for this episode. I remember when we first started this blog way back, oh, about two years ago. We were so young then… naïve… but full of pluck and verve. Starry-eyed Paul Asadoorian had his hopes tied to building a successful commercial ferret farm, harnessing the little creatures on hamster wheels as a form of cheap electricity. Hal secretly yearned to participate in that Village People revival band that never really took off. And, Tim was but a twinkle in his father’s eyes, shortly followed by a firm slap in the face from his momma. My how long it’s been and how far we’ve come.

But, I digress. I’m here today because I was teaching my SANS Security 504 class last week, and I got a great question from a brilliant attendee. We were covering bot-net fast-flux techniques, naturally, in which a bad guy sends a phishing-style e-mail to many users, exhorting them to click on a link. Normally, the link would point directly to a web server where the bad guy had planted a fake bank, imposter e-commerce site, or other nefarious server waiting to dupe users into providing their login credentials. With the fast flux technique, however, the link in the e-mail doesn’t point directly to the fake bank. You see, sending such a link that points directly to the attacker’s website will help the good guys find where the evil site is, and get it taken down quickly. In the bad guy business, keeping your evil infrastructure resilient to take-down notices is helpful in making moolah.

To that end, attackers use the fast-flux technique. Instead of the link including a name that resolves directly to the IP address of the evil server, the link has a name that resolves via DNS to an intermediate HTTP relay, which is just a bot-infected machine that sits between the victim user and the attacker’s server. To stay one step ahead of the good guys, the attacker fluxes the DNS record so that after a short time-to-live expires (often 1 to 3 minutes), the name in the link from the phishing e-mail now resolves to a different IP address, another bot-infected machine acting as a relay. The attacker fluxes the A records in the DNS server so that the name points to a different address every minute or so, making the investigator’s job more difficult as the server appears to be jumping around from place to place, preserving the bad guy’s real server longer.

The question that popped up in class was this: Is there a free tool that simply queries DNS looking for the record for a given name every 30 seconds or so, and then displays any changes it sees over time? In class, I responded that I could create something that does that, Name-That-Tune-Style, in but a single Windows command -- a hideously ugly command, but still just in one command. Over some not-very-good hotel halibut for dinner that night, I put together the command and showed it in class the next day.

I mentioned this to Hal and Tim, and, desperate as they are for any idea whatsoever for an episode, they agreed that we should make it an episode. In our e-mail exchange, Tim asked if I’d send the command I had prepared. I told him that I’d do it as the single favor I owed him. He responded:

“That was my one favor, and I just wasted it. Dang! So much for the piggy-back ride I wanted during the week of SANS FIRE. I even had tiny matching cowboy hats.”

Tim, for a tiny cowboy hat, I’ll gladly grant you another favor. Fear not!

So, without further adieu, here is my command:

C:\> cmd.exe /v:on /c "set stuff=& for /L %i in (1,0,2) do @for /F "skip=1 
delims=" %j in ('"nslookup betty.target.tgt name.server.tgt 2>nul ^| find "Address""')
do @(set new=%j& (if NOT !stuff!==!new! (echo !date! !time! !new! & set
stuff=!new!)) & ping -n 6 127.0.0.1 >nul)"

The command looks up the given name every 5 seconds (change the "6" to "31" to make it go every 30 seconds), and if the IP address changes, it prints out a date/time stamp and the new IP address or addresses associated with the name. You simply provide the name that interests you in place of “betty.target.tgt” and your name server in place of name.server.tgt. I structured the command so that it will work whether it gets a single IP address back or multiple addresses in a round-robin setup. Here is its output if there is a single name included in the answer for betty.target.tgt that is fluxing:

Fri 03/11/2011 13:11:50.95 Address:
Fri 03/11/2011 13:12:06.35 Address:
Fri 03/11/2011 13:12:47.43 Address:
Fri 03/11/2011 13:13:02.85 Address:

And, here is its output for a round-robin DNS flux (which I modeled by simply putting in www.google.com for the name we’re interested in):

Fri 03/11/2011 13:14:41.56 Addresses:,,,
Fri 03/11/2011 13:14:46.74 Addresses:,,,
Fri 03/11/2011 13:14:51.91 Addresses:,,,
Fri 03/11/2011 13:15:02.23 Addresses:,,,

Note that in the round-robin results, my command notices that a change occurred, but it cannot tell whether it was merely a change of order or a change of addresses. Still, for a fast-flux botnet that uses round-robin DNS, it is almost always a change of addresses, so my command works well for its intended use.

Now, diligent readers of this blog (both of you) will instantly be familiar with the mechanics of this command. But for those that aren’t, here’s a brief synopsis of what I’ve wrought:
  • cmd.exe /v:on /c: Turn on delayed variable expansion, so my variables can float when I refer to them as !var!.
  • set stuff=&: Clears the stuff variable, so that it contains nothing. You’ve gotta put no space between the equals sign and the &, or else stuff will contain a space. Stuff will hold my current address from the lookup.
  • for /L %i in (1,0,2) do: An infinite loop that starts counting at 1 and counts in steps of zero all the way up to 2. This keeps us spinning, kind of like the tire rims Paul has on his “babe-mobile”.
  • @: Turn off echo of commands.
  • for /F “skip=1 delims=” %j in: Set up parsing of the output of nslookup | find. If my nslookup command succeeds, find will scrape through its output looking for lines that say “Address”. There will be two such lines: one that indicates the IP address of the name server, which I want to skip (skip=1) and one that indicates the address(es) of the names we searched for. The delims= says to turn off default parsing on spaces and tabs. I want whole lines, baby!
  • ‘”…”’: This single quote / double quote combo says to run the command inside.
  • nslookup betty.target.tgt name.server.tgt 2>nul: Look up the name, and throw error messages away.
  • | find “Address”: Look for output lines that have the word “Address” in them.
  • do @(): Take the output of the nslookup | find command and do stuff with it.
  • set new=%j&: Take our output of nslookup (%j) and assign its value to the variable named new. Again, we don’t want a trailing space included in this assignment, so we follow with an & immediately.
  • (if NOT !stuff!==!new!: If our previous lookup result does not match our most recent lookup, we’ve got a change!
  • (echo !date! !time! !new!: Because we have a change, let’s display the date, time, and our new result.
  • & set stuff=!new!)): We better store our new result in stuff, so we can see if it changes going forward.
  • & ping -n 6 127.0.0.1 >nul)”: Introduce a 5-second delay by pinging ourselves 6 times, before we loop back.

Gee, that was fun! Maybe I won’t wait so long for a return visit next time. So, what have you got, Village Hal and Cowboy Tim?

Tim gets out his hats and prepares to saddle up:

I am so looking forward to making SANS FIRE now! Anyone know somebody who embroiders tiny hats?

Besides a shared love of tiny hats, we also share a similar approach on this episode. The process is: get the output, store it, wait, get new results, compare; if the results differ, show the output and update the stored results. Rinse and repeat. But, unlike the cmd.exe version, we'll handle DNS round robin by sorting the output addresses, like this:

PS C:\> nslookup -d www.google.com 2>$null | Select-String 'internet address' |
Select-Object -Expand Line | Sort-Object

internet address =
internet address =
internet address =
internet address =
internet address =

I decided to use nslookup with the -d option. The output format of nslookup on Windows 7/2008 is different from the comma-separated output in Ed's section (I assume Ed used XP). Specifically, Windows 7/2008 puts line breaks between the addresses instead of commas. Using the -d option makes finding the addresses easier, since we can filter for "internet address".

The Select-String cmdlet does our filtering, but it creates a bunch of objects containing information about each match, such as the line number and the match pattern. The -ExpandProperty switch used with Select-Object just outputs the Line value as a scalar (not a MatchInfo object).

PS C:\> while (1) { $new = nslookup -d  betty.target.tgt name.server.tgt 2>$null |
Select-String 'internet address' | Select-Object -Expand Line | Sort-Object;
if ( compare-object $new $old ) { Get-Date; $new; $old = $new }; sleep 10 }

Saturday, March 12, 2011 12:57:58 AM
internet address =
Saturday, March 12, 2011 12:58:08 AM
internet address =

We start off with an infinite while loop, since 1 is always true. Then we set $new equal to the output of our fancy nslookup fu. The Compare-Object cmdlet is used to compare the values of $old and $new. If they are the same, then there is no output (null). Since a null is treated as False by the If statement, nothing happens in our command. If the variables are different, there is output from Compare-Object and the if statement takes the True path.

On the path of Truth, the date and output from nslookup are displayed. Then the value from $new is copied into $old. The $old value is used on every loop to see if our values changed. Finally, the command will sleep for 10 seconds and start again. Not too bad right?

They say it takes a village, and sometimes we are that village. But I'm not sure what takes The Village People. Hal?

Hal returns from his visit to the Y-M-C-A

If there's one thing my time in the Village People tribute band has taught me, it's that there's no problem that can't be overcome if you're wearing tight leather and cut-off t-shirts. So let me strap on my coding chaps and get down with it.

There are a number of options for doing DNS lookups from the command-line in Unix, but let's just go with the host command because it's simpler:

$ host www.google.com
Using domain server:

www.google.com is an alias for www.l.google.com.
www.l.google.com has address
www.l.google.com has address
www.l.google.com has address
www.l.google.com has address
www.l.google.com has address
www.l.google.com has address

What I really care about are the lines that read "<host> has address <address>". Unlike my co-authors, however, I don't consider round-robin DNS to be alert-worthy. So what I'd like to do is sort my results into a canonical order, meaning that I'll only raise a flag if there's an actual change to one of the IP addresses.

Not a problem:

$ host www.google.com | awk '/ has address / { print $4 }' | sort | tr \\n ' '

At this point, it's just a matter of creating a loop like Ed and Tim did. Only mine's much prettier, of course:

$ while :; do 
new=$(host www.google.com | awk '/ has address / { print $4 }' | sort | tr \\n ' ');
[ "$new" == "$old" ] || echo $(date) -- $new;
old=$new;
sleep 30;
done
Sat Mar 12 11:41:26 PST 2011 --

No need to bother with a full-on if statement inside the loop. I'm just using the short-circuit "||" operator after the equality test. As long as $new and $old are the same, the echo statement never gets executed. But when the IP addresses change, we'll spit out the date and the new values.

The rest of the loop is very much like Tim's (or even Ed's if you don't count the fact that we don't have to ping in order to sleep for a specified period of time). We assign the value of $new to the variable "old" and sleep for our chosen interval before starting the loop all over again.
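The short-circuit test is easy to play with in isolation (the address is purely illustrative):

```shell
old=""
new="10.0.0.1"                        # illustrative address
[ "$new" = "$old" ] || echo changed   # values differ, so echo runs
old=$new
[ "$new" = "$old" ] || echo changed   # values now equal, nothing prints
```

"changed" is printed exactly once, on the first comparison.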

So another easy one for Unix. Now if you all will excuse me, I need to go practice my dancing. After all, you can't stop the music!

Tuesday, March 8, 2011

Episode #137: Free-base64-ing

Hal spreads the fu

Lately I've been teaching Lenny Zeltser's Reverse Engineering Malware course for SANS. It's chock full of great information and is a lot of fun to teach. Plus there are all kinds of opportunities for me to bust out the Command Line Kung Fu.

For example, in one exercise we analyze the behaviors of a trojan that's receiving command and control messages via base64 encoded strings inside of web page comments. The comments themselves are easy enough to extract from a memory image of the suspicious process:

$ strings proc-memory.img | grep '<!-- '
<!-- BgAAAA== --><br><br><html>
<!-- Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA== --><br><br><html>
<!-- YzpcYm9vdC5pbmk= --><br><br><html>
<!-- Y21kIC9jIGRlbCBwcm9jZXNzZXMudHh0 --><br><br><html>
<!-- cHJvY2Vzc2VzLnR4dA== --><br><br><html>
<!-- BAAAAA== --><br><br><html>
<!-- AgAAAA== --><br><br><html>
<!-- AgAAAA== --><br><br><html>
<!-- BAAAAA== --><br><br><html>

As you can see, once we decode the strings there is going to be a certain amount of duplication. So what we'd like to do in order to get an idea of the capabilities of this trojan is to dump out a list of the unique, decoded command strings.

Happily, most Linux distros these days include the base64 command line utility which does both encoding and decoding. However, we need to extract just the base64 encoded text from the comments before feeding it into the decoding routine. So we'll modify our command line a little bit and use awk instead of grep:

$ strings proc-memory.img | awk '/<!-- / {print $2}' | base64 -d
cmd /c del systeminfo.txtc:\boot.inicmd /c del processes.txtprocesses.txt...

I'm using the built-in pattern matching operator in awk ("/.../") to replace grep, and then I simply "{print $2}" to extract the base64 encoded text out of each comment. That output simply gets fed into "base64 -d" which decodes anything it gets on the standard input.

The only problem here is that our base64 encoded text doesn't include newlines. So our output gets all run together. This is going to be a problem when we want to extract the unique strings from the output. I decided to insert a loop so that I could format the output a little more nicely:

$ strings proc-memory.img | awk '/<!-- / {print $2}' | 
while read comment; do echo $comment | base64 -d; echo; done

cmd /c del systeminfo.txt
cmd /c del processes.txt

My while loop just feeds one comment at a time into the base64 program, and then uses echo to output a newline after each line of output. It's less efficient, but the output is more readable.

Now all we have to do to get the unique strings is pipe the whole mess into "sort -u":

$ strings proc-memory.img | awk '/<!-- / {print $2}' | 
while read comment; do echo $comment | base64 -d; echo; done | sort -u

cmd /c copy /B /Y *dump + 201* pwdump.txt
cmd /c del 201*
cmd /c del drivers.txt
cmd /c del *dump*

In the course materials, Lenny actually gives the students a Python program for accomplishing this task. But we don't need no stinking Python when we've got the Unix shell! Heck, I'm pretty sure even PowerShell can handle this one, right Tim?

Tim spreads the Brie

We can handle it! Yes we can! Although...

Windows (still) doesn't have a built-in strings command, but we can use Select-String with regular expressions to find the strings we want.

PS C:\> gc proc-memory.img | Select-String -AllMatches '(?<=<!-- )[a-zA-Z0-9+/]+?=*(?= -->)' |
% { $_.Matches } | % { $_.Value }

Get-Content (alias gc) is used to output the file, which is then piped into Select-String. The Select-String cmdlet's -AllMatches switch returns all matches in a line, not just the first (the default). This cmdlet also accepts regular expressions for searching, and we used one to find the begin-comment tag (<!--), followed by base64 characters (A-Z, a-z, 0-9, +, /), optional padding (=), and finally the end-comment tag (-->). Since we don't actually want the comment tags, just the text between them, we use a regular expression look-behind ((?<=)) and look-ahead ((?=)) to make sure the tags exist without including them in the matched text.
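The same look-around idea works in GNU grep's PCRE mode (-P, assuming your grep was built with PCRE support), shown here against one of the comments from Hal's example:

```shell
# Look-behind and look-ahead require the comment markers to be present
# but keep them out of the match; -o prints only the matched text
echo '<!-- Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA== --><br><br><html>' |
  grep -oP '(?<=<!-- )[a-zA-Z0-9+/]+=*(?= -->)' | base64 -d
```

This extracts just the base64 payload and decodes it to "cmd /c del systeminfo.txt".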

Each object output by Select-String has a Matches property containing all the matches on its line. A ForEach-Object cmdlet (alias %) is used to access each match, which is piped into another ForEach-Object cmdlet that outputs the matched strings.

Now that we have the strings, we need to base64-decode them. Unfortunately, there isn't a native command for this. There are a few ways to add this feature to the shell, but we'll use PowerShell's ability to access the .NET Framework to accomplish this portion of our task.

To base64 decode a string, we use this command:

PS C:\> [text.encoding]::utf8.getstring([convert]::FromBase64String("Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA=="))
cmd /c del systeminfo.txt
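For comparison, the same decode is a one-liner on the Unix side with the base64 utility Hal used earlier:

```shell
# -d decodes instead of encodes; base64 -d emits the decoded bytes as-is.
echo 'Y21kIC9jIGRlbCBzeXN0ZW1pbmZvLnR4dA==' | base64 -d
# prints: cmd /c del systeminfo.txt
```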

We can then combine these two commands to output the decoded versions of all the matched strings.

PS C:\> gc proc-memory.img | Select-String -AllMatches '(?<=<!-- )[a-zA-Z0-9/+]+?=*(?= -->)' |
% { $_.Matches } | % { [text.encoding]::utf8.getstring([convert]::FromBase64String($_.Value)) }

cmd /c del systeminfo.txt
cmd /c del processes.txt

There is a lot of output, so let's do what Hal did by sorting the output and removing any duplicates.

PS C:\> <previous command> | Sort-Object -Unique
cmd /c copy /B /Y *dump + 201* pwdump.txt
cmd /c del *dump*
cmd /c del 201*
cmd /c del drivers.txt
cmd /c del ipconfig.txt

To answer Hal's question: yes, "even" PowerShell can do it!

Tuesday, March 1, 2011

Episode #136: Reporting for Duty

Hal dips into the mailbag:

We got an interesting challenge from Juan Cortes this week. Juan's got a text report that looks like this:

User Report                                         Date: 02/16/2011 09:57:14
All Users Page: 1 of 27

User Name Default Login Name Default Shell Name
-> Token Serial No./ Replacement Last Login Orig. Token Type/Auth with

Karim Abdul-Jabbar kajabbar
-> 000403861445 02/12/2011 00:30:01 Key Fob/Passcode
Larry Byrd lbyrd
-> 000203863210 09/27/2010 15:28:11 Key Fob/Passcode
System Admin administrator

LaBron James ljames
-> 000303861288 02/15/2011 15:52:21 Key Fob/Passcode

User Report Date: 02/16/2011 09:57:14
All Users Page: 2 of 27

User Name Default Login Name Default Shell Name
-> Token Serial No./ Replacement Last Login Orig. Token Type/Auth with

Derek Jeter djeter

Satchel Page spage
-> 000234203706 02/16/2011 12:28:40 Key Fob/Passcode

So we're looking at a lot of headers and other useless text and entries that are split across two lines. Juan wanted to filter out the useless text and empty records (like Derek Jeter) and create one line records for the useful information, specifically "<name> <username> <serial#> <date>".

I can do that with a couple of lines of awk:

$ awk '/-> [0-9]/ { print n, u, $2, $3 }; 
{ u = $NF; $NF = ""; n = $0 }' report.txt

Karim Abdul-Jabbar kajabbar 000403861445 02/12/2011
Larry Byrd lbyrd 000203863210 09/27/2010
LaBron James ljames 000303861288 02/15/2011
Satchel Page spage 000234203706 02/16/2011

Every single line in the report is going to get processed by the second block, which sets the "u" (username) and "n" (full name) variables. The username is set to the last (whitespace-delimited) field on the line, which is correct on the lines where the user's full name and username exist. We then null out the last field and set the full name variable to be the remainder of the line.

Now obviously the values in these variables are going to be wrong on most of the lines in the report, but that doesn't matter because we only output n and u in the first block. And that block is only triggered on lines that match "-> followed by a space and a digit" ("/-> [0-9]/") -- i.e., the second of the two lines in each user record. By that point we will have just processed the line containing the user's full name and username, so n and u are set appropriately. All we have to do is select the values we care about from the second line of each record and output everything on one line. Matching on the "->" also lets us easily eliminate the empty records that don't have serial number and date information. Note that if you want things to line up in nice, pretty columns, you could use printf instead of print here.

This "accumulate values and output on trigger" approach is very useful when you're collecting data fields that span multiple lines. You can use this idea when processing XML files, Windows *.ini files, and many other file formats.
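As a minimal sketch of the idea on a hypothetical two-line record format (a host name line followed by an indented address line), the shape is always the same: one block accumulates, another block triggers the output:

```shell
# Accumulate-and-trigger in miniature. The /^host/ block squirrels away
# a value; the /^  addr/ block fires on each record's second line and
# emits one combined output line per complete record.
printf 'host web01\n  addr 10.0.0.5\nhost db01\n  addr 10.0.0.9\n' |
  awk '/^  addr/ { print h, $2 }
       /^host/  { h = $2 }'
```

This prints one line per record ("web01 10.0.0.5" and so on), exactly the join-two-lines-into-one transformation used on Juan's report.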

Let's see what kind of PowerShell trickery Tim has up his sleeve this week...

Tim puts the mailbag in the dip:

This is a cool little challenge, but you silly Linux guys always want text parsing. I'll follow along and give you text results, but you guys need to up your shell so you can use objects. Text parsing it is...

We first need to find the relevant lines in the report, and we can do that with the PowerShell equivalent of grep, Select-String.

PS C:\> Select-String -Path report.txt -Pattern '-> \d' -Context 1,0

report.txt:7:Karim Abdul-Jabbar kajabbar
> report.txt:8:-> 000403861445 02/12/2011 00:30:01 Key Fob/Passcode
report.txt:9:Larry Byrd lbyrd
> report.txt:10:-> 000203863210 09/27/2010 15:28:11 Key Fob/Passcode
report.txt:13:LaBron James ljames
> report.txt:14:-> 000303861288 02/15/2011 15:52:21 Key Fob/Passcode
report.txt:24:Satchel Page spage
> report.txt:25:-> 000234203706 02/16/2011 12:28:40 Key Fob/Passcode

Similar to Hal's search, our pattern looks for '->' followed by a digit (\d). The Context parameter is used to grab the line before each match, so we capture the name in addition to the token information.

The Context parameter takes one or two numbers. If we give it one number, this is the number of lines captured before AND after the match. If we specify two numbers, the first specifies how many lines before the match to capture, the second specifies how many lines after the match to capture. We just want the line before the match and that is why we specified 1,0.
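Unix grep has the same idea: -B prints lines of leading context and -A prints trailing context, so -B1 behaves much like -Context 1,0. A quick sketch on a two-line fragment of Juan's report:

```shell
# -B1 shows one line of leading context with each match; -e keeps the
# pattern (which starts with a dash) from being parsed as an option.
printf 'Larry Byrd lbyrd\n-> 000203863210 09/27/2010 15:28:11 Key Fob/Passcode\n' |
  grep -B1 -e '-> [0-9]'
```

Both the name line and the matching token line come out together, just as they do in the Select-String output above.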

The default output of Select-String shows us which file and line number contained the match. The greater than sign designates which line contained the match so it is easy to see which lines match and which are context.

We have the lines we want, but now we need to figure out what to do with them. When I got to this point I wasn't sure where to go next, so I used Get-Member (alias gm) to see what properties and methods were available on the output object.

PS C:\> Select-String -Path report.txt -Pattern '-> \d' -Context 1,0 | gm

TypeName: Microsoft.PowerShell.Commands.MatchInfo

Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
RelativePath Method string RelativePath(string directory)
ToString Method string ToString(), string ToString(string directory)
Context Property Microsoft.PowerShell.Commands.MatchInfoContext Context {get;set;}
Filename Property System.String Filename {get;}
IgnoreCase Property System.Boolean IgnoreCase {get;set;}
Line Property System.String Line {get;set;}
LineNumber Property System.Int32 LineNumber {get;set;}
Matches Property System.Text.RegularExpressions.Match[] Matches {get;set;}
Path Property System.String Path {get;set;}
Pattern Property System.String Pattern {get;set;}

The Context property is what we are interested in, so let's look at that further.

PS C:\> Select-String -Path report.txt -Pattern '-> \d' -Context 1,0 | % { $_.Context } | gm

TypeName: Microsoft.PowerShell.Commands.MatchInfoContext

Name MemberType Definition
---- ---------- ----------
Clone Method System.Object Clone()
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
DisplayPostContext Property System.String[] DisplayPostContext {get;set;}
DisplayPreContext Property System.String[] DisplayPreContext {get;set;}
PostContext Property System.String[] PostContext {get;set;}
PreContext Property System.String[] PreContext {get;set;}

The PreContext property is an array of strings, one per captured line of pre-match context. In this case we captured just one line, so we can access it via index 0.

Next, let's juice up our search pattern so it matches exactly the text we want -- the twelve-digit serial number and the date -- and let the result's Matches property hand it to us directly.

PS C:\> Select-String -Path report.txt -Pattern '\d{12} \d{2}/\d{2}/\d{4}' -Context 1,0

Each result's Matches property will now contain a Match object covering the token serial and the date. Matches is an array and can hold multiple Match objects per line (when -AllMatches is used), but with our new pattern there is only one, at index 0.

We now have all the pieces, all we have to do is put them together:

PS C:\> Select-String -Path report.txt -Pattern '\d{12} \d{2}/\d{2}/\d{4}' -Context 1,0 |
% { Write-Host $_.Context.PreContext[0] $_.Matches[0] }

Karim Abdul-Jabbar kajabbar 000403861445 02/12/2011
Larry Byrd lbyrd 000203863210 09/27/2010
LaBron James ljames 000303861288 02/15/2011
Satchel Page spage 000234203706 02/16/2011

We just use Write-Host to output the data, but of course we could send the output to a file with the Out-File cmdlet.

A quick side note: If you don't understand the Regular Expressions used here, I highly recommend you start reading about them as they are powerful and very useful. Even a basic understanding of them will help a lot when it comes to automating tasks and parsing text.