Tuesday, June 29, 2010

Episode #102: Size Does Matter

Hal is not ashamed to admit it:

So there I was once again reviewing Ed's highly useful Linux Intrusion Discovery Cheat Sheet and I was reminded of this little gem:

find / -size +10000k -print

The plus sign ("+") before the 10000k means "greater than", so this means "find all files whose size is greater than 10MB (10,000 kilobytes)". Why is this included as a way of spotting malicious activity on your systems? Consider that files larger than 10MB are just not that common in a typical Unix-like OS. Often the files that you turn up with this search will either be malicious-- packet sniffer traces, etc-- or indicators of compromise-- "warez" like images, video, or pirated software.
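On a busy box the raw search can be noisy, so a slightly longer variant may be worth keeping around. What follows is only a sketch: the function name big_files and its two arguments are made up for illustration, and the -xdev and -type f options are additions that are not part of the cheat sheet entry.

```shell
# big_files DIR [KB]: print regular files under DIR larger than KB kilobytes.
# The name big_files and its arguments are hypothetical, for illustration only.
big_files() {
    dir=${1:-/}        # where to start searching (defaults to /)
    kb=${2:-10000}     # size threshold in find's "k" units (defaults to 10000k)
    # -xdev keeps find on the starting filesystem (so /proc, NFS mounts, etc.
    # are not descended into); -type f limits the results to regular files.
    # Permission errors are discarded so they don't clutter the listing.
    find "$dir" -xdev -type f -size +"${kb}k" -print 2>/dev/null
}
```

Calling it as "big_files /" behaves much like the original command, while "big_files /home 50000" would look only under /home for files over 50000k.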

We could shave a few characters off of Ed's expression though. The most terse we could be is:

find / -size +10M

You don't need "-print" with modern find programs-- it's implicit if there are no other action statements. Also, in addition to "k" for kilobytes, the GNU find command supports "M" ("megabytes") and "G" ("gigabytes") as well as "c" for bytes, "b" for 512-byte blocks (the default when you give no suffix), and even "w" for two-byte words (not that useful). Note that GNU find treats these as binary units (a "k" is 1024 bytes and an "M" is 1024k), so "+10M" is a slightly higher threshold than "+10000k". These size suffixes are not all portable across versions of Unix, but "c" is common to the find commands I've used. So you can always write it like this:

find / -size +10000000c
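The three thresholds used so far are close but not identical, because GNU find counts a "k" as 1024 bytes and an "M" as 1024k. A quick sketch of the arithmetic, done with plain shell math (the function name size_thresholds is made up for illustration):

```shell
# size_thresholds: print the byte value behind each -size argument used above,
# assuming GNU find's units (k = 1024 bytes, M = 1024 * 1024 bytes).
size_thresholds() {
    echo "+10000k    = $((10000 * 1024)) bytes"
    echo "+10M       = $((10 * 1024 * 1024)) bytes"
    echo "+10000000c = 10000000 bytes"
}
```

Running size_thresholds reports 10240000 bytes for "+10000k" and 10485760 bytes for "+10M", so the byte form is the lowest of the three cutoffs.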

Let's see what the other guys have got this week. I'm guessing that Ed probably has something up his sleeve on the Windows side of the fence.

Nor is Ed:

This is actually one of the original questions that inspired this blog. Justin Searle, a guy on our team at InGuardians, asked how to do this in Windows, and I sent him a response. I then tweeted the response, and Paul Asadoorian mapped it to Linux via a response tweet. Hal then completely trounced, er, suggested some noteworthy improvements to Paul's work, and the blog was born.

Here is my approach:

C:\> FOR /R C:\ %i in (*) do @if %~zi gtr 10000000 echo %i %~zi

In this command, I'm using a FOR /R loop to recurse through a directory structure. I'm recursing through C:\ here, although you could put any directory in its place. I'm using an iterator variable of %i, which my FOR loop will assign file names to. I'm doing this for a file set of (*), so I'm looking at any type of file. For each file, in my do clause, I turn off echo of commands (@) and then run an IF command. Since %i holds a file name, Windows FOR loops give us some interesting capabilities to refer to various properties of that file. Here, I'm using %~zi, which is the file's length in bytes. Other properties we can grab include (from the output of FOR /?):
    %~I         - expands %I removing any surrounding quotes (")
    %~fI        - expands %I to a fully qualified path name
    %~dI        - expands %I to a drive letter only
    %~pI        - expands %I to a path only
    %~nI        - expands %I to a file name only
    %~xI        - expands %I to a file extension only
    %~sI        - expanded path contains short names only
    %~aI        - expands %I to file attributes of file
    %~tI        - expands %I to date/time of file
    %~zI        - expands %I to size of file
    %~$PATH:I   - searches the directories listed in the PATH
                  environment variable and expands %I to the
                  fully qualified name of the first one found.
                  If the environment variable name is not
                  defined or the file is not found by the
                  search, then this modifier expands to the
                  empty string

The modifiers can be combined to get compound results:

    %~dpI       - expands %I to a drive letter and path only
    %~nxI       - expands %I to a file name and extension only
    %~fsI       - expands %I to a full path name with short names only
    %~dp$PATH:I - searches the directories listed in the PATH
                  environment variable for %I and expands to the
                  drive letter and path of the first one found.
    %~ftzaI     - expands %I to a DIR like output line

Wow! That's a lot of wonderful options we can use in the do clause of our FOR loops. Here, I'm just using my IF statement to see if the size (%~zi) is greater than (GTR) 10000000 (that's 10^7, since the size is given in bytes). If it is, I echo out the file's name (%i) and size (%~zi).

Now, we can't sort this output using built-in commands, because the Windows sort command only sorts alphanumerically, not numerically (so, for example, 1 comes before 10, which comes before 2, which comes before 20, which comes before 3, and so on). I usually just dump this kind of output into a .csv file (adding a comma in the above command between %i and %~zi, followed by a >> bigfiles.csv ) and open it in a spreadsheet for sorting.

Tim is:

This is pretty straightforward in PowerShell.

PS C:\> Get-ChildItem -Recurse -Force | Where-Object { $_.Length -gt 10000000 }

Get-ChildItem -Recurse is used to recurse through the directory tree, starting from the current directory. The -Force option is added to ensure hidden and system files and directories are included. The Where-Object cmdlet is used to filter for files greater than 10^7 bytes.

That is a bit long, so let's shorten it up a bit:

PS C:\> ls -r -fo | ? { $_.Length -gt 10000000 }

In our short version, we replace Get-ChildItem with its most terse alias, ls. We also use the short version of each switch. However, we can't shorten -Force to -f since that would match both -Force and -Filter. Using -fo disambiguates the parameter. We also replace Where-Object with its tiniest alias, the question mark.

That's about it, so go find yourself some big ones.