Tuesday, June 29, 2010

Episode #102: Size Does Matter

Hal is not ashamed to admit it:

So there I was once again reviewing Ed's highly useful Linux Intrusion Discovery Cheat Sheet and I was reminded of this little gem:

find / -size +10000k -print

The plus sign ("+") before the 10000k means "greater than", so this means "find all files whose size is greater than 10MB (10,000 kilobytes)". Why is this included as a way of spotting malicious activity on your systems? Consider that files larger than 10MB are just not that common in a typical Unix-like OS. Often the files that you turn up with this search will either be malicious-- packet sniffer traces, etc-- or indicators of compromise-- "warez" like images, video, or pirated software.

We could shave a few characters off of Ed's expression though. The most terse we could be is:

find / -size +10M

You don't need "-print" with modern find programs-- it's implicit if there are no other action statements. Also, in addition to "k" for kilobytes, the GNU find command supports "M" ("megabytes") and "G" ("gigabytes"), as well as "c" for bytes and even "w" for two-byte words (not that useful). Careful with "b", though-- it means 512-byte blocks, not bytes, and is in fact the default unit. These size suffixes may not be that portable across all versions of Unix, but "c" is common to the find commands I've used. So you can always write it like this:

find / -size +10000000c

Let's see what the other guys have got this week. I'm guessing that Ed probably has something up his sleeve on the Windows side of the fence.

Nor is Ed:
This is actually one of the original questions that inspired this blog. Justin Searle, a guy on our team at InGuardians, asked how to do this in Windows, and I sent him a response. I then tweeted the response, and Paul Asadoorian mapped it to Linux via a response tweet. Hal then completely trounced (er, suggested some noteworthy improvements to) Paul's work, and the blog was born.

Here is my approach:
C:\> FOR /R C:\ %i in (*) do @if %~zi gtr 10000000 echo %i %~zi
In this command, I'm using a FOR /R loop to recurse through a directory structure. I'm recursing through C:\ here, although you could put any directory in its place. I'm using an iterator variable of %i, which my FOR loop will assign file names to. I'm doing this for a file set of (*), so I'm looking at any type of file. For each file, in my do clause, I turn off echo of commands (@) and then run an IF command. Because %i holds a file name, Windows FOR loops give us some interesting capabilities to refer to various properties of that file. Here, I'm using %~zi, which is the file's length. Other properties we can grab include (from the output of FOR /?):
    %~I         - expands %I removing any surrounding quotes (")
    %~fI        - expands %I to a fully qualified path name
    %~dI        - expands %I to a drive letter only
    %~pI        - expands %I to a path only
    %~nI        - expands %I to a file name only
    %~xI        - expands %I to a file extension only
    %~sI        - expanded path contains short names only
    %~aI        - expands %I to file attributes of file
    %~tI        - expands %I to date/time of file
    %~zI        - expands %I to size of file
    %~$PATH:I   - searches the directories listed in the PATH
                  environment variable and expands %I to the
                  fully qualified name of the first one found.
                  If the environment variable name is not
                  defined or the file is not found by the
                  search, then this modifier expands to the
                  empty string

The modifiers can be combined to get compound results:

    %~dpI       - expands %I to a drive letter and path only
    %~nxI       - expands %I to a file name and extension only
    %~fsI       - expands %I to a full path name with short names only
    %~dp$PATH:I - searches the directories listed in the PATH
                  environment variable for %I and expands to the
                  drive letter and path of the first one found.
    %~ftzaI     - expands %I to a DIR like output line

Wow! That's a lot of wonderful options we can use in the do clause of our FOR loops. Here, I'm just using my IF statement to see if the size (%~zi) is greater (GTR) than 10000000 (that's 10**7-- the value has to be given in bytes). If it is, I echo out the file's name (%i) and size (%~zi).

Now, we can't sort this output using built-in commands, because the Windows sort command only sorts alphanumerically, not numerically (so, for example, 1 comes before 10, which comes before 2, which comes before 20, which comes before 3, and so on). I usually just dump this kind of output into a .csv file (adding a comma in the above command between %i and %~zi, followed by a >> bigfiles.csv) and open it in a spreadsheet for sorting.
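That modified command looks something like this (a minimal sketch; bigfiles.csv is just an illustrative name):

C:\> FOR /R C:\ %i in (*) do @if %~zi gtr 10000000 echo %i,%~zi >> bigfiles.csv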

Tim is:

This is pretty straightforward in PowerShell.

PS C:\> Get-ChildItem -Recurse -Force | Where-Object { $_.Length -gt 10000000 }
Get-ChildItem -Recurse is used to recurse through the directory tree. The -Force option is added to ensure hidden and system directories are searched. The Where-Object cmdlet is used to filter for files larger than 10^7 bytes.

That is a bit long, so let's shorten it up a bit:

PS C:\> ls -r -fo | ? { $_.Length -gt 10000000 }
In our short version, we replace Get-ChildItem with its most terse alias, ls. We also use the short version of each switch. However, we can't shorten the Force option to F since it would match both Force and Filter. Using FO disambiguates the parameter. We also replace Where-Object with its tiniest alias, the question mark.

That's about it, so go find yourself some big ones.

Tuesday, June 22, 2010

Episode #101: Third-Party Party

Ed Genially Sets Up:

Sometimes, doing things at the Windows cmd.exe shell can be really tough as we've seen several times recently here, here, and even here. And, don't even get me started about Episode #100! What a debacle.

But, on the other hand, sometimes this venerable lil' shell gives us some useful functionality, easily accessible. Yes, believe it or not, there are instances where some things are really straightforward and fun on Windows, and are inscrutably ugly on Linux. I spend a considerable amount of time thinking about such things to throw Hal into a tizzy. Consider, if you will, the topic of gathering information about third-party products installed on a box.

On Windows, we can turn to our faithful friend, wmic:

C:\> wmic product list brief
Caption                           IdentifyingNumber                       Name                              Vendor                 Version
Microsoft .NET Framework 3.5 SP1  {CE2CDD62-0124-36CA-84D3-9F4DCF5C5BD9}  Microsoft .NET Framework 3.5 SP1  Microsoft Corporation  3.5.30729
VMware Tools                      {FE2F6A2C-196E-4210-9C04-2B1BC21F07EF}  VMware Tools                      VMware, Inc.           8.2.4.7509

Your output will include the product's name, vendor, and version number, along with a unique identifying number for each product. That version number is typically quite explicit, giving you an item you can research to see if the given program is vulnerable.

On some systems, it can take a while for this command to run. But, when it finishes, you should have a nice list of most of the third-party products installed on the machine. Note that this list only includes programs installed using a standard Windows installer. If someone simply copies a program onto the box, this command won't find it.

We can get more detail, including the InstallDate and InstallLocation, by running:
C:\> wmic product list full

Description=Microsoft .NET Framework 3.5 SP1
IdentifyingNumber={CE2CDD62-0124-36CA-84D3-9F4DCF5C5BD9}
InstallDate=20091203
InstallLocation=
InstallState=5
Name=Microsoft .NET Framework 3.5 SP1
PackageCache=c:\Windows\Installer\52814a.msi
SKUNumber=
Vendor=Microsoft Corporation
Version=3.5.30729


Description=VMware Tools
IdentifyingNumber={FE2F6A2C-196E-4210-9C04-2B1BC21F07EF}
InstallDate=20100404
InstallLocation=
InstallState=5
Name=VMware Tools
PackageCache=C:\Windows\Installer\2ed60.msi
SKUNumber=
Vendor=VMware, Inc.
Version=8.2.4.7509

Want even prettier output? You can get it in HTML format in a file called C:\products.htm by running:
C:\> wmic /output:c:\products.htm product list full /format:hform.xsl
Open that sucker in a browser and behold the results:
C:\> products.htm
Or, if you prefer a csv file, you could run:
C:\> wmic /output:c:\products.csv product list full /format:csv.xsl
Now that I've set this up... let's see what my beloved fellow bloggers have up their command-line sleeves.

Update: Reader John Allison writes in:

Just wanted to let you know that Windows 7 has changed how the WMIC “/format” switch works. You can no longer append “.xsl” when specifying the CSV or HFORM format. If you do you get an error “Invalid XSL format (or) file name.” If you just use the keywords CSV or HFORM it works fine. Thanks for all the hard work you put in on this blog and keep it up.

Good stuff, John! Thank you for that.

Update: Reader Rickard Uddenberg writes in:

To use the /format switch you have to move the localisation files to the correct folder. They are in C:\Windows\System32\wbem\en-US, but since I have sv-SE I have to create that folder and copy all the files to it... (or at least the XML files). Without that copy it doesn't work, and you get the infamous "Invalid XSL format (or) file name.".

Tim rolls up his sleeves:

Sometimes I feel bad for Ed. Episode 99 was brutal and I can't even begin to describe how I would have done the cmd portion in Episode 100. Fortunately, it was pretty easy for him this week, and even better (for me) since my portion is just a rip off of Ed's fu.

As we have previously discussed, the PowerShell equivalent of wmic is Get-WmiObject (alias gwmi). Most wmic classes require a win32_ prefix in PowerShell. Now that we have that knowledge, let's rip off Ed's portion (er, do some PowerShell).

PS C:\> Get-WmiObject win32_product

IdentifyingNumber : {DBBC72B2-2442-4B8B-9D70-2ED6C0322916}
Name : VMware vSphere PowerCLI
Vendor : VMware, Inc.
Version : 4.0.1.2164
Caption : VMware vSphere PowerCLI

We can get even more detail by piping the output into Format-List *. Giving Format-List the wildcard (*) shows every property, not just the common ones, but the output is huge. This is one reason why I like to export the results into CSV.

PS C:\> gwmi win32_product | Export-Csv product.csv
There are two benefits to exporting to CSV. First, we can use something like Excel to create a pretty report. Second, we can easily import the results to quickly manipulate the objects. Here is how we do the import:

PS C:\> Import-Csv product.csv | select Name
The really cool thing is that we can treat the imported objects the same as the original command output. In my tests it was nearly 100 times faster to import the CSV (0.13 seconds) than to re-run the Get-WmiObject command (10.3 seconds). This is handy if you want to filter in different ways or hand off your results for someone else to work with.
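If you want to reproduce that timing test on your own box, Measure-Command does the stopwatch work (a minimal sketch; your numbers will vary):

PS C:\> Measure-Command { gwmi win32_product }
PS C:\> Measure-Command { Import-Csv product.csv }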

Ed exported a few different ways, and we can do the same thing. The two most common formats are HTML and XML. ConvertTo-Xml and ConvertTo-Html both write to standard output, so if you want to save the results as a file you also need to use Out-File or output redirection (>).

PS C:\> gwmi win32_product | ConvertTo-Html > products.html
PS C:\> gwmi win32_product | ConvertTo-Xml > products.xml
So Ed and I have had a pretty easy week. This one is pretty hard for Hal, but I don't have much pity for him; he makes our lives miserable.

And Hal has to take out the garbage:

Ed just wouldn't let this one alone, so I figured I'd give him his tawdry little moment of glory. Yep, managing third-party software on different Unix variants ranges from the trivial to the impossible. Let's survey the carnage, shall we?

Debian systems have a very active group of package maintainers and nearly all of the Open Source third-party packages you would want to use are included in the Debian repositories. That means that on Debian systems you can mostly rely on the built-in package management tools to tell you about third-party software. As we've covered in Episode #90 you can use "dpkg --list" to get information about all installed packages on the system.
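For example, a quick way to eyeball the inventory (the grep just keeps the lines for installed packages, which begin with "ii"):

$ dpkg --list | grep ^ii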

In general, Red Hat systems have relatively fewer packages available in the core OS repositories. But Red Hat admins should really get to know Dag Wieers' RPM repository, which as of this writing contains over 95,000 packages in RPM format. Using the repository couldn't be simpler, since the necessary configuration files are themselves incorporated into an easy-to-install RPM:

rpm -Uhv http://apt.sw.be/redhat/el5/en/x86_64/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.x86_64.rpm

The above command installs the necessary configuration files for obtaining x86_64 architecture RPMs for RHEL 5.x releases. Similar commands for other processor architectures and OS revisions can be found on Dag's installation and configuration page. Once you've got the appropriate configuration files installed, then you can "yum install" extra third-party packages to your heart's content.
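For instance, once the rpmforge configuration from above is in place, installing one of the packages you'll see in my listing below is as simple as:

# yum install mpack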

But the question posed by this week's Episode is how to inventory those packages once you've installed them. I haven't found a way for yum to directly tell me what packages have been installed from a specific repo, but I have come up with the following hack:

# yum --disablerepo=rpmforge list extras
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
 * addons: mirror.5ninesolutions.com
 * base: mirrors.usc.edu
 * extras: mirrors.cat.pdx.edu
 * updates: mirrors.usc.edu
Extra Packages
VMware-server.x86_64        2.0.2-203138          installed
libpfm.x86_64               3.2-0.060926.4.el5    installed
mpack.x86_64                1.6-2.el5.rf          installed
pfmon.x86_64                3.2-0.060926.5.el5    installed
rpmforge-release.x86_64     0.3.6-1.el5.rf        installed

Here I'm explicitly telling yum to ignore Dag's rpmforge repo and then asking it to "list extras"-- meaning list all packages that cannot be associated with any of the remaining repos that we are using. This does get me a listing of several packages I've installed from Dag's site, but it also lists other packages like VMware-server that I've downloaded and installed directly from the vendor site.

BSD systems generally use some variant of the ports system for handling Open Source third-party software. "pkg_info -a" will give you tons of information about installed packages-- exactly how much info depends on which BSD variant you're using. FreeBSD gives you lots of information because a typical base install includes dozens if not hundreds of packages out of the ports tree. At the opposite end of the spectrum, OpenBSD handles the core install separately from the ports mechanism, so "pkg_info -a" only lists packages that you've explicitly loaded from the ports.
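If all you want is a quick name-and-version inventory without the descriptions, something like this works (a sketch; the exact output format varies a bit between BSD variants):

$ pkg_info -a | awk '{print $1}'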

So far I've only really been talking about Open Source third-party packages. The problem with commercial software is that there's no standard for how such software ends up being installed. Some vendors might give you their software in your system's native package management format-- e.g., the VMware example we saw above. But other vendors might use a proprietary installation framework or even require you to build their package from source. In these cases, there really isn't a good way to tell whether a given software package is installed other than knowing the standard installation directory and manually auditing your systems.

Similarly, if you download software from the 'net in source code form and install it on your systems, then you won't be able to track that software using your built-in package management system. If you have a software package that's only available in source code form and which you need to use widely in your environment, you might consider making your own installation package in whatever format your system(s) use natively. While it's extra work to create such a package, it does make distributing the software around your network a lot simpler and pays long-term dividends when it comes to tracking and upgrading the software.

Seth (Our Go-to-Mac Guy) adds:
So the nice thing about having a Mac is that you can get all of the power of 'nix without all the chaos of something that isn't centrally managed; it's similar to RedHat in that regard. With its BSD underbelly, the Mac has similar commands to gather data about installed software as well as having some Apple-provided tools.

For the rest of Seth's insights on Mac 3rd Party command line, check out his posting here.

Tuesday, June 15, 2010

Episode #99: The .needle in the /haystack

Tim is on the road:

This week I'm at the SANS Penetration Testing & Vulnerability Assessment Summit hanging out with Ed. And no, I don't get any money for saying that. Although, Ed did give me some money to stay away from him. Come to think of it, Hal did the same thing before. It must be that they just can't stand being next to the most handsome one of the trio, and it has nothing to do with my love of onions, garlic, and German Brick Cheese*.

Back in the regular world, I had a bunch of files to review and search, but I didn't have any idea what types of files were in the mix. I whipped up some quick PowerShell to give me an overview of the file types in the directory tree. Once I knew what types of files I was dealing with, I was better able to pick the tool to review the documents. Here is the command:

PS C:\> ls mydir -Recurse | ? { -not $_.PSIsContainer } | group Extension -NoElement | sort count -desc

Count Name
----- ----
  145 .pdf
   19 .rtf
   16 .doc
    7 .xml
    7 .docx
    4 .xls
    1
    1 .xlsx
We start off by getting a recursive directory listing. The Where-Object cmdlet (alias ?) is used to remove directories from the listing. The PSIsContainer property is a good way to differentiate files from folders: directories are containers and files aren't. Next, we use Group-Object (alias group) to group based on file extension. The NoElement switch tells the Group-Object cmdlet not to include the collection of all the file objects in the output. Finally, we sort, in descending order, based on the count in each group. By the way, any parameter or switch name can be shortened as long as it is not ambiguous. We could use Des, but not D or De, since those would match both Descending and Debug.

I have to say, I have a bit of envy for the Linux "file" command. But since Windows relies so heavily on file extensions, going by extension typically works well unless someone is trying to hide something.

Let's see what Ed and Hal have cooking.

*Warning: Never try German Brick Cheese, it tastes like sewage smells. It makes Limburger smell like flowers. Seriously, don't try it. I bought some in college as a joke and we could smell it through the Ziploc bag in the fridge. Bleh! Oh, and sorry to Eric and Kevin for tricking you into trying it.

Ed's On the Road Too
So, like, when Tim initially proposed this article, he was all like, “Yeah, just count the number of files of a given file suffix on a partition. This will be hard for Ed.” And, I was like, “Uh… Dude… as if. I mean, just totally do this:

C:\> dir /b /s C:\*.ini | find /c /v ""

And you’ll have the total number of ini files. Lather, rinse, and repeat for any other suffix, ya know.”

Tim responded, “Oh yeah. Never mind. I’ll write my part first.”

AND THEN… my esteemed colleague unfurls something that automatically figures out which extensions are on the partition and creates a beautiful summary of their counts. I’m not saying that he set me up. But, well, come to think of it, I think he set me up. It’s a good thing that I’m not very busy this week, or else that would have been a problem.

Well, Mr. Medin, if that is your real name, my German-brick-cheese-eating friend, put this in your cmd.exe pipe and smoke it:
C:\> cmd.exe /v:on /c "set directory=c:\windows\system32& (for /f "delims=" %i in
('dir /a-D /L /s /b !directory!') do @set name=%i& echo !name:~-4,4! >>
c:\suffix.txt) & sort c:\suffix.txt > c:\sortsuf.txt & (set previous= & for /f
%j in (c:\sortsuf.txt) do @set current=%j& if NOT !current!==!previous! (echo
%j >> c:\uniqsuf.txt) & set previous=!current!) & for /f %k in (c:\uniqsuf.txt)
do @echo %k: & dir /b /s !directory!\*%k | find /c /v "" & del c:\suffix.txt
c:\sortsuf.txt c:\uniqsuf.txt
.acm:
10
.acs:
1
.bak:
2
.bat:
1
.bin:
4
.bmp:
1
.btr:
1
.bud:
2
---SNIP---
To make this whole shebang go, all ya have to do is put the appropriate directory in the "set directory=" part. Note that there is no space after the directory name and the &, which is important. When I first showed that command to Tim, he responded, "That, is art. Not so much Rembrandt as Salvador Dali." You know, I've been considering growing one of those Dali-style mustaches.

As for the command itself, I think this is all pretty self-explanatory. Right? I mean, it just kinda rolls off the fingers and is the obvious way to do this. Easy as pie.

Well, if you insist, I'll give you a synopsis of what's going on here followed by the details. My command can be broken into three phases, along with a preamble up front and a clean-up action at the end. First, I isolate the last four characters of each file name (which should be the suffix, letting me catch stuff like ".xls" and "xlsx"), storing the result in a file called c:\suffix.txt. In the second phase, I jump into my uniquifier (virtually identical to the uniq command I implemented in Episode #91), which sorts the c:\suffix.txt file and plucks out the unique entries. And, thirdly, I then go through each of my unique suffixes and count the number of files that have each given suffix. There is a bit of a down side. If a file doesn't have a three-or-four-character suffix, my command will give a "File Not Found" message, but that's not so bad.

Those are the highlights of what I’ve wrought. Let’s jump into the details. In my preamble, I start by invoking delayed variable expansion (cmd.exe /v:on /c), because I’m gonna have a metric spit-ton of variables whose value will have to float as my command runs. Next, I set a variable called directory to the directory we’re going to count in. I didn’t have to do it this way, but without it, our dear user would have to type in the directory name a couple of different places. We care about human factors and ease of use here at CommandLineKungFuBlog. The directory name is immediately followed by an &, without a space. That way, it won’t pick up an extra space at the end in the variable value itself.

With that preliminary planning done, I move into Phase 1. I have a FOR /F loop, with default parsing on spaces and tabs turned off ("delims=") and an iterator variable of %i. I'm iterating over the output of a dir command, with options set to not show directories (/a-D), with all file names in lower case (/L). I've gotta use the lowercase option here, or else we'd have separate counts for .doc, .DOC, .Doc, and so on. I want to recurse subdirectories (/s) and have the bare form of output (/b) so that I just get full file paths. And, of course, I want to do all of this for !directory!, using the !'s instead of %'s for the variable because I want the delayed expanded value of that sucker. In the body of my FOR loop, for each file, I take its name out of the iterator variable (%i) and stick it into the variable "name" so I can do substring operations on it (you can't do substring operations on iterator variables, so you have to slide them into a regular variable). I then drop the last four characters of the name (!name:~-4,4!, which is a substring specifying an offset into the string of -4, for a substring 4 characters long) into a temporary file called c:\suffix.txt. I've now snagged all of my file suffixes.

In Phase 2, I make a list of unique suffixes, again using the technique I described in detail in Episode 91. I start by sorting my suffix list into a separate temporary file (sort c:\suffix.txt > c:\sortsuf.txt). I then create a variable called previous, which I set to a space just to get started (set previous= ). I then have a FOR /F loop, which iterates over my sorted suffixes using an iterator variable of %j: "for /f %j in (c:\sortsuf.txt)". In the body of my do loop, I store the current suffix (%j) in a variable called current so I can do compares against it. You can't do compares of iterator variable values, so I've gotta tuck %j into the "current" variable. Using an IF statement, I then check to see if my current value is NOT equal to my previous value (if NOT !current!==!previous!). If it isn't, it means this suffix is unique, so I drop it into a third temporary file, called c:\uniqsuf.txt. I then set my new previous to my current value (set previous=!current!), and iterate. Phase 2 is now done, and I have a list of unique file suffixes.

Finally, in Phase 3, I simply invoke my third FOR /F loop, with an iterator variable of %k, iterating over the contents of my uniqsuf.txt file. For each unique suffix, the do clause of this loop first echoes the suffix name followed by a colon (echo %k: ). Then, I run something very similar to my original plan for this episode. It's a dir /b /s command to get a bare form of output (1 line per file), recursing subdirectories, looking for files with the name of *%k. I pipe that output into a little line counter I've used in tons of episodes (find /c /v ""), which counts (/c) the number of lines that do not have (/v) nothing (""). The number of lines that do not have nothing is the number of lines. The output of the find command is displayed on the screen.

After this finishes, I’ve got some clean-up to do. I use the del command to remove the three temporary files I’ve created (c:\suffix.txt, c:\sortsuf.txt, and c:\uniqsuf.txt). And, voila! I’m done.

See, I told you it was straightforward!

For once Hal isn't on the road

While Tim and Ed are whooping it up in Baltimore, I'm relaxing here in the Fortress of Solitude. They're killing brain cells partying it up with all the hot InfoSec pros, while I'm curled up with my Unix command-line to keep me company. No sir, I sure don't envy them one bit.

Since Tim mentions the file command, I suppose I better discuss why I didn't use it for this week's challenge. The problem with file for this case is that the program is almost too smart:

$ file index.html 01_before_pass.JPG Changelog.xls For508.3_4.*
index.html: HTML document text
01_before_pass.JPG: JPEG image data, JFIF standard 1.01
Changelog.xls: CDF V2 Document, Little Endian, Os: Windows, Version 6.0, Code page: 1252,
Author: Kimie Reuarin, Last Saved By: Robin, Name of Creating Application: Microsoft Excel,
Last Printed: Wed Aug 13 21:22:28 2003, Create Time/Date: Mon Aug 11 00:16:07 2003,
Last Saved Time/Date: Wed Jan 6 00:27:30 2010, Security: 0
For508.3_4.pptx: Zip archive data, at least v2.0 to extract
For508.3_4.pdf: PDF document, version 1.6

The output of file gives me a tremendous amount of information about each type of file. However, the output is so irregular that it would be difficult to sort all of the similar file types together.

So I'm going with file extensions, just like the Windows guys. First, you can easily look for a specific extension just by using find:

$ find ~/Documents -type f -name \*.jpg
/home/hal/Documents/My Pictures/IMAGE_00004.jpg
/home/hal/Documents/My Pictures/kathy-web-cropped.jpg
/home/hal/Documents/My Pictures/hal-headshot.jpg
[...]

Here I'm finding all regular files ("-type f") whose name matches "*.jpg". Since the "*" is a special character, it needs to be backwhacked to protect it from being interpolated by the shell (I could have used quotes here instead if I had wanted).

Of course, some of my JPEG files might be named ".jpeg" or even ".JPG" or ".JPEG", so perhaps some egrep is in order:

$ find ~/Documents -type f | egrep -i '\.jpe?g$'
[...]

But the real challenge here is to enumerate all of the file extensions under a given directory. I'm able to extract the extensions using a little sed fu:

$ find ~/Documents -type f | sed 's/.*\.\([^\/]*\)$/\1/'
pdf
pdf
pdf
doc
gif
db
[...]
/home/hal/Documents/Manuals/aaa14612
[...]

The first part of the sed regex, ".*\.", matches everything up to the last dot in the pathname because the "*" operator is "greedy" and will consume as many characters as possible while still allowing the regex to match. Then the remainder, "\([^\/]*\)$", matches all non-slash characters up to the end of the line. I specifically wrote the expression this way so I wouldn't match things like "/fee/fie/fo.fum/filename". We use the sed substitution operator here ("s/.../\1/") to replace the file name we get as input with the extension that we matched in the "\(...\)" grouping operator.

The only problem is that some of my files don't have extensions or any dot at all in the file name. In this case, the regex doesn't match and the substitution doesn't happen. So you just get the full, unaltered file path as output as you see above. So what I'm going to do is add another sed expression that simply changes any file names containing "/" to just be "other":

$ find ~/Documents -type f | sed 's/.*\.\([^\/]*\)$/\1/; s/.*\/.*/other/'
pdf
pdf
pdf
doc
gif
db
[...]
other
[...]

At this point, getting the summary by file extension is just a matter of a little sort and uniq action:

$ find ~/Documents -type f | sed 's/.*\.\([^\/]*\)$/\1/; s/.*\/.*/other/' | \
sort | uniq -c | sort -nr

1156 jpg
877 ppt
629 doc
315 html
213 other
[...]
56 JPG
[...]
6 html~
[...]
1 html?openidserver=1
[...]

Here I'm using the first sort to group all the extensions together, then counting them with "uniq -c", and finally doing a reverse numeric sort of the counts ("sort -nr") to get a nice listing.

As you can see, however, there are a few problems in the output. First, I'm counting "jpg" and "JPG" files separately, when they should probably be counted as the same. Also, there are some file extensions with funny trailing characters that should probably be filtered off. The fix for the first problem is to just use tr to fold everything to lowercase before processing. Fixing the second problem can be done by adjusting our first sed expression a bit:

$ find ~/Documents -type f | tr A-Z a-z | \
sed 's/.*\.\([a-z0-9]*\)[^\/]*$/\1/; s/.*\/.*/other/' | \
sort | uniq -c | sort -nr

1212 jpg
878 ppt
631 doc
322 html
213 other
[...]

Now inside of the "\(...\)" grouping in my sed expression I'm explicitly only matching alphanumeric characters (I only have to match lower-case letters here because tr has already shifted all the upper-case characters to lower-case). Everything else after the alphanumeric characters just gets thrown away. Note that when I'm matching "everything else", I'm still being careful to only match non-slash characters.

I realize the sed expressions end up looking pretty gnarly here. But it's really not that difficult if you build them up in pieces. Other than that, the solution is nice and straightforward, and uses idioms that we've seen in plenty of other Episodes.

For those of you who don't like all the sed-ness here, loyal reader Jeff Haemer suggests the following alternate solution:

$ find ~/Documents -type f | while read f; do echo ${f##*/*.}; done | grep -v / | 
sort | uniq -c | sort -nr

1156 jpg
877 ppt
629 doc
321 html
147 pdf
[...]

The trick here is the "${f##*/*.}" construct, which strips the matching shell glob out of the value of the variable "$f". The "##" in the middle of the expression means "match as much as possible", so that basically emulates the greedy "maximal matching" behavior that we were relying on in our sed example.
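You can watch the expansion work on a single made-up path (purely illustrative):

$ f=/home/hal/Documents/pic.of.cat.jpg
$ echo ${f##*/*.}
jpg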

You'll notice that Jeff's example doesn't do the fancy mapping to "other" for files that don't have an extension. Here he's just using "grep -v" to filter out any pathnames that end up still having a slash in them. We could use a little sed to fix that up:

$ find ~/Documents -type f | while read f; do echo ${f##*/*.}; done | 
sed 's/.*\/.*/other/' | sort | uniq -c | sort -nr

1156 jpg
877 ppt
629 doc
321 html
214 other
[...]

Jeff's code also doesn't deal with the "funny trailing characters" issue, but that's not a huge deal here. Nice work, Jeff!

Tuesday, June 8, 2010

Episode #98: Format This!

Hal is busy

Lately I've found myself having to make lots of file systems. This is mostly due to forensic work, where I'm either sanitizing hard drives and rebuilding file systems on them or I'm creating test file systems for research. Either way, I'm spending lots of time fiddling with file systems at the command line.

Way back in Episode 32 we talked about how to use dd to overwrite a disk device with zeroes:

# dd if=/dev/zero of=/dev/sdd bs=1M
dd: writing `/dev/sdd': No space left on device
992+0 records in
991+0 records out
1039663104 bytes (1.0 GB) copied, 299.834 s, 3.5 MB/s

Of course, this leaves you with an invalid partition table. Happily, the GNU parted utility makes short work of creating a new MS-DOS style disk label and adding a partition:

# parted /dev/sdd print
Error: /dev/sdd: unrecognised disk label
# parted /dev/sdd mklabel msdos
Information: You may need to update /etc/fstab.

# parted /dev/sdd mkpart primary 1 1G
Information: You may need to update /etc/fstab.

# parted /dev/sdd print
Model: LEXAR JUMPDRIVE SPORT (scsi)
Disk /dev/sdd: 1040MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system  Flags
 1      32.3kB  1036MB  1036MB  primary

At this point we need to create a file system in our new partition. You actually can use parted to create file systems, but even the parted manual page suggests that you use an external program instead. In Linux, this would be mkfs, which allows you to choose between several different kinds of file systems.

Since this is a small USB key, you might want to just create a FAT file system on it to make it easy to share files between your Linux box and other, less flexible operating systems:

# mkfs -t vfat -F 32 /dev/sdd1

We're using the FAT-specific "-F" option to specify the FAT cluster address size-- here we're creating a FAT32 file system. For each file system type, mkfs has a number of special options specific to that file system. You'll need to read the appropriate manual page to see them all: "man mkfs.vfat" in this case.

If I didn't want my co-authors to be able to easily see the files on this USB stick, I could create an EXT file system instead:

# mkfs -t ext2 /dev/sdd1
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
63360 inodes, 253015 blocks
12650 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=260046848
8 block groups
32768 blocks per group, 32768 fragments per group
7920 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

Here I'm creating an "ext2" file system because I didn't want to waste space on a file system journal, but you of course have the option of creating "ext3" and even "ext4" file systems if you want.

If you want to make NTFS file systems, you may have to download an additional package. For example, on my Ubuntu laptop I had to "sudo apt-get install ntfsprogs". Once that's done, making NTFS volumes is a snap:

# mkfs -t ntfs -Q /dev/sdd1
Cluster size has been automatically set to 4096 bytes.
Creating NTFS volume structures.
mkntfs completed successfully. Have a nice day.

When creating NTFS volumes, you definitely want to use the "-Q" (quick) option. If you leave off the "-Q" then the mkfs.ntfs program overwrites the device with zeroes and performs a bad block check before creating your file system. This takes a really long time, particularly on large drives, and is also unnecessary in this case since we previously overwrote the drive with zeroes using dd.

It's interesting to note that you don't actually have to have a physical disk device to test file systems. mkfs will (grudgingly) create file systems on non-device files:

# dd if=/dev/zero of=testfs bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 69.6688 s, 61.6 MB/s
# mkfs -t ntfs -Q -F testfs
testfs is not a block device.
mkntfs forced anyway.
[...]
# mount -o loop,show_sys_files testfs /mnt/test
# ls /mnt/test
$AttrDef $Bitmap $Extend $MFTMirr $UpCase
$BadClus $Boot $LogFile $Secure $Volume

Here I'm first using dd to make a file called "testfs" that contains 4GB of zeroes. Then I call mkfs on the file, using the "-F" (force) option so that it won't exit with an error when I tell it to operate on a non-device file. Though the command whines a lot, it does finally produce a working NTFS file system that can be mounted using a loopback mount.

Of course I can create EXT and FAT file systems in a similar fashion. However, the "-F" option for mkfs.vfat is used to specify the cluster address size. It turns out that you don't need a "force" option when making FAT file systems in non-device files-- mkfs.vfat will create file systems without complaint regardless of the type of file it is pointed at. For EXT file systems, you can use "-F" if you want. However, if you leave the option off, you'll get an "are you sure?" prompt when running the command against a non-device file (as opposed to mkfs.ntfs, which simply bombs out with an error). They say that "the wonderful thing about standards is that there are so many to choose from", but I really wish Linux could rationalize the various mkfs command-line interfaces a bit more.
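To make that concrete, here's what the other two look like against the same 4GB testfs file from above (a sketch; note that ext2 wants "-F" to skip the prompt, while vfat asks no questions):

# mkfs -t ext2 -F testfs
# mkfs -t vfat testfs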

In any event, being able to create file systems in raw disk files is a real boon when you want to test file system behavior without actually having to commandeer a physical disk drive from someplace. But I think I'd better stop there-- I'm already feeling the hatred and jealousy emanating from my Windows brethren. Let's see what Tim can cook up this week.

Tim was relaxing this weekend for his birthday

This week's episode is pretty easy, but only because there aren't a lot of options. Besides, why would you want to create raw disk files or test file system behavior without searching for a physical disk, connectors, power, ...

No, I'm not jealous. I have everything I need. I don't need all those options. Windows is good enough, smart enough, and doggone it, people like it!

The "streamlined" command in Windows is the good ol' Format command.

C:\> format d:

WARNING, ALL DATA ON NON-REMOVABLE DISK
DRIVE D: WILL BE LOST!
Proceed with Format (Y/N)? y
In Vista and later, the format command writes zeros to the entire disk when a full format is performed. In XP and earlier, the format command does not zero the disk. To zero the disk with XP you have to use the diskpart utility.

C:\> diskpart

Microsoft DiskPart version 6.1.7600
Copyright (C) 1999-2008 Microsoft Corporation.
On computer: MYMACHINE

DISKPART> list disk

Disk ###  Status   Size    Free  Dyn  Gpt
--------  -------  ------  ----  ---  ---
Disk 0    Online   149 GB   0 B
Disk 1    Online   149 GB   0 B

DISKPART> select disk 1

Disk 1 is now the selected disk.

DISKPART> clean all
The clean all command within diskpart zeros the entire disk. One benefit of using clean all is that it actually zeros the disk and doesn't create the MFT. We usually want an MFT though, so Format will suffice.

Format can be used to specify the file system too. We don't have all the options (er, hassles) of lots of choices such as EXT. If a file system isn't specified, the Format command uses the volume type to determine the default format for the disk. To explicitly specify the file system, use the FS option.

C:\> format e: /FS:NTFS
C:\> format f: /FS:FAT32
Besides the size restriction, one of the biggest problems with the FAT file system is that it provides no security features. If a user has access to the disk then they have full access to the disk, i.e. there is no way to give a user read access while denying write access to a directory. NTFS, by contrast, supports ACLs and allows much finer control.
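For instance, once a volume is NTFS, you can grant a user read access while explicitly denying writes with icacls (alice and F:\docs are hypothetical here; icacls ships with Vista and later):

C:\> icacls F:\docs /grant alice:R
C:\> icacls F:\docs /deny alice:W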

So how do we convert a FAT drive to NTFS? But of course, by using the convert command:

C:\> convert f: /FS:NTFS
The FS switch is required even though the only option is NTFS.

That's about it. Not a lot here this week, and no PowerShell either. There aren't any new cmdlets in PowerShell that provide any additional functionality.

Tuesday, June 1, 2010

Episode #97: Make me a Sandwich

Tim

One of the best ways to protect your computer is to run with lower permissions (not root or admin). Lots of security problems can be mitigated with this principle. While running as a regular human, we sometimes need to call upon great power to do some tasks. But remember, with great power, comes great responsibility.

So how do we call upon the super human powers in PowerShell? The command looks like this:

PS C:\> Start-Process "$psHome\powershell.exe" -Verb Runas
-ArgumentList '-command "command to execute"'


For example, I want to start Terminal Services:

PS C:\> Start-Process "$psHome\powershell.exe" -Verb Runas
-ArgumentList '-command "Start-Service TermService"'


Or Stop the service:

PS C:\> Start-Process "$psHome\powershell.exe" -Verb Runas
-ArgumentList '-command "Stop-Service TermService -force"'


The "-Verb Runas" means that the command should be run as the administrator. The Argument List parameter takes the commands to be passed to the elevated instance of PowerShell.

There are a few problems with this. First, depending on the settings in Vista or Windows 7, you will get the UAC popup. Second, this actually creates a new instance of PowerShell with new environment variables and a different working directory. Third, errors are pretty much impossible to read since the session is destroyed upon completion of the command. Fourth, the command is pretty long and goofy. Fifth, it isn't so easy to say:

Start-Process "$psHome\powershell.exe" -Verb Runas
-ArgumentList '-command "Make me a Sandwich"'


Here is a simplified function that can be used to call elevated commands. I'd suggest adding it to your profile so it is ready when you need it.

function sudo
{
    param( [string]$arguments = $args )
    $psi = new-object System.Diagnostics.ProcessStartInfo "$psHome\powershell.exe"
    $psi.Arguments = $arguments
    $psi.Verb = "runas"
    [System.Diagnostics.Process]::Start($psi)
}


Commands can be called like this:

PS C:\> sudo Stop-Service TermService -force


It does have the same limitations as above, except it is easier to type.

Oh well, in true Windows form, we can run the entire shell as admin via the GUI (meh) or by using Ed's method.

Ed
So, Tim started his section by just saying "Tim". I guess we're into minimalism this week, so I'll just start with "Ed".

You know, this whole topic of elevated command line access comes up in my SANS classes a lot. It doesn't plague students who show up with Windows XP. But, those people who arrive in class with Windows Vista or Windows 7 are sometimes surprised by it. They logon to their laptop GUI as a user in the administrator's group, and then invoke a cmd.exe. Then, they try to run certain commands that alter the operating system configuration, and they get an "Access is denied" message. I'm not talking about UAC here, that delightful little dialog box Windows displays any time you want to do something interesting. I'm talking about not having the privileges to do what you want. Consider this nice little message from my Win 7 box, using the service controller command to try to stop the Windows Search service associated with indexing:
C:\> sc stop wsearch
[SC] OpenService FAILED 5:

Access is denied.
After this occurs in my class, a hand usually goes up, and I get asked a question about why it doesn't work. "You don't have elevated privileges," I respond. "But, I logged in as admin," they often shoot back. "Ahhh, but Microsoft is trying to protect you from yourself. They seem to think that the big, bad, scary command line is just too powerful for someone to use with admin privs unless they explicitly ask for such privs. So, when you logon with an admin account, and launch a cmd.exe, you don't have full admin privs in the resulting command prompt. You need to launch an elevated command prompt."

I then show them how to launch one at the GUI. Simply go to your Windows icon on your tool tray (still called the "Start" menu, but since it doesn't say "Start" anymore, I don't personally call it that). Click on it and do a search for cmd.exe. When you see the icon for cmd.exe pop up, right click on it and select "Run as administrator". Alternatively, you can point your mouse to hover over the cmd.exe and hit CTRL-SHIFT-ENTER to launch it with elevated privileges. I prefer the right-click action myself, because it just feels kinda weird to hover my mouse over something and then hit CTRL-SHIFT-ENTER. When your cmd.exe launches, its title bar will say "Administrator: cmd.exe", giving you a reminder that you have an elevated prompt.

If you find yourself frequently needing an elevated command prompt, you can create a shortcut to cmd.exe and place it on your desktop. Right click on your shortcut, go to Properties, click the "Shortcut" tab, and click "Advanced". Check the "Run as administrator" box. You may want to name your shortcut ElevatedCmd.exe or something to remind you about its use.

Well, this is all well and good, but how do you launch an elevated command shell at, you know, the command line? Well, for that, we rely on the good old runas command. At a non-elevated prompt, you could simply run:

C:\> runas /u:administrator <command>

When prompted, type in the administrator's password, and you are good to go.

That command can be whatever you'd like, such as the "sc stop wsearch" or "sc start wsearch". Or, you can even launch another, elevated cmd.exe with:

C:\> runas /u:administrator cmd.exe

You know, all this musing about runas and (as Hal is certain to point out) sudo reminds me of a fun conversation I had with fellow InGuardians dude Tom Liston a few years ago. I told him that I was creating a new Windows command called "don't runas". It would take whatever command you specify, and not run it. But, in not running it, it wouldn't just do nothing... it would literally do nothing. It would actually run a bunch of nops, with the privileges of the user you specify. Tom then said we could do a Linux equivalent, called "sudont". Then, in a fit of creativity, we thought about offering a cloud-based Application-as-a-Service version of this, which would allow a user to kick off a dontrunas or sudont on their machine, and it would be submitted via an empty SOAP request to a bunch of servers on the Internet that would do nothing very quickly and in parallel, sending a response back with nothing in it. We decided that we could actually charge for such a service, making big money from users for doing nothing for them. But, then, we realized that we might get sued by various Certificate Authorities for infringing on their business models, so we never really implemented our idea. Which, come to think about it, actually makes sense. We were going to make a dontrunas and sudont command, but we just never got around to doing anything with it.

Hal (just keeping with the theme here, people)

Hey Ed, doesn't the US Congress have the patent on the "doing nothing in parallel" idea? You'd better watch out there. Oh wait, you said "doing nothing quickly and in parallel". I guess you're OK after all.

I'd like to thank my co-authors for serving up another easy one for me this week, as I'm currently in transit between one conference and the next. Of course the command to run a single command with superuser privileges is the venerable sudo command:

$ sudo grep ^hal: /etc/shadow
[sudo] password for hal: <not echoed>
hal:LIKEIMREALLYGOINGTOSHOWYOUMYPASSWORD.YOUMUSTBECRAZY.:14579:0:99999:7:::

sudo prompts the user for their own password. Assuming the system administrator has granted the user sudo access to the command the user is trying to execute, the command will run with elevated privileges.

Of course, those "elevated privileges" need not be root. With the "-u" option, you can specify another user to run your command as:

$ sudo -u mysql ls /var/lib/mysql/mysql
columns_priv.frm help_relation.MYI time_zone_leap_second.frm
[...]
help_keyword.MYI tables_priv.MYD user.frm
help_relation.frm tables_priv.MYI user.MYD
help_relation.MYD time_zone.frm user.MYI

Why wasn't I prompted for my password this time? sudo "remembers" that you typed your password recently and doesn't prompt you again as long as you keep using sudo within a relatively small interval. The default is 5 minutes, but you can customize this in the /etc/sudoers configuration file.
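For example, to stretch that grace period to 15 minutes, one line in /etc/sudoers (edited via visudo) does the trick (15 is just an arbitrary value for illustration):

Defaults timestamp_timeout=15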

Anyway, I normally try to force my DBAs to use sudo instead of su-ing directly to the user mysql, oracle, etc. Of course they get tired of having to type "-u mysql" on every command. Just FYI, you can put the following in your /etc/sudoers file so that all members of the Unix group "dba" will sudo to the "mysql" user by default:

Defaults:%dba       runas_default = mysql

Of course, those users will now have to explicitly "sudo -u root ..." to do anything as root.

By the way, as Tim mentioned in his section, environment variables getting reset can be a problem when you're using a tool like sudo or runas to do things as an alternate user. Your DBAs are going to have particular problems with this, since most of their scripts are going to assume that they're logged in directly as the "oracle" user or whatever and have all of the environment variable settings that go along with logging into that account. You may want to look into the env_keep option in your /etc/sudoers file to selectively preserve certain environment variable settings your DBAs are expecting to have.
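A sketch of what that might look like for the Oracle folks (the variable names here are just typical examples):

Defaults:%dba env_keep += "ORACLE_HOME ORACLE_SID"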

Of course, your DBAs are immediately going to try to "sudo -u oracle /bin/bash" or "sudo -u oracle su" in order to get an interactive shell. At this point they've "escaped from sudo" and you're no longer getting an audit trail of what they're doing. You can try to prevent them from doing this by writing your /etc/sudoers config in such a way as to not allow them to execute these commands, but remember that many Unix commands allow "shell escapes" to an interactive shell:

$ sudo vi
(inside of vi) :shell
# id
uid=0(root) gid=0(root) groups=0(root)...

There is an /etc/sudoers option called "noexec" that you can turn on to disable shell escapes from programs (which it does by some really clever library substitution via LD_PRELOAD). "noexec" is useful although it can break programs like "crontab -e" that rely on being able to exec() your editor to let you edit your crontab.
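The same behavior can also be applied per-command with the NOEXEC tag in the sudoers rule itself, for example (the user name and command path are illustrative):

hal ALL = (root) NOEXEC: /usr/bin/vi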

There's also the "sudoedit" option for allowing people to securely edit privileged files. sudoedit uses superuser privileges to make a copy of the file that is writable by the user, edits the file as the user, and then uses superuser privileges to put the edited file back into place.
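Usage couldn't be simpler, assuming the sudoers policy grants it:

$ sudoedit /etc/shadow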

One last item bears mentioning as long as we're talking about sudo. Output redirection can be a problem with sudo:

$ cd /etc
$ sudo awk -F: '($2 == "") { print }' /etc/shadow >empty_passwds
bash: empty_passwds: Permission denied

The problem here is that while the awk command happens with superuser privileges, the output redirection to the file empty_passwds is performed by your own shell, which is not running via sudo. Since your normal user account doesn't have write permissions under /etc, you get the "Permission denied" message.

The work-around is to use "sudo tee" in a pipeline:

$ sudo awk -F: '($2 == "") { print }' /etc/shadow | sudo tee empty_passwds >/dev/null

The tee command writes its input to a file and also to the standard output. In this case, we just care about creating the empty_passwds file, so I redirect the standard output to /dev/null to discard it.

Whew! For an "easy" Episode, I sure ended up packing a lot in here. I hope this helps you with your sudo-ing in the future.