Tuesday, January 26, 2010

Episode #79: A Sort of List

Hal starts off:

Way back in Episode #11 I showed you a little trick for sorting directory listings by inode number. But it struck me recently that we hadn't talked about all of the other interesting ways you can sort directory listings.

For example, you can use "ls -S" to sort by file size:

$ ls -lS
total 6752
-rw-r----- 1 syslog adm 1271672 2010-01-18 05:36 kern.log.1
-rw-r----- 1 syslog adm 1016716 2010-01-18 05:39 messages.1
-rw-r----- 1 syslog adm 499580 2010-01-18 05:38 daemon.log.1
[...]

Add "-h" if you prefer to see those file sizes with human-readable units:

$ ls -lSh
total 6.6M
-rw-r----- 1 syslog adm 1.3M 2010-01-18 05:36 kern.log.1
-rw-r----- 1 syslog adm 993K 2010-01-18 05:39 messages.1
-rw-r----- 1 syslog adm 488K 2010-01-18 05:38 daemon.log.1
[...]

Also, adding "-r" (reverse sort) can be useful so that the largest files end up at the bottom of the directory listing, closer to your next command prompt:

$ ls -lShr
total 6.6M
[...]
-rw-r----- 1 syslog adm 488K 2010-01-18 05:38 daemon.log.1
-rw-r----- 1 syslog adm 993K 2010-01-18 05:39 messages.1
-rw-r----- 1 syslog adm 1.3M 2010-01-18 05:36 kern.log.1
$

You have to do much less scrolling around this way.
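
If even the reversed listing is more than you want to read, one option is to pipe it through tail so that only the biggest few files are shown. The sketch below is just an illustration: it builds a throwaway directory with mktemp, the file names (small.txt, medium.txt, large.txt) are made up for the demo, and it assumes an ls that understands -S and -h, as the GNU and BSD versions do.

```shell
#!/bin/sh
# Build a scratch directory holding three files of known sizes.
# The directory and file names are hypothetical, purely for illustration.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
printf '%10s' ''   > small.txt    # 10 bytes of padding
printf '%1000s' '' > medium.txt   # 1000 bytes
printf '%5000s' '' > large.txt    # 5000 bytes

# Reverse size sort, keeping only the last two lines (the two largest files).
ls -lShr | tail -n 2

# The same idea without -l is handy when you only want the names.
largest=$(ls -S | head -n 1)
smallest=$(ls -Sr | head -n 1)
echo "largest=$largest smallest=$smallest"

# Clean up the scratch directory.
cd / && rm -rf "$demo"
```

Grabbing names out of ls output like this is fine for quick interactive work, but it gets fragile if file names contain newlines, so treat it as a convenience rather than something to build scripts on.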

In addition to sorting by size, you can also sort by the so-called "MAC time" values: last modified (mtime), last access (atime), and last inode or meta-data update (ctime). By default, "ls -t" sorts by last modified time, newest first. This is another good place to use "-r", so that the most recently modified files end up at the bottom of the listing, right above your prompt:

$ ls -lrt
total 6752
[...]
-rw-r----- 1 syslog adm 86080 2010-01-18 08:10 kern.log
-rw-r----- 1 syslog adm 120492 2010-01-18 08:17 syslog
-rw-r----- 1 syslog adm 3310 2010-01-18 08:17 auth.log
$

If you want to sort by ctime you use "-c" in addition to "-t". However, to sort by atime you need to use "-u" ("-a" was reserved for something else, obviously):

$ ls -lrtu
total 6752
[...]
-rw-r--r-- 1 root root 219990 2010-01-18 08:00 udev
-rw-r--r-- 1 root root 120910 2010-01-18 08:00 Xorg.0.log
-rw-r----- 1 root adm 56275 2010-01-18 08:00 dmesg
$
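
To see the three timestamps pull apart, here is a small experiment you can run in a scratch directory. It is only a sketch: the file names (m_newest, a_newest, c_newest) are invented so that each one should "win" a different sort, and the dates are arbitrary. It relies on touch -t to set mtime and atime explicitly, and on the fact that chmod updates ctime without touching mtime.

```shell
#!/bin/sh
# Scratch directory with three files, each meant to "win" a different sort.
# All names here are hypothetical, chosen only to make the result obvious.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

touch -t 201001180900 m_newest       # mtime and atime: Jan 18 2010, 09:00
touch -t 201001180800 a_newest       # mtime and atime: 08:00 ...
touch -a -t 201001181000 a_newest    # ... then push only its atime to 10:00
touch -t 201001180700 c_newest       # mtime and atime: 07:00

# chmod changes inode metadata only, so it bumps ctime but not mtime.
# The sleep keeps this ctime clearly later than the ones set by touch above.
sleep 1
chmod 600 c_newest

by_mtime=$(ls -t  | head -n 1)   # newest modification first
by_atime=$(ls -tu | head -n 1)   # newest access first
by_ctime=$(ls -tc | head -n 1)   # newest inode change first
echo "mtime: $by_mtime  atime: $by_atime  ctime: $by_ctime"

# Clean up the scratch directory.
cd / && rm -rf "$demo"
```

On a typical Linux file system this should report m_newest, a_newest, and c_newest respectively, which is a handy reminder that the "newest" file in a directory depends entirely on which clock you ask about.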

Now let's see what my Windows brethren have up their sleeves, shall we?

Ed Responds:
Although not as full featured as the Linux ls command, the humble dir command offers us a bunch of options, allowing us to mimic pretty much everything Hal has done above. The main options we'll use here are:
  • /o followed by a one-character option that lets us specify a sort order (we'll use /os to sort by size and /od by date... with a - sign in front of the one character to reverse order)
  • /t, also followed by one character which lets us specify a time field we're interested in (the field options we have and their definitions, according to the dir command's help, are /tc for Creation time, /ta for Last Access time, and /tw for Last Written time).

So, to get a directory listing sorted by size (smallest to largest), we'd run:

C:\> dir /os

Want them reversed? We would use:

C:\> dir /o-s

Want those sizes in human readable form? Install Cygwin and use the ls command, for goodness sakes. This is the dir command we're talking about here. We don't need no stinkin' human readable format. Actually, the default output for dir does show commas in its size numbers, making things a little more readable than the stock Linux output.

To see directory contents listed by Last Written time (that's what dir calls it... roughly the same as last modified time in Linux parlance), in reverse order (with the most recently modified at the top), you could execute:

C:\> dir /o-d /tw

But, like we see with the ls command, Last Written is the default, so you can leave off the /tw to get the same results.

Wanna sort by creation time, again in reverse? Use:

C:\> dir /o-d /tc

And, how about last access? You could go with:

C:\> dir /o-d /ta

It's a good thing that the /od and /o-d sort options pick up the proper timestamp specified by the /t option, or else we'd be forced to do some waaaaay ugly sort command nonsense. Whew!

Tim responds too:

To get a directory listing we use Get-ChildItem. The name is a bit odd, but it is a generic command and can be used to get the child items from any container such as the registry, file system, or the certificate store. Today we are just looking at the file system.

First, let's take a look at the aliases for this useful cmdlet.

PS C:\> Get-Alias -Definition Get-ChildItem

CommandType Name Definition
----------- ---- ----------
Alias dir Get-ChildItem
Alias gci Get-ChildItem
Alias ls Get-ChildItem


I typically use ls since it is 33% more efficient to type than dir. But I digress...

Let's sort by file size:

PS C:\> gci | sort length

Directory: C:\

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 6/10/2009 4:42 PM 10 file1.txt
-a--- 6/10/2009 4:42 PM 24 file2.txt
-a--- 11/24/2009 3:56 PM 1442522 file3.zip


The Get-ChildItem cmdlet does not have sorting capability built in; none of the cmdlets do. But that is what the pipeline and the Sort-Object cmdlet (sort is its alias) are for.

Want to sort by file size in reverse order? Use the Descending parameter.

PS C:\> gci | sort length -descending

Directory: C:\

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 11/24/2009 3:56 PM 1442522 file3.zip
-a--- 6/10/2009 4:42 PM 24 file2.txt
-a--- 6/10/2009 4:42 PM 10 file1.txt


We can sort by any property, including LastAccessTime, LastWriteTime, or CreationTime.

PS C:\> gci | sort LastWriteTime


We can even sort on two properties.

PS C:\> gci | sort LastWriteTime, Length

Directory: C:\

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 6/10/2009 4:42 PM 24 file2.txt
-a--- 6/10/2009 4:42 PM 10 file1.txt
-a--- 11/24/2009 3:56 PM 1442522 file3.zip


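The files will first be sorted by write time. If two files have the same write time, they will then be sorted by length. Keep in mind that the listing only shows times to the minute, while the sort compares the full timestamp. Two files that look identical in the LastWriteTime column (like file2.txt and file1.txt above) can still differ by a few seconds, in which case they are ordered by time rather than by length.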

Finally, we come to displaying the size in a human readable format, and it isn't pretty. We have to write a custom expression to display the size in KB or MB.

PS C:\> gci | format-table -auto Mode, LastWriteTime, Length,
@{Name="KB"; Expression={"{0:N2}" -f ($_.Length/1KB) + "KB" }},
@{Name="MB"; Expression={"{0:N2}" -f ($_.Length/1MB) + "MB" }},
Name


Mode LastWriteTime Length KB MB Name
---- ------------- ------ -- -- ----
-a--- 6/10/2009 4:42 PM 10 0.01KB 0.00MB file1.txt
-a--- 6/10/2009 4:42 PM 24 0.02KB 0.00MB file2.txt
-a--- 11/24/2009 3:56 PM 1442522 1,408.71KB 1.38MB file3.zip


We can specify custom properties to display. This format works with any of the format cmdlets (Get-Command -Verb Format) or Select-Object. The custom columns are created by using a hashtable. A hashtable is specified by using @{ key1=value1; key2=value2 }, with the key/value pairs separated by semicolons. In our case we specify a name and an expression. Here is a simple example.

..., @{Name="Foo"; Expression={ $_.Length + 1 }}, ...


In this case we would add a column with the heading Foo and with a value of the Length plus 1. The expression can include all sorts of math or other crazy PowerShell fu.

Ironically, getting a human readable output comes from a non-human readable command.