Tuesday, February 15, 2011

Episode #134: Never Out of Sorts

Hal takes it easy

Just a quick one this week, since I'm currently hopping around the world. I was recently working a case where we had extracted a bunch of date-stamped messages from unallocated space, and we wanted to output them in reverse-chronological order. Unfortunately, the date stamps were in "MM/DD/YYYY" format, which is not really conducive to easy sorting (before you start hating on the US date format standard, let me point out that DD/MM/YYYY isn't any easier to deal with for this sort of thing).

Now many of you are probably aware that the "sort" command allows you to specify the field(s) to sort on with "-k". But did you know that you were allowed to use multiple "-k" options?

$ sort -nr -t/ -k3,3 -k1,1 -k2,2 messages
01/05/2011 12 keyboards drumming
01/04/2011 11 admins smiling
01/03/2011 10 systems thrashing
01/02/2011 9 networks crashing
01/01/2011 8 hosts a-pinging
12/31/2010 7 Windows versions
12/30/2010 6 (billion) Linux distros
12/29/2010 5 Windows loops!
12/28/2010 4 authors coding
12/27/2010 3 shells hacked
12/26/2010 2 types of hosts
12/25/2010 Plus command line hist-or-y!

The "-t" option specifies my field delimiter: the "/" between the pieces of the date. Then I sort first on the 3rd field ("-k3,3", aka the year) and break ties with the 1st field ("-k1,1", the month) and then the 2nd field ("-k2,2", the day). "-nr" gives me a reversed numeric sort on every key, so I get my messages in reverse chronological order like I wanted. The 3rd field actually runs to the end of the line, but a numeric comparison only reads the leading digits, so only the year counts. That same behavior is why the month and day each get their own "-k": a single "-k1,2" key covers the text "01/05", and the number ends at the slash, so the day would never be compared.
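To see the tie-breaking in isolation, here is a small self-contained sketch. The sample lines and the helper names sample_messages and newest_first are made up for illustration. Attaching "nr" to each key is standard sort syntax and does the same job as a global -nr.

```shell
#!/bin/sh
# Sketch: sort MM/DD/YYYY-stamped lines newest first, one sort key per date part.
# sample_messages and newest_first are hypothetical helpers; the data is made up.

sample_messages() {
  printf '%s\t%s\n' \
    '12/25/2010' 'Plus command line hist-or-y!' \
    '01/05/2011' '12 keyboards drumming' \
    '12/31/2010' '7 Windows versions' \
    '01/01/2011' '8 hosts a-pinging'
}

# -t/       split fields on "/": field 1 = month, field 2 = day,
#           field 3 = year plus the rest of the line
# -k3,3nr   primary key: the year, numeric, descending
# -k1,1nr   first tie-breaker: the month
# -k2,2nr   second tie-breaker: the day
newest_first() {
  sort -t/ -k3,3nr -k1,1nr -k2,2nr
}

sample_messages | newest_first
```

The two 2011 lines share a year and a month, so the day key is what puts 01/05 ahead of 01/01.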

Oh dear. I hope this one won't be too difficult for Tim. Do you think I should dare him to come up with a CMD.EXE version?

Tim is easy

This isn't too bad for PowerShell. All we need to do is convert the date to an object, sort, and output.

PS C:\> Get-Content messages.txt |
Select-Object @{Name="Date";Expression={Get-Date($_.substring(0,10))}},@{Name="Line";Expression={$_}} |
Sort-Object date -Descending | Select-Object Line


Line
----
01/05/2011 12 keyboards drumming
01/04/2011 11 admins smiling
01/03/2011 10 systems thrashing
01/02/2011 9 networks crashing
01/01/2011 8 hosts a-pinging
12/31/2010 7 Windows versions
12/30/2010 6 (billion) Linux distros
12/29/2010 5 Windows loops!
12/28/2010 4 authors coding
12/27/2010 3 shells hacked
12/26/2010 2 types of hosts
12/25/2010 Plus command line hist-or-y!


We start by reading the file messages.txt. The output is piped into, and objectified by, Select-Object. The first 10 characters of each line are passed to Get-Date to create a Date object. We also create a property (Line) representing the full line. The results are then sorted on our newly created Date property, and finally only the Line property is output to show our results.
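The same "compute a sortable key, sort on it, print the original line" idea works in a Unix shell as a decorate-sort-undecorate pipeline. This is a sketch for comparison, not part of the PowerShell solution. It assumes every line begins with a zero-padded MM/DD/YYYY date followed by a tab, and sample_messages and sort_by_date_desc are hypothetical names.

```shell
#!/bin/sh
# Sketch: decorate-sort-undecorate. Build a YYYYMMDD key for each line,
# sort on that key, then drop the key and print the original line.
# Assumes zero-padded MM/DD/YYYY at the start of each line; data is made up.

sample_messages() {
  printf '%s\t%s\n' \
    '12/25/2010' 'Plus command line hist-or-y!' \
    '01/05/2011' '12 keyboards drumming' \
    '12/31/2010' '7 Windows versions'
}

sort_by_date_desc() {
  tab="$(printf '\t')"
  # 1. awk prepends the key: year (chars 7-10), month (1-2), day (4-5), then a tab.
  # 2. sort compares only that first tab-separated field, numerically, newest first.
  # 3. cut (tab-delimited by default) drops the key and keeps the original line.
  awk '{ print substr($0, 7, 4) substr($0, 1, 2) substr($0, 4, 2) "\t" $0 }' |
    sort -t "$tab" -k1,1nr |
    cut -f2-
}

sample_messages | sort_by_date_desc
```

The key plays the role of the Date property: it exists only so sort has something to compare, and it is thrown away before output.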

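If we really wanted to PowerShell this command, we would turn each line into an object with separate Date and Text properties instead of carrying the raw line along. In addition, we can shorten the command by using aliases (gc, select, sort) and abbreviated key and parameter names (n, e, -desc).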
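A rough Unix analog of splitting each line into Date and Text columns is to let date(1) do the parsing, much as Get-Date does here. This sketch assumes GNU coreutils date, whose -d option understands the US MM/DD/YYYY form, and a tab between the date and the message. sample_messages and to_date_and_text are hypothetical names.

```shell
#!/bin/sh
# Sketch: emit a parsed ISO 8601 date and the message text as two tab-separated
# columns, then sort newest first. Assumes GNU coreutils date; data is made up.

sample_messages() {
  printf '%s\t%s\n' \
    '12/25/2010' 'Plus command line hist-or-y!' \
    '01/05/2011' '12 keyboards drumming' \
    '12/31/2010' '7 Windows versions'
}

to_date_and_text() {
  tab="$(printf '\t')"
  # read splits on the tab: the date stamp first, the rest of the line as text.
  while IFS="$tab" read -r stamp text; do
    printf '%s\t%s\n' "$(date -d "$stamp" +%F)" "$text"
  done
}

# YYYY-MM-DD dates sort correctly as plain text, so a byte-wise reverse sort is enough.
sample_messages | to_date_and_text | LC_ALL=C sort -r
```

Running date once per line is slow on a big file, but it means the shell never has to know which part of the stamp is the month.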

PS C:\> gc messages.txt | select @{n="Date";e={Get-Date($_.substring(0,10))}},
@{n="Text";e={$_.substring(11)}} | sort date -desc


Date Text
---- ----
1/5/2011 12:00:00 AM 12 keyboards drumming
1/4/2011 12:00:00 AM 11 admins smiling
1/3/2011 12:00:00 AM 10 systems thrashing
1/2/2011 12:00:00 AM 9 networks crashing
1/1/2011 12:00:00 AM 8 hosts a-pinging
12/31/2010 12:00:00 AM 7 Windows versions
12/30/2010 12:00:00 AM 6 (billion) Linux distros
12/29/2010 12:00:00 AM 5 Windows loops!
12/28/2010 12:00:00 AM 4 authors coding
12/27/2010 12:00:00 AM 3 shells hacked
12/26/2010 12:00:00 AM 2 types of hosts
12/25/2010 12:00:00 AM Plus command line hist-or-y!


CMD.EXE

So Hal decides to take it easy this week, but he wants me to put in the extra shift and do CMD.EXE as well! And he knows I can't back down from a dare, so here it is.

Hal's file contains tabs, and we need to use the tab character as one of the delimiters in our For loop. But when you hit the Tab key at the command prompt, CMD.EXE tries to do tab completion. We need to turn this feature off, so we have to start a new shell and tell it to F Off (literally).

C:\> cmd /F:off


Not only do we get to insult our shell, we also turn off tab completion, so we can type a literal tab character in our commands. Without this, it isn't possible to type the following command. Obviously, replace <tab> with a press of the Tab key.
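For contrast, bash needs no such trick. This sketch is bash-specific: it relies on ANSI-C $'...' quoting and a here-string, and the sample line is made up. Interactively, typing Ctrl-V followed by Tab also inserts a real tab.

```shell
#!/usr/bin/env bash
# Sketch (bash-specific): getting a literal tab into a command.
# $'\t' is ANSI-C quoting for a tab character; the sample line is made up.

line=$'01/05/2011\t12 keyboards drumming'

# Split on the tab: the date stamp first, the rest of the line as the message.
IFS=$'\t' read -r stamp text <<< "$line"
printf 'date=%s\n' "$stamp"
printf 'text=%s\n' "$text"

# cut splits on tabs by default, so -f2- prints everything after the first tab.
printf '%s\n' "$line" | cut -f2-
```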

C:\> (for /F "tokens=1-3,* delims=/<tab>" %a in (messages.txt) do
@echo %c/%a/%b<tab>%d) | sort /R

2011/01/05 12 keyboards drumming
2011/01/04 11 admins smiling
2011/01/03 10 systems thrashing
2011/01/02 9 networks crashing
2011/01/01 8 hosts a-pinging
2010/12/31 7 Windows versions
2010/12/30 6 (billion) Linux distros
2010/12/29 5 Windows loops!
2010/12/28 4 authors coding
2010/12/27 3 shells hacked
2010/12/26 2 types of hosts
2010/12/25 Plus command line hist-or-y!


The For loop parses our text file using the forward slash and tab as delimiters. The first token, %a, represents the month, %b the day, %c the year, and %d the remainder of the line. We then rewrite each line with the year first, which puts the date into a sortable format. You might say that is cheating, but I say that is the right way to write dates. It also matters because sort compares lines as text: a year-first date with a zero-padded month and day sorts the same way as text as it does chronologically.

The entire For loop is wrapped in parentheses before its output is piped to sort. Without the parentheses, the pipe becomes part of the loop body, so a separate sort runs for every line. A sort that receives only one line effectively has nothing to sort.
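Here is a POSIX shell sketch of the same approach, for readers who want to compare the two shells. It assumes zero-padded dates and a tab after the date, and sample_messages and rewrite_year_first are hypothetical names. The pipe after the loop plays the part of Tim's parentheses.

```shell
#!/bin/sh
# Sketch: rewrite MM/DD/YYYY as YYYY/MM/DD, then let a plain text sort
# produce reverse chronological order. Assumes zero-padded dates; data is made up.

sample_messages() {
  printf '%s\t%s\n' \
    '12/25/2010' 'Plus command line hist-or-y!' \
    '01/05/2011' '12 keyboards drumming' \
    '12/31/2010' '7 Windows versions'
}

rewrite_year_first() {
  tab="$(printf '\t')"
  while IFS="$tab" read -r stamp text; do
    # Pull the stamp apart on "/" (the roles %a, %b and %c play in the For loop).
    month=${stamp%%/*}
    rest=${stamp#*/}
    day=${rest%%/*}
    year=${rest#*/}
    printf '%s/%s/%s\t%s\n' "$year" "$month" "$day" "$text"
  done
}

# The pipe comes after the whole loop, so a single sort sees every line at once.
sample_messages | rewrite_year_first | LC_ALL=C sort -r
```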

That gives us the "fixed" output, but we can put the date back to "normal":

C:\> for /F "usebackq tokens=1-3,* delims=/<tab><space>" %m in
(`^(for /F "tokens=1-3,* delims=/<tab>" %a in ^(messages.txt^) do
@echo %c/%a/%b<tab>%d^) ^| sort /R`) do
@echo %n/%o/%m<tab>%p

01/05/2011 12 keyboards drumming
01/04/2011 11 admins smiling
01/03/2011 10 systems thrashing
01/02/2011 9 networks crashing
01/01/2011 8 hosts a-pinging
12/31/2010 7 Windows versions
12/30/2010 6 (billion) Linux distros
12/29/2010 5 Windows loops!
12/28/2010 4 authors coding
12/27/2010 3 shells hacked
12/26/2010 2 types of hosts
12/25/2010 Plus command line hist-or-y!
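The nested For loop above feeds the sorted, year-first lines into a second For loop that splits them on "/", tab, and space and puts the month and day back in front. In a Unix shell the round trip is just two more stages of one pipeline, with no nested loop or caret escaping. This sketch uses awk's substr() and assumes every line starts with a zero-padded, 10-character date. sample_messages and round_trip are hypothetical names.

```shell
#!/bin/sh
# Sketch: rewrite, sort, and rewrite back in a single pipeline.
# Assumes each line starts with a zero-padded MM/DD/YYYY date; data is made up.

sample_messages() {
  printf '%s\t%s\n' \
    '12/25/2010' 'Plus command line hist-or-y!' \
    '01/05/2011' '12 keyboards drumming' \
    '12/31/2010' '7 Windows versions'
}

# Stage 1: MM/DD/YYYY -> YYYY/MM/DD (substr($0, 11) keeps the tab and message as-is).
# Stage 2: byte-wise reverse sort, newest first.
# Stage 3: YYYY/MM/DD -> MM/DD/YYYY.
round_trip() {
  awk '{ print substr($0, 7, 4) "/" substr($0, 1, 2) "/" substr($0, 4, 2) substr($0, 11) }' |
    LC_ALL=C sort -r |
    awk '{ print substr($0, 6, 2) "/" substr($0, 9, 2) "/" substr($0, 1, 4) substr($0, 11) }'
}

sample_messages | round_trip
```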


There you go, Hal: you've got your CMD.EXE. And don't forget to /F:off.

Edit: Ed Skoudis just cheered that last statement.