Tuesday, February 22, 2011

Episode #135: His name is my name too

Tim takes it easy this week

This week Cory Williams writes in:

I would like to have a script that will find files with the same file name prefix but different extension. For instance, let's say within \Windows\System32 there are two files, FileName1.EXE and FileName1.DLL. One given is that it will always be an .EXE and a .DLL in combination. However, I will not know the prefix names. Is there a script that can find such files?

One of the basic rules of this site is no scripts. We have pushed the line a few times, but if we do it again the universe as we know it may implode. And yes, this blog is that important to the universe. Now back to the very important task at hand.

This task is quite simple in PowerShell. File and Directory objects both contain a BaseName property that contains the name minus the extension.

PS C:\> ls myfile.txt | select name,basename | ft -a

Name BaseName
---- --------
myfile.txt myfile


We can use this property with the Group-Object cmdlet to find files with matching basenames. The results can be piped into the Where-Object cmdlet (alias ?) to filter for groups with more than one object.

PS C:\Windows\system32> ls | group basename | ? { $_.Count -gt 1 }

Count Name Group
----- ---- -----
2 Boot {Boot, boot.sdi}
2 config {config, config.nt}
2 Dism {Dism, Dism.exe}
2 DRVSTORE {DRVSTORE, drvstore.dll}
2 ias {ias, ias.dll}
2 migwiz {migwiz, migwiz.lnk}
2 Msdtc {Msdtc, msdtc.exe}
...


If you don't want to include directories in the mix then we can filter them out before the grouping.

PS C:\Windows\system32> ls | ? { -not $_.PSIsContainer } |
group basename | ? { $_.Count -gt 1 }


Count Name Group
----- ---- -----
2 activeds {activeds.dll, activeds.tlb}
2 certmgr {certmgr.dll, certmgr.msc}
3 cliconfg {cliconfg.dll, cliconfg.exe, cliconfg.rll}
2 DELS3ci {DELS3ci.dll, DELS3ci.exe}
2 DELS3L3 {DELS3L3.DLL, DELS3L3.SMT}
2 diskcopy {diskcopy.com, diskcopy.dll}
2 dssec {dssec.dat, dssec.dll}
...


Pretty simple. Now let's see what Hal's got in store for us.

Hal takes it where he can get it:

Of course Tim's solution makes me immediately think of the Unix basename command, and ultimately how frustrating it is. Frustrating because, unlike most Unix commands, basename won't read input from stdin. You can only call basename on individual strings fed in on the command line. And even then you can only specify a single file name extension to use when reducing the string-- e.g. "basename hosts.deny .deny".

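But instead of moaning about what we don't have, let's look at what we do have. sed will allow us to read in a list of file names on stdin and chop off the extension (strictly, the first dot and whatever follows it up to the next dot, since the expression isn't anchored to the end of the name):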

$ ls | sed 's/\.[^.]*//'
a2ps
a2ps-site
acpi
adjtime
alchemist
aliases
aliases
...

Now all we have to do is list the prefixes that are duplicated. A little bit of tweaking to our sed expression, some uniq action, and output redirection gets us where we need to be:

$ ls -d $(ls | sed 's/\.[^.]*/.*/' | uniq -d)
ant.conf cron.weekly issue.rpmnew prelink.conf
ant.d csh.cshrc logrotate.conf prelink.conf.d
auto.master csh.login logrotate.d rc.d
auto.misc dnsmasq.conf modprobe.conf rc.local
auto.net dnsmasq.d modprobe.d rc.news
auto.smb dovecot.conf ntp.conf rc.sysinit
cron.d dovecot.conf-dist ntp.conf-dist xinetd.conf
cron.daily dovecot.conf.rpmnew ntp.conf.rpmnew xinetd.d
cron.deny hosts.allow php.d
cron.hourly hosts.deny php.ini
cron.monthly issue.net prelink.cache

In the pipeline inside the "$(...)" construct I'm using sed to convert the output of ls from "<filename>.<ext>" to "<filename>.*". The "uniq -d" command gives me only the base <filename>.* patterns that are listed more than once. Then we use these wildcards as command-line arguments in the outer ls command, and we get the list of matching files.
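To see the sed and "uniq -d" steps in isolation, here is a minimal sketch that runs them over a fixed list of names instead of real ls output. The file names are made up for illustration, and the list is already sorted, which "uniq -d" depends on.

```shell
# Hypothetical, already-sorted names standing in for the output of ls.
names='hosts.allow
hosts.deny
passwd
rc.d
rc.local'

# Replace the first ".ext" with ".*", then keep one copy of each
# pattern that appears more than once in a row.
patterns=$(printf '%s\n' "$names" | sed 's/\.[^.]*/.*/' | uniq -d)
printf '%s\n' "$patterns"
```

Only the patterns for the "hosts" and "rc" prefixes survive; "passwd" has no dot and appears once, so it is dropped.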

You'll notice that I used "ls -d ..." here, because some of the returned values are going to match directories. If you want to filter out directories as Tim does above, then things get a lot more complicated:

$ ls -d $(find . -maxdepth 1 ! -type d | sed 's/.\/\(.*\.\)[^.]*/\1*/' | sort | uniq -d)
auto.master dovecot.conf-dist ld.so.conf prelink.conf
auto.misc dovecot.conf.rpmnew ld.so.conf.d prelink.conf.d
auto.net hosts.allow ld.so.conf.rpmnew rc.d
auto.smb hosts.deny ntp.conf rc.local
csh.cshrc issue.net ntp.conf-dist rc.news
csh.login issue.rpmnew ntp.conf.rpmnew rc.sysinit
dovecot.conf ld.so.cache prelink.cache

The find command locates the non-directory objects in the current directory. But the output of the find command gives us "./<filename>.<ext>". So we have a more complicated sed expression to convert this to "<filename>.*". Also, the output of the find command need not be in sorted order like the output of ls, so we have to "sort" before we feed the whole mess into "uniq -d".

The problem now is that our output still contains some directories. You'll notice, for example, that "rc.*" matches not only rc.local, rc.news, and rc.sysinit, but also rc.d, which is a directory. So we actually have to post-process the output and get rid of directories:

$ for file in $(ls -d $(find . -maxdepth 1 ! -type d | sed 's/.\/\(.*\.\)[^.]*/\1*/' | 
sort | uniq -d)); do
[ -d $file ] || echo $file;
done

...
prelink.cache
prelink.conf
rc.local
rc.news
rc.sysinit

Man, that's pretty fugly! But I just don't see a clean way to do this. Let me know if you can think of a better one.

When all you have is a Haemer...

It was nice to hear from "friend of the blog" Jeff Haemer again. He sent us this alternative approach to our problem:

$ for f in *.*; do echo ${f%.*}.*; done | uniq -d
...
prelink.cache prelink.conf prelink.conf.d
rc.d rc.local rc.news rc.sysinit
xinetd.conf xinetd.d

Jeff's basically bypassing external tools here and using the shell built-ins to do almost everything. The outer loop matches any file with a dot in the name, but inside the loop is where the magic happens.

The construct "${var%pattern}" strips off anything matching "pattern" from the right-hand side of "var". In this case, Jeff is getting rid of the final dot and any extension from the file name. So, for example, "xinetd.conf" and "xinetd.d" are both reduced to "xinetd".
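A quick sketch of the difference between "%" (shortest match) and "%%" (longest match), using one of the file names from the listings above. The variable names are just for illustration.

```shell
f='dovecot.conf.rpmnew'

short="${f%.*}"    # shortest match from the right: drops ".rpmnew"
long="${f%%.*}"    # longest match from the right: drops ".conf.rpmnew"

echo "$short"      # dovecot.conf
echo "$long"       # dovecot
```

Jeff's loop uses the single "%" form, so only the last extension is removed.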

Then Jeff simply does "echo <result>.*"-- e.g., "echo xinetd.*", continuing with the previous example. In the case of files that have the same prefix, that will yield multiple lines of duplicate output, so Jeff pipes the output of the loop through "uniq -d" to keep just one copy of each duplicated line.

The only difficulty is that Jeff's solution matches directories as well as files. You'll have to do some post-processing to filter out the directories if that's important to you. However, you'll need to be careful with lines like our "xinetd" example. Once you get rid of "xinetd.d", you'll be left with just "xinetd.conf", which will also need to be filtered out because there are no other files in the directory with this prefix.
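One way around both problems is to test each name before it is counted rather than afterwards. The following is only a sketch under a few assumptions: the function name list_shared_prefixes is made up, it prints the shared prefixes rather than the full file names, and "[ -f ]" follows symlinks, so a symlink to a regular file counts as a file.

```shell
# Hypothetical helper: print each prefix shared by two or more
# regular files in the directory given as $1.
list_shared_prefixes() {
    (
        cd "$1" || exit 1
        for f in *.*; do
            # Skip directories and anything else that is not a regular file.
            [ -f "$f" ] && printf '%s\n' "${f%.*}"
        done | sort | uniq -d
    )
}
```

Running "list_shared_prefixes /etc" on a directory holding rc.d, rc.local, rc.news, xinetd.conf, and xinetd.d would print only "rc", because the directories are never counted and xinetd.conf is left on its own. Since "${f%.*}" strips only the last extension, a name like dovecot.conf.rpmnew is counted under "dovecot.conf", not "dovecot".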

Anyway, thanks for writing in again, Jeff! Always good to hear from you.

Tuesday, February 15, 2011

Episode #134: Never Out of Sorts

Hal takes it easy

Just a quick one this week, since I'm currently hopping around the world. I was recently working a case where we had extracted a bunch of date-stamped messages from unallocated space, and we wanted to output them in reverse-chronological order. Unfortunately, the date stamps were in "MM/DD/YYYY" format, which is not really conducive to easy sorting (before you start hating on the US date format standard, let me point out that DD/MM/YYYY isn't any easier to deal with for this sort of thing).

Now many of you are probably aware that the "sort" command allows you to specify the field(s) to sort on with "-k". But did you know that you were allowed to use multiple "-k" options?

$ sort -nr -t/ -k3,3 -k1,2 messages
01/05/2011 12 keyboards drumming
01/04/2011 11 admins smiling
01/03/2011 10 systems thrashing
01/02/2011 9 networks crashing
01/01/2011 8 hosts a-pinging
12/31/2010 7 Windows versions
12/30/2010 6 (billion) Linux distros
12/29/2010 5 Windows loops!
12/28/2010 4 authors coding
12/27/2010 3 shells hacked
12/26/2010 2 types of hosts
12/25/2010 Plus command line hist-or-y!

The "-t" option specifies my field delimiter-- the "/" between the pieces of the date. Then I sort first on the 3rd field ("-k3,3", aka the year) and break ties with fields 1 and 2 ("-k1,2", the month and day, respectively). "-nr" gives me a reversed numeric sort, so I get my messages in reverse chronological order like I wanted.
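A variation worth knowing: each "-k" can carry its own flags, so year, month, and day can each get a separate numeric, reversed key. This is a sketch with made-up sample lines, not the original messages file. One reason to prefer separate keys is that a numeric comparison reads a key only as far as its first non-digit, so a key spanning "01/05" is compared as the number 1 and the day has to be settled by sort's last-resort whole-line comparison.

```shell
# Hypothetical sample data in MM/DD/YYYY format.
messages='12/31/2010 c
01/02/2011 a
01/10/2011 b
12/25/2010 d'

# Year first, then month, then day; "n" = numeric, "r" = reversed, per key.
sorted=$(printf '%s\n' "$messages" | LC_ALL=C sort -t/ -k3,3nr -k1,1nr -k2,2nr)
printf '%s\n' "$sorted"
```

The third field here is really "2011 b" and so on, but the numeric comparison stops at the space, so only the year is used.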

Oh dear. I hope this one won't be too difficult for Tim. Do you think I should dare him to come up with a CMD.EXE version?

Tim is easy

This isn't too bad for PowerShell. All we need to do is convert the date to an object, sort, and output.

PS C:\> Get-Content messages.txt |
Select-Object @{Name="Date";Expression={Get-Date($_.substring(0,10))}},@{Name="Line";Expression={$_}} |
Sort-Object date -Descending | Select-Object Line


Line
----
01/05/2011 12 keyboards drumming
01/04/2011 11 admins smiling
01/03/2011 10 systems thrashing
01/02/2011 9 networks crashing
01/01/2011 8 hosts a-pinging
12/31/2010 7 Windows versions
12/30/2010 6 (billion) Linux distros
12/29/2010 5 Windows loops!
12/28/2010 4 authors coding
12/27/2010 3 shells hacked
12/26/2010 2 types of hosts
12/25/2010 Plus command line hist-or-y!


We start by reading the file messages.txt. The output is piped into, and objectified by, Select-Object. The first 10 characters are used to create a Date Object via Get-Date. We also create an object (Line) representing the full line. The results are then sorted based on our newly created Date Object. Finally, the Line object is output to show our results.

If we really wanted to PowerShell this command, we would convert the entire output to an object. In addition, we can shorten the command by using aliases and shortened parameter names.

PS C:\> gc messages.txt | select @{n="Date";e={Get-Date($_.substring(0,10))}},
@{n="Text";e={$_.substring(11)}} | sort date -desc


Date Text
---- ----
1/5/2011 12:00:00 AM 12 keyboards drumming
1/4/2011 12:00:00 AM 11 admins smiling
1/3/2011 12:00:00 AM 10 systems thrashing
1/2/2011 12:00:00 AM 9 networks crashing
1/1/2011 12:00:00 AM 8 hosts a-pinging
12/31/2010 12:00:00 AM 7 Windows versions
12/30/2010 12:00:00 AM 6 (billion) Linux distros
12/29/2010 12:00:00 AM 5 Windows loops!
12/28/2010 12:00:00 AM 4 authors coding
12/27/2010 12:00:00 AM 3 shells hacked
12/26/2010 12:00:00 AM 2 types of hosts
12/25/2010 12:00:00 AM Plus command line hist-or-y!


CMD.EXE

So Hal decides to take it easy this week, but he wants me to put in the extra shift and do CMD.EXE as well! And he knows I can't back down from a dare, so here it is.

Hal's file contains tabs and we need to use the tab character as one of the delimiters in our for loop. But when you hit the tab key at the command prompt it tries to use tab completion. We need to turn this feature off, so we have to start a new shell and tell it to F Off (literally).

C:\> cmd /F:off


Not only can we insult our shell, but we also turn off tab completion so we can use the tab character in our commands. Without this command it isn't possible to type the following command. Obviously, substitute <tab> with a press of the tab key.

C:\> (for /F "tokens=1-3,* delims=/<tab>" %a in (messages.txt) do
@echo %c/%a/%b<tab>%d) | sort /R

2011/01/05 12 keyboards drumming
2011/01/04 11 admins smiling
2011/01/03 10 systems thrashing
2011/01/02 9 networks crashing
2011/01/01 8 hosts a-pinging
2010/12/31 7 Windows versions
2010/12/30 6 (billion) Linux distros
2010/12/29 5 Windows loops!
2010/12/28 4 authors coding
2010/12/27 3 shells hacked
2010/12/26 2 types of hosts
2010/12/25 Plus command line hist-or-y!


The For loop parses our text file using the forward slash and tab as delimiters. The first token, %a, represents the month, %b the day, %c the year, and %d is the remainder of the line. We then rewrite the line with the year first, which puts the date into a sortable format. You might say that is cheating, but I say that is the right way to write dates. The entire For loop and its output are wrapped in parentheses before being piped to sort. Parentheses must be used or each line will be sent, one at a time, to the sort command. If the sort command receives only one line at a time it effectively has nothing to sort.

That gives us the "fixed" output, but we can put the date back to "normal":

C:\> for /F "usebackq tokens=1-3,* delims=/<tab><space>" %m in
(`^(for /F "tokens=1-3,* delims=/<tab>" %a in ^(messages.txt^) do
@echo %c/%a/%b<tab>%d^) ^| sort /R`) do
@echo %n/%o/%m<tab>%p

01/05/2011 12 keyboards drumming
01/04/2011 11 admins smiling
01/03/2011 10 systems thrashing
01/02/2011 9 networks crashing
01/01/2011 8 hosts a-pinging
12/31/2010 7 Windows versions
12/30/2010 6 (billion) Linux distros
12/29/2010 5 Windows loops!
12/28/2010 4 authors coding
12/27/2010 3 shells hacked
12/26/2010 2 types of hosts
12/25/2010 Plus command line hist-or-y!


There you go Hal, you got your CMD.EXE. And don't forget to /F:off.

Edit: Ed Skoudis just cheered that last statement

Tuesday, February 8, 2011

Episode #133: Name's the Same?

Hal's on mailroom duty

This week's challenge comes to us courtesy of Brian Finn:

We are building a new external DNS server for our network, and I was trying to figure out a way to make sure I had the new one giving out the same answers as the old one. I was going to throw a bunch of host names at both servers and make sure they handed out the same IP addresses. I figured out how to do it with dig on Linux:

dig -f hostnames.txt +noall +answer @dnsserver1 > output1
dig -f hostnames.txt +noall +answer @dnsserver2 > output2
diff output1 output2

Is there a one-liner that can do this? Can it even be done in Windows?

I'll leave the Windows answer to my esteemed colleague, but the one-liner answer should be familiar to regular readers of the blog:

diff <(dig -f hostnames.txt +noall +answer @dnsserver1) \
<(dig -f hostnames.txt +noall +answer @dnsserver2)

The solution is to use bash's "<(...)" syntax to substitute the command output in place of the file name arguments in the diff command line.
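Here is a small, offline sketch of the same "<(...)" technique. The two functions are hypothetical stand-ins for the two dig commands, with made-up records, so the example runs without any DNS servers. It needs bash, since process substitution is not part of plain sh.

```shell
# Hypothetical stand-ins for "dig ... @dnsserver1" and "dig ... @dnsserver2".
old_server() {
    printf '%s\n' 'www.example.com. A 192.0.2.10' 'test.example.com. A 192.0.2.20'
}
new_server() {
    printf '%s\n' 'www.example.com. A 192.0.2.10' 'test.example.com. A 192.0.2.99'
}

# Each <(...) is replaced by a file name that diff can read from.
# diff exits non-zero when its inputs differ, so "|| true" keeps the
# assignment from aborting a script that runs under "set -e".
changes=$(diff <(old_server) <(new_server) || true)
printf '%s\n' "$changes"
```

Only the second record differs, so diff reports a change on line 2 and nothing else.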

I did want to compliment Brian on his excellent dig fu here. Normally I use the "host" command when doing DNS queries from the command-line because the output is so much easier to deal with. But in this case because we want to query an entire list of hostnames, the "-f" option to dig comes in really handy. To simplify the dig output, Brian first sets the "+noall" option which would normally suppress all output from the command. But he follows that up with "+answer", which means to only output the answers returned by the remote name server. So you get nice clean output like this:

$ dig -f hostnames.txt +noall +answer @127.0.0.1
www.deer-run.com. 3600 IN CNAME newwinkle.deer-run.com.
newwinkle.deer-run.com. 3600 IN A 67.18.149.10
test.deer-run.com. 3600 IN A 192.168.168.168
$ dig -f hostnames.txt +noall +answer @67.18.149.10
www.deer-run.com. 14400 IN CNAME newwinkle.deer-run.com.
newwinkle.deer-run.com. 14400 IN A 67.18.149.10
test.deer-run.com. 14400 IN CNAME newwinkle.deer-run.com.
newwinkle.deer-run.com. 14400 IN A 67.18.149.10

You can see from the above output that I've set up a test case using a fake DNS server on my local box and the master server for my domain. The records for www.deer-run.com are the same, but test.deer-run.com differs between the two servers. So what's going to happen when I try my one-liner here?

$ diff <(dig -f hostnames.txt +noall +answer @127.0.0.1) \
<(dig -f hostnames.txt +noall +answer @67.18.149.10)

1,3c1,4
< www.deer-run.com. 3600 IN CNAME newwinkle.deer-run.com.
< newwinkle.deer-run.com. 3600 IN A 67.18.149.10
< test.deer-run.com. 3600 IN A 192.168.168.168
---
> www.deer-run.com. 14400 IN CNAME newwinkle.deer-run.com.
> newwinkle.deer-run.com. 14400 IN A 67.18.149.10
> test.deer-run.com. 14400 IN CNAME newwinkle.deer-run.com.
> newwinkle.deer-run.com. 14400 IN A 67.18.149.10

Why is diff reporting that all of the output is different between the two commands? Because if you look carefully at the second column of output you'll see that the time-to-live values returned by the two servers are different. That's because I set the default TTL value differently in the two DNS installations.

So the question is, is this a significant difference that deserves to be reported on? Some folks might think that it was, and so the output above is correct. However, if you don't care about differing TTL values, then you'll probably want to do something like this:

$ diff <(dig -f hostnames.txt +noall +answer @127.0.0.1 | awk '{print $1,$4,$5}') \
<(dig -f hostnames.txt +noall +answer @67.18.149.10 | awk '{print $1,$4,$5}')

3c3,4
< test.deer-run.com. A 192.168.168.168
---
> test.deer-run.com. CNAME newwinkle.deer-run.com.
> newwinkle.deer-run.com. A 67.18.149.10

Kind of long for a "one-liner", but it gets the job done.
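The awk step on its own, as a sketch run against two of the answer lines shown earlier instead of live dig output. It keeps the name, record type, and value (fields 1, 4, and 5) and drops the TTL and class.

```shell
# Two answer lines copied from the dig output above.
answers='www.deer-run.com. 3600 IN CNAME newwinkle.deer-run.com.
newwinkle.deer-run.com. 3600 IN A 67.18.149.10'

# Print only the name, the record type, and the record data.
trimmed=$(printf '%s\n' "$answers" | awk '{print $1, $4, $5}')
printf '%s\n' "$trimmed"
```

With the TTL column gone, two servers that differ only in their default TTL produce identical lines.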

Well Brian basically did my work for me this week. Let's see how Tim's handling things on his solo mission.

Tim celebrates a Packers win!

I'm a big Packers fan and I'm excited they won on Sunday. Congratulations to the Pack and Go Pack Go! But now that football is over, we'll get back to some Windows fu.

Unfortunately, there isn't a nice PowerShell cmdlet to get the results from DNS. We have to use the old nslookup command, which doesn't give us nice output objects. In addition, nslookup sends some of its messages to stderr, so we'll strip that from the output by sending it to the $null garbage can. Also, we'll set the timeout value to make sure we don't miss any lookups and mess up our output. Here is the command and its output.

PS C:\> nslookup -timeout=20 www.commandlinekungfu.com 8.8.8.8 2>$null
Server: google-public-dns-a.google.com
Address: 8.8.8.8

Name: www.commandlinekungfu.com
Address: 209.20.73.195


Our problem is that the results show the name and IP address of the name server we are querying. Since we don't have a built-in cmdlet to just give us answers, we will have to strip the first three lines from the output. We can do just that by using the range operator to only access the lines of output we want.

PS C:\> (nslookup -timeout=20 www.commandlinekungfu.com 8.8.8.8 2>$null )[3..99]
Name: www.commandlinekungfu.com
Address: 209.20.73.195


This command returns the elements at indexes 3 through 99. Arrays are zero-based, so that skips the first three lines and starts with the fourth. Of course we don't have 99 lines of output, but it makes sure that we get all the output. The output of each command can be fed into the Compare-Object cmdlet to see if the DNS servers are returning different results.

PS C:\> Compare-Object `
(nslookup -timeout=20 www.commandlinekungfu.com 8.8.8.8 2>$null )[3..99] `
(nslookup -timeout=20 www.commandlinekungfu.com 8.8.4.4 2>$null )[3..99]


<No Output>


No output means everything matches. Great!

What does it look like if the output is different?

PS C:\> Compare-Object `
(nslookup -timeout=20 www.commandlinekungfu.com 8.8.8.8 2>$null )[3..99] `
(nslookup -timeout=20 www.commandlinekungfu.com 127.0.0.1 2>$null )[3..99]


InputObject SideIndicator
----------- ------------
Address: 209.20.73.195 <=
Address: 111.22.33.222 =>


That works for manually checking each hostname, but what if we have a text file containing all the DNS names we want to check? Our first problem is that the output above does not show the name we are looking up, so we would end up with a list of IP addresses without the context of the hostname. We have to change the output so the name and address are on one line.

PS C:\> [string]::Join(" ", (nslookup -timeout=20 www.commandlinekungfu.com 8.8.8.8 2>$null )[3..99])
Name: www.commandlinekungfu.com Address: 209.20.73.195


This command removes each line break and replaces it with a space. We then read the file and check each hostname inside the ForEach-Object cmdlet (alias %).

PS C:\> Get-Content hostnames.txt | % { Compare-Object `
([string]::Join(" ", (nslookup -timeout=20 $_ 8.8.8.8 2>$null )[3..99])) `
([string]::Join(" ", (nslookup -timeout=20 $_ 127.0.0.1 2>$null )[3..99]))
}


InputObject SideIndicator
----------- ------------
Name: www.commandlinekungfu.com Address: 209.20.73.195 <=
Name: www.commandlinekungfu.com Address: 111.22.33.222 =>


While definitely not as simple and elegant as Hal's solution, it does work.

By the way, there are a number of plug-ins that emulate dig and give us pretty objects; my favorite is over here.