Monday, March 2, 2009

Episode #5 - Simple Text Manipulation - Reverse DNS Records

Paul Says:

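There are many times when I run commands to collect information, such as hostnames and IP addresses, and the output is, well, less than desirable. For example, let's say that you have a file called "lookups.txt" that contains the results of several reverse DNS lookups, one per line, each in the form "D.C.B.A.in-addr.arpa domain name pointer hostname." (the IP address with its octets reversed, followed by the fully-qualified hostname and a trailing period).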

The output is not easy to read, so I like to manipulate it such that I get a list of IPs and hostnames:

$ awk -F . '{print $4 "." $3 "." $2 "." $1 " " $6 "."$7"."$8"."$9}' lookups.txt | cut -d" " -f1,6
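To see why Paul picks $4 "." $3 "." $2 "." $1 and then $6 through $9, it can help to dump every field awk produces when it splits a line on dots. This is a minimal sketch that assumes lookups.txt uses the format the host command prints for a reverse lookup; the sample line (a 192.0.2.x documentation address and a made-up example.com name) is hypothetical, not from Paul's file.

```shell
#!/bin/sh
# Hypothetical reverse-lookup line; the address and hostname are invented
# for illustration and are NOT from the original lookups.txt.
line='10.2.0.192.in-addr.arpa domain name pointer www.sales.example.com.'

# Split on "." exactly as awk -F . does, and print every field with its number.
fields=$(printf '%s\n' "$line" | awk -F . '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
```

Fields 1 through 4 are the reversed octets, field 6 holds the "arpa domain name pointer" text glued to the first hostname label (which is why the cut keeps the sixth space-separated word), and the empty tenth field comes from the trailing period.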

Hal Comments:

The problem with your awk expression, Paul, is that you're assuming that all of the fully-qualified hostnames are four levels deep. What if your file also contains lines whose hostnames have fewer or more than four components?

The awk doesn't choke and die, but you do end up with weird output:

$ awk -F . '{print $4 "." $3 "." $2 "." $1 " " $6 "."$7"."$8"."$9}' lookups.txt | cut -d" " -f1,6
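Here's a sketch of Hal's point, running Paul's exact pipeline against a hypothetical file that mixes hostname depths (documentation addresses and example.com names, invented for illustration). Only the four-level name comes out clean: shorter names pick up trailing periods, and the five-level name loses its last label.

```shell
#!/bin/sh
# Hypothetical sample data with hostnames of different depths; NOT the
# original lookups.txt from the post.
tmp=$(mktemp -d)
cat > "$tmp/lookups.txt" <<'EOF'
10.2.0.192.in-addr.arpa domain name pointer www.sales.example.com.
30.2.0.192.in-addr.arpa domain name pointer ns1.example.com.
40.2.0.192.in-addr.arpa domain name pointer example.com.
50.2.0.192.in-addr.arpa domain name pointer www.a.b.example.com.
EOF

# Paul's one-liner, unchanged: it always glues together fields 6 through 9,
# so names with fewer labels gain extra periods and longer names are cut off.
result=$(awk -F . '{print $4 "." $3 "." $2 "." $1 " " $6 "."$7"."$8"."$9}' "$tmp/lookups.txt" | cut -d" " -f1,6)
printf '%s\n' "$result"

rm -r "$tmp"
```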

Yuck! Frankly, this looks like a job for sed to me:

$ sed 's/\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)\.in-addr\.arpa domain name pointer\(.*\)\./\4.\3.\2.\1\5/' \
    lookups.txt

sed expressions like this end up looking like nasty thickets of backwhacks, because of all the "\( ... \)" expressions, but this approach allows us to re-order the octets of the IP address and remove all of the extra text in one fell swoop.
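As a rough check, here's the same substitution, with the in-addr.arpa suffix matched explicitly, run against a hypothetical mixed-depth sample (invented addresses and names). Because \5 captures everything between "pointer" and the final period, including the separating space, the hostname comes through whole no matter how many labels it has.

```shell
#!/bin/sh
# Hypothetical sample data with hostnames of different depths; NOT the
# original lookups.txt from the post.
tmp=$(mktemp -d)
cat > "$tmp/lookups.txt" <<'EOF'
10.2.0.192.in-addr.arpa domain name pointer www.sales.example.com.
30.2.0.192.in-addr.arpa domain name pointer ns1.example.com.
40.2.0.192.in-addr.arpa domain name pointer example.com.
50.2.0.192.in-addr.arpa domain name pointer www.a.b.example.com.
EOF

# \1-\4 capture the reversed octets; \5 captures " hostname" up to (but not
# including) the final period, which the trailing \. throws away.
result=$(sed 's/\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)\.in-addr\.arpa domain name pointer\(.*\)\./\4.\3.\2.\1\5/' "$tmp/lookups.txt")
printf '%s\n' "$result"

rm -r "$tmp"
```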

And, yes, a lot of people (including me) would probably use Perl instead of sed for this, because Perl's regular expression syntax allows for a much more compact command line. But Paul, Ed, and I have agreed to avoid diving into pure scripting languages like Perl.

Paul (aka Grasshopper) Says:

Yes, I was assuming a fixed number of components in every hostname, and wrote it as a one-off to quickly parse my particular output. I now see that sed is even more powerful than I thought! This will certainly be a nice addition to the command line one-liners I use on a regular basis. When doing a penetration test, you often have to move information, such as IP addresses, between tools, and this will make the job much easier.

Ed (aka Ed) Says:

I really do wish we had awk or sed on Windows. I know, I know... we can get them with Cygwin or other shells that we could add in. But, our ground rules here force us to rely on built-in commands. That means, to parse in Windows, we rely on FOR /F loops, which can parse files, strings, or the output of commands.

When I first saw Paul's post above, I came up with this pretty straightforward approach:

C:\> FOR /F "tokens=1-4,10-14 delims=. " %a in (lookups.txt) do @echo %d.%c.%b.%a %e.%f.%g.%h

Here, I'm parsing the file, using iterator variables starting with %a (FOR /F will automatically allocate more vars while it parses) and delimiters of . and space (gotta have that space there, because specifying delims= replaces the default delimiters of space and tab). I grab tokens one through four (the octets of the IP address, in reversed order) and ten through fourteen (the pieces of the domain name). Then, I dump everything out in our desired order. Simple and effective.

But, Hal brings up an interesting twist. Like Paul's approach, mine also leaves that ugly, variable number of periods at the end, because we can't always assume that the domain name has four elements. I thought about it for a while, trying to push my first FOR /F loop to deal with this, and it got real ugly, real fast. Lots of IF statements made it impractical. So, I came up with a simpler approach: nested FOR /F loops, the outer one to parse the file, and the inner one to parse a string from the outer loop's results. Here it is:

C:\> FOR /F "tokens=1-5" %a in (lookups.txt) do @(@FOR /F "tokens=1-4 delims=." %i in ("%a") do @echo %l.%k.%j.%i %e)

What's this mess? Well, I use my outer FOR loop to parse lookups.txt into five components, using the default delimiters (spaces and tabs). %a will contain the reversed IP address along with its in-addr.arpa suffix, dots and all. The fifth item (%e) is the domain name. Then, in my inner FOR loop, I parse the string %a, using delims of periods and a variable of %i. That drops the first four tokens, the octets of our IP address, into %i through %l, which we can echo out. Furthermore, it preserves our domain name as one chunk in %e, regardless of the number of components it has. I then just echo the IP address (reversing the octets, of course) followed by the domain name. There's one small drawback here: I leave the trailing period at the end of every domain name. There's only one there, and it's there for all of them, unlike the earlier approach. Still, this is very workable, and keeps the command syntax almost typable. :)
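For comparison with the Unix side of the house, here's a rough POSIX shell analogue of Ed's nested parse (a sketch, not something from the post): read splits each line on whitespace, like the outer FOR /F, and set -- with IFS set to a dot splits the first field, like the inner loop. Unlike Ed's version, this sketch also strips the single trailing period with ${host%.}. The sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data; NOT the original lookups.txt from the post.
tmp=$(mktemp -d)
cat > "$tmp/lookups.txt" <<'EOF'
30.2.0.192.in-addr.arpa domain name pointer ns1.example.com.
50.2.0.192.in-addr.arpa domain name pointer www.a.b.example.com.
EOF

result=$(
  # Outer split on whitespace: ptr gets "D.C.B.A.in-addr.arpa", the three
  # underscore variables absorb "domain name pointer", host gets the name.
  while read -r ptr _d _n _p host; do
    # Inner split: break ptr on dots (unquoted on purpose), then restore IFS.
    oldifs=$IFS
    IFS=.
    set -- $ptr
    IFS=$oldifs
    # $1..$4 hold the octets in reversed order; host stays in one piece.
    printf '%s %s\n' "$4.$3.$2.$1" "${host%.}"
  done < "$tmp/lookups.txt"
)
printf '%s\n' "$result"

rm -r "$tmp"
```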