Tuesday, June 21, 2011

Episode #150: Long Line of Evil

Tim goes long

While trying to track down some rogue PHP shells, I needed a way to find legitimate PHP files injected with bad code. It happened that the injected lines of code were usually quite long, so I needed a way to find PHP files with long lines, where "long" is defined as more than 150 characters. How would we find these files in Windows?

First off, let's try our old grumpy friend, cmd.exe.

C:\> for /R %i in (*.php) do @type %i | findstr "..........<140 more periods>" > NUL && @echo %i

C:\webroot\my1.php
C:\webroot\subdir\my4.php


This command does a recursive directory listing looking for .php files. When it finds one, it outputs the contents of the file (via the type command) and uses FindStr to find lines with at least 150 characters (FindStr treats each period as a regular expression matching any single character). When a match is found, the second half of our short-circuit AND (the && operator) runs and outputs the file name (%i).

It won't give us the matching line number or show us the offending line, but it is functional. If we use PowerShell we can get better targeted results, like this:

PS C:\> Get-ChildItem -Recurse -Include *.php | Select-String -Pattern '.{150,}' | Select-Object Path, LineNumber, Line

Path                      LineNumber Line
----                      ---------- ----
C:\webroot\subdir\my4.php          3 This is a really really really...
C:\webroot\my1.php                 9 This is a really really really...


This command does a recursive directory listing using Get-ChildItem with the -Recurse option. The -Include parameter is given to make sure we only check .php files. The resulting files are piped into Select-String where we find lines with at least 150 characters. We then output the matching file name, the matching line number, and the line itself.

Per usual, we can trim the command using aliases, shortened parameter names, and positional parameters.

PS C:\> ls -r -i *.php | Select-String '.{150,}' | Select-Object Path, LineNumber, Line


Oddly enough, when I actually did this I couldn't use Windows (shudder!). I know what I used isn't as efficient as what Hal is about to do, so I won't bore you with my scripty Linux solution.

Hal goes longer

Actually, what's interesting to me about this week's challenge is that it's surprisingly difficult for such a relatively simple problem. Sort of like my relationship with Tim.

The issue here is that there's no built-in Unix primitive to get the length of the longest line in a file. So we'll write our own:

$ max=''; \
while IFS= read -r line; do [[ ${#line} -gt ${#max} ]] && max="$line"; done </etc/profile; \
echo ${#max}: $max

70: # /etc/profile: system-wide .profile file for the Bourne shell (sh(1))

The trick here is the bash built-in "${#variable}", which returns the number of characters in the string in $variable (unless the variable is an array, in which case it returns the number of elements in the array). So first I create an empty variable called "max" that I'll use to track my longest line. Then my while loop reads through my target file and compares the length of the current line to the length of the string currently in "max" (the "IFS= read -r" incantation keeps read from stripping leading whitespace and eating backslashes, either of which would throw off our character counts). If the new line is longer, I set max to be the newly crowned longest line. At the end of the loop, "max" will be the longest line in the file (technically it will be the first line of that longest length, but close enough), so I print out the length of our "max" line followed by the line itself.
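If you haven't bumped into "${#...}" before, a quick demo at the shell shows both behaviors, the string case and the array case:

$ str='hello'; echo ${#str}
5
$ arr=(one two three); echo ${#arr[@]}
3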

So that will give us the longest line of a single file, but Tim's challenge is actually to find all files which contain a line that's longer than a certain fixed length. In some ways this makes our task easier, since we can stop reading the file as soon as we find any line that exceeds our length limit. But we have to add an extra find command to give us a list of files:

# find /etc -type f -exec /bin/bash -c \
'while IFS= read -r line; do [[ ${#line} -gt 150 ]] && echo {} && break; done < {}' \;

/etc/apt/trusted.gpg~
/etc/apt/trusted.gpg
/etc/apt/apt.conf.d/20dbus
/etc/apt/apt.conf.d/99update-notifier
...

Our while loop now becomes the argument of the "-exec /bin/bash -c ..." action at the end of the find command. And you'll notice that inside the while loop we're just looking for any line that's longer than 150 characters. When we hit this condition, we print out the file name and simply call "break" to terminate the loop and stop reading the file.
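One caution: letting find substitute "{}" directly into the bash -c script means a maliciously named file could inject shell syntax. A slightly longer variation passes the file name in as an argument instead (bash -c assigns the first argument after the script to $0), which keeps odd file names from being interpreted as code:

# find /etc -type f -exec /bin/bash -c \
'while IFS= read -r line; do [[ ${#line} -gt 150 ]] && echo "$0" && break; done < "$0"' {} \;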

If you really want to see all the long lines from each file along with the file names, it actually makes our while loop a little simpler:

# find /etc -type f -exec /bin/bash -c \
'while IFS= read -r line; do [[ ${#line} -gt 150 ]] && echo {}:$line; done < {}' \;

...


So the final solution is a little complicated, but it stays well short of the borders of Scriptistan, I'd say. And it's still less than 150 characters long...

Davide for the touchdown!

Loyal reader Davide Brini writes in to note that the GNU version of wc actually has a "-L" switch that outputs the length of the longest line in a file. So on Linux systems, or any box that has GNU coreutils installed, we could use this option to find files with long lines:
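For instance, running it against the /etc/profile we measured by hand earlier (assuming you have GNU coreutils):

$ wc -L /etc/profile
70 /etc/profile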

find /etc -type f | xargs wc -L | awk '$1 > 150 {print $2}'

"wc -L" gives us the length of the longest line in the file, followed by the file name. So we use awk to see if the longest line is more than 150 characters, and if so we print out the file name.

But as long as we're using awk, Davide points out that we could just:

find /etc -type f -exec awk 'length > 150 {print FILENAME; exit}' {} \;

Here we're using "find ... -exec awk ..." to call awk on each file in turn. awk runs the length() test on every line of the file, and if we hit a line longer than 150 characters we spit out the FILENAME variable (which awk helpfully sets for us) and terminate awk so find can go on to the next file.
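If you want output closer to Tim's PowerShell version, with the matching line number next to the file name, awk's FNR variable (the record number within the current file) makes that a small change:

find /etc -type f -exec awk 'length > 150 {print FILENAME ":" FNR; exit}' {} \;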

And again, if you're on a system with all the GNU utilities installed, then you can do this even more efficiently:

find /etc -type f -exec awk 'length > 150 {print FILENAME; nextfile}' {} +

With GNU find, "-exec ... +" functions like "find ... | xargs ...", calling the awk program as few times as possible by passing it large groups of matching file names as arguments. The nice thing about the GNU version of awk is its "nextfile" operator, which lets us stop reading the current file and move on to the next one as soon as we encounter a long line.
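And if you prefer an explicit xargs pipeline, GNU find's "-print0" paired with "xargs -0" keeps file names containing spaces or newlines from getting mangled along the way:

find /etc -type f -print0 | xargs -0 awk 'length > 150 {print FILENAME; nextfile}'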

Thanks, Davide, as always for your insight!