Monday, May 26, 2014

Episode #178: Luhn-acy

Hal limbers up in the dojo

To maintain our fighting trim here in the Command Line Kung Fu dojo, we like to set little challenges for ourselves from time to time. Of course, we prefer it when our loyal readers send us ideas, so keep those emails coming! Really... please oh please oh please keep those emails coming... please, please, please... ahem, but I digress.

All of the data breaches in the news over the last year got me thinking about credit card numbers. As many of you are probably aware, credit card numbers have a check digit at the end to help validate the account number. The Luhn algorithm for computing this digit is moderately complicated and I wondered how much shell code it would take to compute these digits.

The Luhn algorithm is a right-to-left calculation, so it seemed like my first task was to peel off the last digit and be able to iterate across the remaining digits in reverse order:

$ for d in $(echo 123456789 | rev | cut -c2- | sed 's/\(.\)/\1 /g'); do echo $d; done
8
7
6
5
4
3
2
1

The "rev" utility flips the order of our digits, and then we just grab everything from the second digit onwards with "cut". We use a little "sed" action to break the digits up into a list we can iterate over.

Then I started thinking about how to do the "doubling" calculation on every other digit. I could have set up a shell function to calculate the doubling each time, but with only 10 possible outcomes, it seemed easier to just create an array with the appropriate values:

$ doubles=(0 2 4 6 8 1 3 5 7 9)
$ for d in $(echo 123456789 | rev | cut -c2- | sed 's/\(.\)/\1 /g'); do echo $d ${doubles[$d]}; done
8 7
7 5
6 3
5 1
4 8
3 6
2 4
1 2

Then I needed to output the "doubled" digit only every other digit, starting with the first from the right. That means a little modular arithmetic:

$ c=0
$ for d in $(echo 123456789 | rev | cut -c2- | sed 's/\(.\)/\1 /g'); do 
    echo $(( ++c % 2 ? ${doubles[$d]} : $d )); 
done
7
7
3
5
8
3
4
1

I've introduced a counting variable, "$c". Inside the loop, I'm evaluating a conditional expression to decide if I need to output the "double" of the digit or just the digit itself. There are several ways I could have handled this conditional operation in the shell, but having it in the mathematical "$((...))" construct is particularly useful when I want to calculate the total:

$ c=0; t=0; 
$ for d in $(echo 123456789 | rev | cut -c2- | sed 's/\(.\)/\1 /g'); do 
    t=$(( $t + (++c % 2 ? ${doubles[$d]} : $d) )); 
done
$ echo $t
38

We're basically done at this point. Instead of outputting the total, "$t", I need to do one more calculation to produce the Luhn digit:

$ c=0; t=0; 
$ for d in $(echo 123456789 | rev | cut -c2- | sed 's/\(.\)/\1 /g'); do 
    t=$(( $t + (++c % 2 ? ${doubles[$d]} : $d) ));
$ echo $(( ($t * 9) % 10 ))
2

Here's the whole thing in one line of shell code, including the array definition:

doubles=(0 2 4 6 8 1 3 5 7 9); 
c=0; t=0; 
for d in $(echo 123456789 | rev | cut -c2- | sed 's/\(.\)/\1 /g'); do 
    t=$(( $t + (++c % 2 ? ${doubles[$d]} : $d) )); 
done; 
echo $(( ($t * 9) % 10 ))

Even with all the extra whitespace, the whole thing fits in under 100 characters! Grand Master Ed would be proud.

I'm not even going to ask Tim to try and do this in CMD.EXE. Grand Master Ed could have handled it, but we'll let Tim use his PowerShell training wheels. I'm just wondering if he can do it so it still fits inside a Tweet...

Tim checks Hal's math

I'm not quite sure how Hal counts, but I when I copy/paste and then use Hal's own wc command I get 195 characters. It is less than *2*00 characters, not long enough to tweet.

Here is how we can accomplish the same task in PowerShell. I'm going to use a slightly different method than Hal. First, I'm going to use his lookup method as it is more terse then doing the extra match via if/then. In addition, I am going to extend his method a little to save a little space.

PS C:\> $lookup = @((0..9),(0,2,4,6,8,1,3,5,7,9));

This mutli-dimensional array contains a lookup for the number as well as the doubled number. That way I can index the value without an if statement to save space. Here is an example:

PS C:\> $isdoubled = $false
PS C:\> $lookup[$isdoubled][6]
6
PS C:\> $isdoubled = $true
PS C:\> $lookup[$isdoubled][7]
15

The shortest way to get each digit, from right to left, is by using regex (regular expression) match and working right to left. A longer way would be to use the string, convert it to a char array, then reverse it but that is long, ugly, and we need to use an additional variable.

The results are fed into a ForEach-Object loop. Before the objects (the digits) passed down the pipeline are handled we need to initialize a few variables, the total and the boolean $isdoubled variables in -Begin. Next, we add the digits up by accessing the items in our array as well as toggling the $isdoubled variable. Finally, we use the ForEach-Object's -End to output the final value of $sum.

PS C:\> ([regex]::Matches('123456789','.','RightToLeft')) | ForEach-Object 
-Begin { $sum = 0; $isdoubled = $false} -Process { $sum += $l[$isdoubled][[int]$_.value];  $d = -not $d } 
-End { $sum }

We can shorten the command to this to save space.

PS C:\> $l=@((0..9),(0,2,4,6,8,1,3,5,7,9));
([regex]::Matches('123456789','.','RightToLeft')) | %{ $s+=$l[$d][$_.value];$d=!$d} -b{$s=0;$d=0} -en{$s}

According to my math this is exactly 140 characters. I could trim another 2 by removing a few spaces too. It's tweetable!

I'll even throw in a bonus version for cmd.exe:

C:\> powershell -command "$l=@((0..9),(0,2,4,6,8,1,3,5,7,9));
([regex]::Matches("123456789",'.','RightToLeft')) | %{ $s+=$l[$d][$_.value];$d=!$d} -b{$s=0;$d=0} -en{$s}"

Ok, it is a bit of cheating, but it does run from CMD.

Hal gets a little help

I'm honestly not sure where my brain was at when I was counting characters in my solution. Shortening variable names and removing all extraneous whitespace, I can get my solution down to about 150 characters, but no smaller.

Happily, Tom Bonds wrote in with this cute little blob of awk which accomplishes the mission:

awk 'BEGIN{split("0246813579",d,""); for (i=split("12345678",a,"");i>0;i--) {t += ++x % 2 ? d[a[i]+1] : a[i];} print (t * 9) % 10}'

Here it is with a little more whitespace:

awk 'BEGIN{ split("0246813579",d,""); 
            for (i=split("12345678",a,"");i>0;i--) {
                t += ++x % 2 ? d[a[i]+1] : a[i];
            } 
            print (t * 9) % 10
          }'

Tom's getting a lot of leverage out of the "split()" operator and using his "for" loop to count backwards down the string. awk is automatically initializing his $t and $x variables to zero each time his program runs, whereas in the shell I have to explicitly set them to zero or the values from the last run will be used.

Anyway, Tom's original version definitely fits in a tweet! Good show, Tom!