Wednesday, 19 September 2012

Turning letters into numbers in Ruby (Excel style)

Today I was writing some code to update an Excel file with more recent data from various sources. I was using JRuby with jruby-poi, it provides a nice API and so far it's the only gem I found that can open and edit an xlsx file without breaking it.

I had an interface problem however, I was trying to provide the people that will use my code an easy interface, so I needed them to be able to pass me [excel's] columns as letters (A, B ,C...), but for editing with poi the interface is the object Rows, an Enumerable with numbered indexes like an Array, that is, I can't do rows['A'][1], I must do rows[0][0].

Googled around for a few ideas and I came up with this (it works up to column AZ as far as I know)

class String
    def to_number
        number ="A-Z", "1-9a-q").to_i(27)
        number -= 1 if number > 27
        return number

It works fine with strings that are just an capital letter, I didn't do a sanity check because you might wanted convert a bunch of letters in the same string, also because your column might be something like 'AA' or 'AB'. If you were looking for the simple answer you can stop reading here.

I'm gonna math-nerdout here a bit to explain what is happening:
the method "tr" here will replace everything from "A" to "I" with the numbers from 1 to 9, everything else will be replaced with the letters from "a" to "q" which are the numbers and letters used in the base 27 numeric representation.

For those of you unfamiliar with numeric which aren't base 10, they are represented as such:
0 to 9 stays the same, but everything after that are letters from English alphabet.

So 10 in base 27 is 'a', 26 is 'q' (the last letter used by base 27) and 27 is '10'.

(Don't ask me how are numbers represented in any base greater than 36 (where 35 is 'z'), Ruby won't accept any greater base anyway.)

The argument 27 in the "to_i" call is to tell ruby that the string is a number in base 27 for when it does the conversion to Fixnum (you can also "to_s 27 "if you want to convert a number to base 27 ).

Let's say your string was 'AA', the tr method will turn it into '11', this is not "eleven", in base 27 this is 28*. But the result we wanted is 27, not 28 (AA is column number 27), thus the "-1 if number > 27".

I could explore this solution to come up with some better math to make sure this method worked correctly even if you're looking for column 'ZZZZZ', but I don't need it, as this was a very small problem from the big problem I was facing today.

*Trying to explain this in layman's terms (since I myself am a layman) 
Normally we represent numbers in base 10, this means that when we get to the number "ten" it kinda loops the representation and goes back to zero preceded by a 1, this one indicates how many "full loops" we've done to the base so we represent the number ten like so: "10".

However, if we increase the base , we need more symbols, so at base 20 when we reach "ten" we can't represent as "10" (that would be twenty) because we haven't looped it yet, so we just borrow letters till we have enough symbols to represent all numbers in one loop of the base.

In the case of base 20,  we represent one as "1"; nine as "9", ten as "a", nineteen as "j", twenty as "10".