Greetings, I come in peace!

Today, I’d like to hit the ground running by telling you how to solve my favorite interview question, step-by-step. I’ve asked this question to over 250 candidates and it does a great job testing basic software skills.

First, the question:

We know that spreadsheet applications use letters to name their columns. For example, the first column is labled 'A' and second is 'B'. Eventually, we get to 'Z', and then 'AA', 'AB', and so on. Let's write a function that, when given the name of a column, returns its index. So 'A' would return 1, 'B' would return 2, 'Z' would return 26 and 'AA' would return 27.

Let’s talk a little bit about these types of problems. There are systematic techniques to approach problem solving and our goal is to explicitly define them. That way, they become concrete tools in your problem solving toolbox. Once you have these tools, if your intuition fails you, you can reach into your toolbox and see if anything can get you unstuck.

The king of all of techniques is understanding the characteristics of the input and output of the problem. This will lead you to ask clarifying questions and asking questions is part of the problem. The interviewer is trying to simulate a problem solving scenario. If somebody walked up to you and said, “Please build me a website with a few pages.” What would you do? You would want to ask a million questions, right? This is no different.

So what questions should we ask?

Questions about input:

- Will the input be a String?
- Do we need to check the input for errors?
- Is there a limit to the input?

Questions about output:

- Should the output be an integer?
- What should we output when there’s an error?

In this case, I’ll say the input will be valid. That is, it will be a valid, non-null String with all uppercase characters A-Z. The input will also be of arbitrary length. Meaning, don’t assume it will be one, two or even three letters in length. The output will be an integer, though I understand there will be issues with overflow. Let’s not worry about that right now.

Ok, so we’ve got our problem, we know inputs and outputs. The next technique? Solve it like a human. That’s right, it turns out we are human problem solvers and out purpose is to teach a computer how to solve a computer.

You: Shouldn’t we be thinking in code?!

Meh, we’re humans, and we’re still figuring out the problem itself. We can ‘think in code’ later.

Back to the question. We are humans trying to solve this problem. There’s a column, we are trying to figure out if it’s the first, the fifth or the one hundred fifth column. Let’s look at some test cases:

- ‘A’ returns 1
- ‘Z’ returns 26
- ‘AA’ returns 27
- ‘BA’ returns ?

Looking at the pattern, ‘BA’ comes after ‘AZ’ and before ‘BB’. So it should be one more than ‘AZ’. What’s ‘AZ’? If ‘AA’ is 27, then ‘AB’ is 28, which means, from ‘AA’ to ‘AZ’ is another 26 characters. ‘AA’ is 27, so ‘AZ’ is 52. So we see this pattern of adding 26 everytime we go across. So single digits are easy, ‘A’ to ‘Z’ is 1 to 26. ‘AA’ to ‘AZ’ is 27 to 52. ‘BA’ is 53.

Let’s line up our inputs and outputs:

A B...Z AA...AZ BA...BZ... ... ZA...ZZ AAA | | | | | | | | | | 1 2...26 27...52 53...79... ... ??...?? ???

Here’s a small leap we’ll need to make. This problem is a form of counting! When we use numbers to count, we know that each digit has a certain value. Each ‘1’ in ‘111’ has a different value. So it’s the same with the names of these columns! So how do we assign values to each character instead of each digit?

With normal numbers (i.e. base 10), we know that there’s a one’s place, a ten’s place, hundred’s place. Each digit in a number is related to its own value and a power of 10.

111 is really 1 * 10^2 + 1 * 10^1 + 1 * 10^0.

You: !? What if I didn’t know that?! There’s no way I could have never solved this.

Bad news: You’re probably right. Computer science, programming, etc, kind of assumes you know a bit of math. Good news: You’ve identified a hole in your knowledge. Converting between different number bases is something a lot of programmers take for granted. You’ve heard of binary? Hexadecimal? Those are different ways of counting. It’s fundamental to so many things in computer science. It’s one of things that I’m looking for in an interview. If this hole exists, I have to worry about what other fundamental holes exist in the candidate’s understanding of computer science. Could this candidate write software? Probably. Will this person be able to innovate by thinking abstractly and creating generic, reusable solutions? Maybe not today, but one day 🙂

Back to the problem! We know we need to assign a values to each character, just like we do for digits. Except instead of 10 digits, we have 26 characters. Since we need assign a value to each character, we’ll need to iterate over each character in the input String. So to compare numbers and the column names:

123 = 1 * 10^2 + 2 * 10^1 + 3 * 10^0 ABC = 1 * 26^2 + 2 * 26^1 + 3 * 26^0 So for each character we assign the value: value_ofcharacter * base ^ power where the power is related to its position. The left most character gets the highest power and goes down from there.

Here we go, it’s time for code! We are writing a function, it takes input and returns output. We’re going to iterate over the input, assign a value and sum it up. We’ll need some place to collect the sum. Lastly, you’ll notice we’ll need a way to convert a character to a value. With digits, it’s a non-issue, but with characters, how do we do it? Now it’s time to use one more technique: encapsulation. If we can describe a piece of functionality, but don’t know how to do it, we can encapsulate it into a “helper function”. Observe:

public int getColumnIndex(String col) { int index = 0; // collect sum as we iterate. int base = 26; // since we're dealing with 26 characters for (int i = 0; i < col.length; i++) { char c = col.charAt(i); int power = ???; index += getValue(c) * Math.pow(base, power); } return index; }

Oh no, how did you write all that code? Step by step, honestly. I am assuming you know how to manipulate Strings in your favorite language; if you don’t, that’s a hole. However, if you don’t know how to use exponents, I’m less concerned. It’s less common and something very easy to look up. Remember, if you’re stuck, you can always encapsulate functionality you don’t know to implement.

So far, the code looks good, but the power variable needs some work. We know from our pattern, the first character starts out at the highest power and goes down from there and we are accessing each character using the counter i. We know the counter, i, starts at zero and goes up to length-1. Our power starts at length-1 and goes down to zero. So how do we relate the two? This part is another leap. Indexing into arrays is tricky and as an interviewer, I want to see you do it effectively. There are a few ways to do this part, but the following is the most common approach I have seen. The power for a given character is length-1-i. Did you make the leap? If not, array indexing is another hole in your knowledge. If you want practice with array indexing, write a solver for Boggle.

In any case, the code now looks like this:

public int getColumnIndex(String col) { int index = 0; // collect sum as we iterate. int base = 26; // since we're dealing with 26 characters for (int i = 0; i < col.length; i++) { char c = col.charAt(i); int power = col.length()-1-i; index += getValue(c) * Math.pow(base, power); } return index; }

The last part is the part we encapsulated: the getValue function. Some candidates implement it with a pre-populated map of char to integer or index into string that is the alphabet. Others know that in some languages, characters can be cast as integers and you can perform arithmetic on characters. I personally love when candidates use this last approach. It’s efficient and shows understanding of a non-obvious part of the language which might be an indicator of overall mastery. Below is my favorite implementation, but if the candidate didn’t get this part, oh well. Either you know this part or you don’t, it’s not the most important part of the question. For you, however, it’s a hole if you didn’t know how to do this.

public int getValue(char c) { return c - 'A' + 1; }

If it’s not clear, what’s happening, here’s how it works. We know we want ‘A’ to be worth 1. In languages where characters can be cast as integers, you can do arithmetic like ‘B’ – ‘A’ => 1. ‘A’ – ‘A’ => 0, so for our purposes, we need to add one. Some folks know about the ASCII code. This is an old standard encoding characters as integers. In the ASCII scheme, ‘A’ == 65. So some candidates will hard code the subtraction as c – 65 + 1, or even, c – 64. This is cool, but less than ideal. For one, it makes it really hard to read for other programmers if they’re not familiar with ASCII. Second, you are banking on the fact that ‘A’ will always equal 65. If it were to change, for some reason, code would still compile but it wouldn’t work. If arithmetic between characters were to change, the compiler would highlight that for you. Beyond the specific reasons, the meta-point is that hard coding values is asking for trouble. Don’t ask for trouble.

We did it! We can now convert the name of a spreadsheet into it index. As they say, it’s the journey not the destination. So what did we learn along the way?

Tools:

- Understand the Inputs and Outputs – Understand the limits, restrictions, range of the input and output.
- Solve it like a human first – When problem solving, don’t jump to code. Solve an example input by hand and then go back an examine the steps you took.
- Establish a pattern – Sometimes the pattern between inputs and outputs will shed light on the problem. Note that sometimes the pattern is confusing. If nothing else, once you think you have a solution you can reference your pattern.
- Encapsulation – This is one of the most important tools you can have. As a programmer, you need to be able to encapsulate logic all the time. Sometimes by necessity, but mostly by design. We’ll talk a lot more about this, but this concept manifests itself all the time in so many ways. For problem solving, being able to describe functionality you can’t yet implement might give you the building blocks to solve the broader problem. Once that is solved, you can focus on solving a smaller sub-problem.

Holes we might have found:

- String manipulation
- Number Bases (Radix)
- Array Indexing
- Math library
- ASCII encoding

Want to test yourself on these concepts? Try solving the following problem and email me your solutions.

Arizona license plates take the form TCZ2341. That is, three characters followed by four digits. The first license plate issued was AAA0000 and the last one will be ZZZ9999. Given a license plate, return the remaining number of license plates that can issued after the given license plate.

Can’t get enough? Try answering either question recursively.

Really, email me your solutions, I’m happy to provide feedback!

Thanks for reading!