Excel: sort + COUNTIF() = utter mess

I’m still in shock. Such a stupid Excel mistake. I should know better, but it was Monday, so… Let me tell you about it. It’s as mistake as old as the hills, but it never goes away, and you are not immune to it.

I’ll exemplify with a simple data set. Here is a list of counties by state in the US (the data source), then the number of counties by state and finally the states sorted by number of counties:

County list in Excel, counting by state and sorting using regular ranges.

I couldn’t use a pivot table, so I had to count them with the COUNTIF() function. Then the states are sorted by the number of counties. Here is the formula I’m using, a typical COUNTIF() function:

=COUNTIF($B$6:$B$3104,$D6)

The problem with Excel COUNTIF()

So far so good. But you know that, to follow best practices, you should keep the data source in a separate sheet, right? So, you point to the data source in the other sheet and point to the criteria in the current sheet.

=COUNTIF(Sheet2!$A$6:$B$3104,Sheet1!J6) 

Let’s see what happens:

County list in Excel, counting by state and sorting and error.

Excel correctly counts the number of counties, but when you sort the states by number of counties but the list becomes utterly messed up. Let me show you how the formula looks like for Texas, currently in row 6:

=COUNTIF(Sheet2!$A$6:$B$3104,Sheet1!M49)

It points not to the the row Texas is in now, but to where it was before sorting (M49, instead of cell M6). I have no idea why Excel behaves like this, but it’s annoying and the error is easily overlooked.

Quick fix, but symptom of deeper issue

You can solve the problem by removing the reference to the current sheet (we don’t need it) before sorting the data. That’s a quick fix, but try to see beyond it. I suspect this is a common mistake and corroborates my long-held belief that you should avoid formulas in Excel whenever possible. When you can’t avoid them make them more resilient to Mondays.

The optimal solution is to use a pivot table to count the counties and sort the states. If you need the COUNTIF () function, turn both the data source and the analysis into tables first. When using it inside a table is much, much safer. Here is how COUNTIF() looks like in the unsorted table:

=COUNTIF(Table1,[@State])

And here is how it looks like after the table is sorted:

=COUNTIF(Table1,[@State])

No difference, and that’s exactly the point! And here are the tables:

County list in Excel, counting by state and sorting. Correct sorting with tables.

As you can see, the formula is not changed when you sort the table, so now you can stop worrying about Excel returning the wrong results, or even worse, not being aware that could happen.

Takeaway

Whatever you do, your first step should be turning these two ranges (data source and client) into tables. Stop everything and do it now. A range that looks like a table and is expected to behave like a table should always be a table. Virtually no exceptions.

This post focus on an annoying issue, but I’d like you to go beyond that and see it the solution as an example of a more structured approach to data analysis and manipulation in Excel.