Alphabetical Sorting Must Die

If you follow Jakob Nielsen’s Alertbox, you know I’m shamelessly stealing this post title from him. He says that most of the time a list is sorted alphabetically because a) it’s easier to find a name and b) designers are lazy and don’t want to bother finding a better sorting key. While he writes from a design and usability point of view, his argument applies fully to data visualization too. Data (like people) “rarely think A-Z”.

Alphabetical order is one of the stupidest choices we can make when displaying data visually, and in our field we make it out of ignorance or laziness.

I already discussed how to sort bar charts, but here is a simple and colorful example of what you lose when you use the infamous alphabetical order. The bars are color-coded by Census Region and Division. On the left, the chart is ordered alphabetically, and what do you get? A useless rainbow. Yes, it’s easier to find your own state, but if that’s what you want, use a table.

On the right, you can easily see which states are the fattest and which ones are the leanest. So the first benefit of a well-sorted chart is obvious. But since there is a spatial pattern in the data, that pattern emerges too: the top 10 fattest states are all in the South region (and the District of Columbia is clearly an outlier in the region).

Figure: Obese Adult Population by State (left: sorted alphabetically; right: sorted by value)

The chart on the right contains exactly the same data, but it’s richer, and we can process it more efficiently just because we didn’t overlook how to sort the data.
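Here is a minimal Python sketch of the reordering behind the right-hand chart, assuming the data sits in a list of rows holding a state name, its Census Region (used for the bar color) and the plotted value. The `Row` type and both function names are hypothetical, made up for illustration. Sorting by value is a one-liner; counting regions among the top N is a quick way to check whether a pattern like the South’s shows up.

```python
from collections import Counter
from typing import NamedTuple


class Row(NamedTuple):
    state: str
    region: str   # Census Region, used to color the bar
    value: float  # the plotted measure


def sort_by_value(rows: list[Row], descending: bool = True) -> list[Row]:
    """Order the bars by the plotted value instead of by state name."""
    return sorted(rows, key=lambda r: r.value, reverse=descending)


def regions_in_top(rows: list[Row], n: int = 10) -> Counter:
    """Count how often each region appears among the n highest values."""
    return Counter(r.region for r in sort_by_value(rows)[:n])


# Placeholder rows for illustration only, not real obesity figures.
rows = [
    Row("State A", "South", 30.0),
    Row("State B", "West", 20.0),
    Row("State C", "South", 28.0),
    Row("State D", "Northeast", 22.0),
]
print([r.state for r in sort_by_value(rows)])  # ['State A', 'State C', 'State D', 'State B']
print(regions_in_top(rows, n=2))               # Counter({'South': 2})
```

With real data you would call `regions_in_top(rows, n=10)`; if one region dominates the result, the sorted chart will show it as a solid block of one color.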

It doesn’t matter how Tufte-compliant your chart is. If the sorting key hides relevant patterns, don’t use it. Always let the data itself set the order, and don’t be afraid to make more than one chart with different sorting keys. And, quoting Jakob Nielsen again, “when you reach for an A–Z structure, you should give yourself a little extra kick and seek out something better”. Amen.
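One way to act on the “more than one sorting key” advice, whether you produce a few static charts or an interactive sort menu, is a small table of named sort keys in which alphabetical order exists but is never the default. This is a sketch under my own assumptions: the `Row` fields, the key names and `order_rows` are all hypothetical.

```python
from typing import Callable, NamedTuple


class Row(NamedTuple):
    state: str
    region: str
    value: float


# Hypothetical sort options; each maps a row to a tuple that sorted() compares.
SORT_KEYS: dict[str, Callable[[Row], tuple]] = {
    "value": lambda r: (-r.value,),                       # highest value first
    "region_then_value": lambda r: (r.region, -r.value),  # regions grouped, highest first within each
    "alphabetical": lambda r: (r.state,),                 # available, but not the default
}


def order_rows(rows: list[Row], key_name: str = "value") -> list[Row]:
    """Return the rows in the order the bars should be drawn."""
    return sorted(rows, key=SORT_KEYS[key_name])


# Placeholder rows for illustration only.
rows = [
    Row("State B", "West", 20.0),
    Row("State A", "South", 30.0),
    Row("State C", "South", 28.0),
    Row("State D", "West", 29.0),
]
for name in SORT_KEYS:
    print(name, [r.state for r in order_rows(rows, name)])
```

Grouping by region first makes the regional pattern explicit, while sorting by value alone shows the overall ranking. Neither replaces the other, which is the point of drawing more than one chart.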

4 thoughts on “Alphabetical Sorting Must Die”

  1. As you have pointed out elsewhere, charts should be interactive. I usually build sorting options into the display to allow the user to view the data in various ways. Of course, for this dataset alphabetical would be an option, not the default.

  2. Derek: The Tufte-compliant thing? Didn’t mean it. I just wanted to say that you can have a good data-ink ratio and everything else we associate with Tufte (formatting-wise) and still end up with the wrong chart because of one stupid choice. And yes, Tufte actually discusses sorting keys, at least in his analysis of the Challenger O-ring data.

  3. It is interesting to compare the obesity rates to the political alignments of States as shown here:

    http://mdhealy.home.sprynet.com/Elections_1904_2008.png

    I trust the color coding is obvious. My sort order for the states used a simple exponential decay: if the Republican got a State’s electoral votes in 2008, that was 1 point; for 2004, an R vote was 0.8 point; for 2000, an R vote was 0.64 point; and so on. Sort by total points. My idea was to consider both consistency and recency. Basically, States fall into three groups in my plot: States that have been solid D in recent elections, solid R, and swing States. Each group is roughly a third of the States.
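The decay score described in the last comment is easy to reproduce. Below is a minimal Python sketch, assuming presidential elections every four years, a per-election weight ratio of 0.8 as described (1 point for 2008, 0.8 for 2004, 0.64 for 2000), and a hypothetical input that records, per State and year, whether the Republican carried it. The function names and data layout are illustrative, not the commenter’s actual code.

```python
DECAY = 0.8          # weight ratio between consecutive elections, as in the comment
LATEST_YEAR = 2008   # most recent election, worth a full point


def republican_score(r_won: dict[int, bool], latest: int = LATEST_YEAR,
                     decay: float = DECAY) -> float:
    """Sum decay**k over the elections the Republican carried, with k = 0 for `latest`."""
    return sum(decay ** ((latest - year) // 4)
               for year, won in r_won.items() if won)


def sort_states(history: dict[str, dict[int, bool]]) -> list[str]:
    """Order States by score, highest first (most reliably and recently R at the top)."""
    return sorted(history, key=lambda s: republican_score(history[s]), reverse=True)


# Hypothetical States and results, for illustration only.
history = {
    "State X": {2008: True, 2004: False, 2000: True},   # 1 + 0.64
    "State Y": {2008: False, 2004: True, 2000: True},   # 0.8 + 0.64
    "State Z": {2008: True, 2004: True, 2000: True},    # 1 + 0.8 + 0.64
}
print(sort_states(history))
```

Because each older election counts for less, States with mixed recent results should land between the consistently R and consistently D ones, which is consistent with the three bands the comment describes.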
