Get off my shoulders, said the giant

Dear Stephen Few

I’m writing this assuming that my book Data at Work was one of the targets of your post “Data Visualization Lite”. If that is the case, thank you for spending some of your time reading the book. When I started my humble blog, never in my wildest dreams did I think that would happen.

And now you say I wrote a lite book. At first, I couldn’t disagree with you, you know? I don’t think I have the right talent to write a book of substance like, say, Bertin’s Semiologie Graphique, or even your Show me the Numbers. Then you said books like mine “introduce errors and provide bad advice”. According to you, they are basically polluting an otherwise clean and bright day. They only add noise and, as we all know, only signal matters. My heart sank.

First, let’s put the errors aside. If you mean factual errors, I know they exist, I actually have a page on the book’s companion site for each chapter where readers can point them out. If you are kind enough to send me a list of such errors I will certainly correct them in a future edition, just like you did with Show me the Numbers. We are humans, we make mistakes, we don’t like them. We’ll get to “bad advice” in a minute. Let’s first discuss your post and your comments.

Your lite post

I often misread and misrepresent the richness of your thought. You told me so several times (a few in the comment section of older posts). So, chances are I’ll do it again. Sorry, it’s not my intention, it’s my limitation. If, by any chance, I’m reading your post correctly, I can only conclude that core data visualization principles are well established, and one just needs to read your book Show me the Numbers to become familiar with them. You wrote them clearly and better than anyone else.

You accept that much remains to be done. People in this field should apply sound scientific methodologies to study certain details or areas you haven’t mapped already, or make themselves useful and apply your principles to a specific tool. The layperson in the office shouldn’t worry about different perspectives, at least for the time being. Your book is basically the only source this person needs. A large majority of the books that in some way overlap yours are filled with errors and will confuse the reader, and they should never be published because, among other reasons, they harm our productivity. If, by any chance, someone can come up with a few interesting insights that you happen to agree with, but you haven’t mentioned in your book, this person should write a short blog post about them, and refer to your book for the core concepts.

You have been an independent voice. Many of us admire you for your assertive positions against vendor marketing and data visualization fads. But I think we all miss the Stephen Few who wrote “the information visualization research community produces many innovations each year, which I’m always excited to discover in the research literature or during visits to research labs.” Perhaps data visualization is now a bare and sterile place. Perhaps you’ve changed.

Models and voices

If you truly believe that “Those books written since 2004 that aren’t filled with errors and poor guidance, with few exceptions, merely repeat what has been written previously.” I don’t think there is much to talk about. But you already confessed you derive some pleasure from these discussions, so I’ll assume you wanted to be provocative. You’ve succeeded.

Let me characterize the data visualization community as containing an ecosystem of models and voices. By “model” I simply mean a set of principles, ideas, design options and other objects put together in a consistent fashion. If you prefer 3D effects and garish colors and use them consistently, that’s your model. If you have a more minimalist approach, that’s your model. And your model has consequences for the way your audience takes advantage of and reacts to your outputs.

There are several models, but three can be identified easily: yours, Microsoft’s and Tufte’s. We all know that Microsoft’s model, as implemented in Excel, sucks. Microsoft should have listened to you when it changed the graph engine for Excel 2007. Tufte brought minimalism to data visualization and tried to convince us that this aesthetic was not an aesthetic at all; it was the only acceptable design in data visualization.

Obviously, minimalist aesthetics requires skills that the average office user lacks. As we’ll see below, much of your model accepts Tufte’s principles, but with a twist: everything that vaguely resembles the non-rational must be removed, aesthetics above all.

Let me exemplify with three little words: aesthetics, beauty and elegance. Unlike Tufte, who writes a chapter with “aesthetics” in the title, you never mention the a- word. You do mention “beauty”, as something artists create. According to you, design in data visualization serves uniquely to improve communication. I like the word “elegance”, and so do you. For quite some time, I thought you were using it in its usual meaning (simple but sophisticated aesthetics, or something similar). Then I read this in your book: “The word elegance comes originally from the Latin eligere, which means to choose out or to select carefully.” I thought this was wrong, because the word apparently comes from elegantia, but then I found this on the Online Etymology Dictionary for elegant:

late 15c., “tastefully ornate,” from Middle French élégant (15c.), from Latin elegantem (nominative elegans) “choice, fine, tasteful,” collateral form of present participle of eligere “select with care, choose.” Meaning “characterized by refined grace” is from 1520s. Latin elegans originally was a term of reproach, “dainty, fastidious;” the notion of “tastefully refined” emerged in classical Latin.

Surely you jest! While your note is not completely wrong (but “collateral form” doesn’t mean “comes from”), it’s hard not to see it as cherry-picking. And you do it all the time in your model: go to great lengths to make it bullet-proof rational, even if you have to sacrifice a lot along the way.

Voices are unavoidable. The nature of data visualization leads to the proliferation of multiple perspectives. Some of them cluster together, and some will overlap to a certain degree: for example, if people recognize the need to take visual perception into account, the overlap will always happen. I welcome the emergence of competing data visualization voices. I see it as a sign of strength, not weakness. They act as a model amplifier. They should be cherished, not banned.

Everyone in the visualization community should go beyond their social media presence and come up with their own voice. They must spell it out, and make it more than the sum of all blog posts. They have to start from scratch and try to make sense of all they think they know. Finding a consistent narrative will require hard work. Some pieces will not fit together, and they’ll find holes in unexpected places. The final voice will overlap many other voices, but it will not be less personal because of it. It can even become a full model. Whether this turns into a book or not is irrelevant, because that’s a different logic. At the minimum, each person should make available an explicit stylebook. I’m sure some people would gladly volunteer to critically curate these stylebooks.

If data visualization is “progressing at snail’s pace”, perhaps one of the reasons is not that too many lite books are published, but too few. Short-term thinking is encouraged by social media, and some people will turn into talking heads for their employers. If a person doesn’t sit down and think about the ramifications of his/her ideas, discussing other people’s voices and models will be less fruitful. I never reviewed books on my blog (well, just one) because I was unsure about my own thinking. I had to discover it by writing the book. Some people suggested I should focus more on Excel. You can’t imagine how I hated that idea. That’s why I strongly disagree with you when you advise people to write articles instead of books. That’s terrible advice. You say “We don’t need voices to reflect the spirit of our time; we need voices to challenge that spirit—voices of transformation. Demand depth.” Demand depth, but please stick to blog posts (or maybe tweets)? Humm…

Also, how can you ask for “voices of transformation” and, at the same time, offer your book as a single, prêt-à-porter reference for everything data visualization? You should encourage diverging approaches, since they will certainly make yours stand out. A bit of noise is actually useful to recognize signal.

All models are wrong, some models are useful. This fits like a glove. No model can capture the richness of data visualization. You didn’t like it when I called you a positivist, a few years ago. I changed my mind about you and Tufte, but I still think you’re a positivist. You believe every single chart must be a virtuous cocktail of rational decisions that make it effective and thus pave the way to enlightenment. Everything else is verboten. Everything else opens a Pandora’s box. If you don’t have everything under control, entropy creeps in. Essentially, your data visualization model is a cooking robot. And, I have to admit, you are often right: I saw this recently, when the graphs in a publication were updated by someone without the necessary skills and no understanding of the rationale behind the original design. It’s sad, really.

When people think you are not flexible enough, I admire you more, not less. We need solid and consistent models that don’t change with the latest fad. The model must be changed as a whole, and I believe you would change if you saw reasons for it to be changed. I truly hope so.

Revisiting Show me the Numbers

Telling people that they shouldn’t explore because you already provide the best of all possible paths is indefensible. And if they are willing to blindly accept your advice, do you suggest they should follow the first or the second edition of your book? Because they are different, and there are errors in the first one that you corrected in the second one. You are human, and you don’t want us to be.

I only felt the need to reply to your post when I read this amazing answer to Alberto Cairo:

“… feel free to provide examples of content [in books published during the last decade] that doesn’t appear in ‘Show Me the Numbers’ or that does appear but is presented in a way that extends our understanding of that content. I’ve observed that when these books depart from the content that exists in ‘Show Me the Numbers,’ they often introduce errors and provide bad advice.”

So, basically there is a canon that shouldn’t be challenged, and if people deviate from the righteous path they commit some kind of apostasy. That sounded like a fun challenge.

I can’t really go through all the books I bought during the last decade, but I wrote one. I’m less interested in comparing the books than the models, but here are a few differences between my book and yours:

You obviously write better English;
A large format is great for a data visualization book, but I’m not sure if you took full advantage of it; also, while business visualization should use large screens, mobile should at least be taken into account. You don’t need a large page size for this.
I used real data, not only because it is more relatable than a few dummy data points but because it can potentially add a small level of uncertainty that made-up datasets lack.
You use generic examples (not specific to data visualization) when discussing gestalt laws. I try to give a data visualization example for each law.
You didn’t feel the need to illustrate many of your ideas with a corresponding graph. I did.
You write extensively about table design. I preferred to focus exclusively on graphs.

Since you use your book as a reference and mention the last 10 years, I’m assuming that you want us to use its first edition. That’s fine with me. Let’s go through a few topics.

Value-encoding methods in graphs

You say graphs contain quantitative and categorical components, and that the “structural variations of graphs are defined primarily by differences in the components that encode quantitative values (e.g., lines versus bars).” Also, quantitative values can be encoded using points, lines, bars and shapes, but shapes are not effective and should be crossed out. This is where your famous quote comes from: “I don’t use pie charts, and I strongly recommend that you abandon them as well.” You dismiss other objects, like bubbles.

I’ll try to stick to your book, but I couldn’t help noticing that you downplay bubble charts because they are rarely needed in business communication. In your article “Leave pies for dessert”, you say that the only advantage of pie graphs is that they are able to display cumulative values, but that’s rarely used. So, you want to change people’s minds but current practices are OK if something doesn’t fit your model. I believe bubble charts are fine, provided we see them like scatterplots that add extra, but not critical, information when encoding the circle size (muting the bubbles and placing a dot at the center helps). Interestingly, you accept this “secondary information” when using stacked bar charts but not when using bubbles.

I don’t share your view. I see no reason to remove areas or use objects like bars, when the traditional primitives (point, line, area and volume) make more sense. We encode values using points, so that we can evaluate their distances, then we use dots, lines and areas to make these points visible, but those are design choices. This is far from new, but it helps us become aware that all graphs share the same roots and, because of that, creating new ones or improving existing ones becomes easier. Consistent with this view, in a sense there is no categorical axis: if you have a single quantitative variable you can only measure distances along a single axis. The categorical axis is nothing more than an offset from the opposite axis to make it easier to display and label categories. Being aware of this can be useful when sorting data points, a common issue with bar graphs. You talk about lines, vertical bars and horizontal bars, but that’s confusing, because they are variations of a single entity, the line. It does serve your purpose of avoiding adding geometric primitives in a more creative way.

Because it separates data mapping from graph design, my perspective represents a fundamental departure from your model. I strongly believe that, far from being an error or bad advice, this is a better starting point for understanding graphs, and I find it hard to associate this with your notion of data visualization lite.

Relationships in graphs

You say that there are seven types of relationships in graphs: nominal comparisons, time series, ranking, part-to-whole, deviation, distribution, correlation.

I defined a different set of relationships. I don’t see the usefulness of “nominal comparison” because I believe there should be a data-based sorting key in all charts. “Deviation” is more interesting, but not enough to be in its own category (it’s more data-specific). I used an “Order and Ranking” category that includes all these point comparisons. I also used “Profiling” where I include small multiples and similar constructions, because we should see these constructions as a single graph and not as multiple graphs.

Graph design solutions

This is an interesting section in your book. You take each of the previously defined relationships and select the value-encoding methods that best suit them. You came up with this:

Nominal comparison: bars and points
Time-series: lines, points and bars
Ranking: bars and points
Part-to-Whole: bars
Deviation: bars, lines and points
Distribution: bars, lines and points
Correlation: points and bars.

First conclusion: you do believe that every single relationship can be represented using a bar graph. Maybe Amanda Cox is not totally wrong, after all (teasing). Since you crossed-out areas, you don’t really have much choice.

This is one of the problems in your model: instead of a generic concept of line, you specify lines as connectors and talk about vertical and horizontal bars. Also, there isn’t much to be creative about with points, and you remove areas from the options. You end up with a very limited number of choices to make your graphs. For example, there is no support in this model to praise horizon charts.

I think I now understand why you disliked Abela’s classification (and mine, I presume) so much. Instead of actual graph types, it makes more sense to identify the best encoding objects for each type of relationship. This means that people new to data visualization will not automatically assume that there is a type of graph for each type of relationship.

I like this idea, I really do. Problem is, if my definition of components (point, line, area) gets too open because people basically can design a graph for each relationship using variations of any component, your definition (line, vertical bar, horizontal bar, point) is too closed, and using a bar graph for each relationship is a serious possibility.

My own preference goes to suggesting a few graph types for each relationship but making it clear that each graph type can be used in more than one relationship and that small changes in a graph can impact its nature, making it more useful in one category and less useful in another. I know you disagree.

The big issue in this section is how to represent part-to-whole relationships. This was already discussed ad nauseam, so there is no point in doing it again extensively. You think bar graphs should be used to display these relationships. I believe percentages are not enough to define part-to-whole relationships and the whole must be visible. I played with the idea of having a non-stacked pie graph, but the right solution is not to use graphs that display something else, but to encourage people to go beyond the uninteresting part-to-whole analysis.

Visual Perception

If I had any doubts regarding your positivist attitude towards data visualization, the chapter on visual perception would remove them once and for all.

Most data visualization books recognize the need for at least a basic understanding of how human visual perception works. Yours and mine are no exception. There are two major differences, though. I believe personality and social/cultural context are relevant enough to be discussed in an autonomous chapter. The second difference is that you overly emphasize the bottom-up dimension (how visual stimuli are acquired by the eye-brain system) and hardly mention the top-down process (the brain is not a passive receiver of visual stimuli, it engages actively in its selection).

Both the top-down process and context add a little sand to the beautiful mechanics of visual perception, where things are complex but ultimately we can explain and model them. To see, like you do, the eye as a camera and the eye-brain system like a computer is reassuring. We are in control, and taking rational decisions is easy. Positivism, again.

As I said above, graphs are scarce in this chapter. I think readers would benefit more if each of the sections were illustrated with a data visualization example and not only verbalized or illustrated with a generic example.

General design for communication

If I was puzzled by the lack of graphs to illustrate perception-related concepts, I find their total absence in this chapter even more bizarre.

The chapter begins with a paragraph that summarizes your approach to data visualization design: creating beauty is the work of artists, but we are here to communicate. Just like you don’t want culture messing with the mechanics of perception, you don’t want beauty to wreak havoc on your rational design.

General and component-level graph design

In the final chapters of your book you go through the details of graph design. Most of them are consistent with your model and reinforce its principles. I don’t think you have a compelling answer to the problem of scale breaking in line graphs, and I have to mention the broad consensus among “experts” and experts who oppose using dual-axis graphs. And I’m sure you’ll agree now that the “Correlation Bar Graph” is really, really a bad idea.

It’s also interesting to note that this notion that when people “depart from the content that exists in ‘Show Me the Numbers,’ they often introduce errors and provide bad advice” actually began with you, when you thought that using a log scale with a bar chart was a smart idea.

Color

I follow Tufte’s Envisioning Information and dedicate a full chapter to color, where I tried to classify all its major uses in data visualization, discuss color palettes or color versus shades of gray. My goal was to find ways of “avoiding catastrophe” (Tufte’s words).

Being such an elusive but fundamental component of graph design, you couldn’t possibly ignore it. But, except for a passing reference about the use of color in exotic countries, and the need to maintain two versions of the same hue to manage attention, there is nothing interesting to learn about color in your book. I can’t say this surprises me.

To wrap up: Here be dragons

Let me reiterate: this is not a review of your book. If it were, I would certainly emphasize its historical significance and how you managed to create the best data visualization model for a business context. I don’t forget that. But, like any other model, it has its flaws. By defining it as the gold standard and challenging your readers to provide examples of things that were not in the book or were said in a better way, you inevitably asked us to reread your book with a more critical eye.

What I found was an attempt to remove all risk, all irrationality, all the little things that could jeopardize this positivist notion of data visualization as mostly objective. No wonder you describe my book as lite. I tried to push readers into unknown territories. I warned them about the dragons, I told them to proceed at their own risk, but I provided a map as detailed as possible of my own exploration. I do regret any errors, but I’m not sure if avoiding them at all costs would be the right strategy.

I’m sure you’ll reply and once again dismiss everything as error and poor advice. At most there will be your “yes, but rarely used”, but I’m not counting on it.

You wrote your book for farmers, I wrote mine for explorers. We need both, but after reading your book again I feel even more reassured about my choices. Today, I would go further.

Jorge Camoes

PS: I have two things to add. I like to keep private communications private. I don’t see social media as an extension of those private communications. I hope you agree that everything I wrote above is fully independent from them. Also, it was suggested to me that, for full disclosure, I should say here that you read a few draft chapters of my book. You not only read them but you also sent me very useful feedback, for which I’m grateful. I already said it in the book, but I agree that I should repeat it here.

PS2: I don’t know if this post is mere coincidence or a half-response to this letter. Either way, good post on dealing with errors in public work (and how our errors can haunt us long after we forgot them).