Syntax highlighting is backwards

June 2018

Most code editors color different pieces of your program in different ways. For instance, they’ll make keywords like if bold and bright so that you notice when you’ve misspelled them. They’ll make non-executable parts like comments and documentation fainter so that you know that the computer isn’t seeing that part of the program. Take this example in PyCharm colors:

def frobnicate(swizzle):
    """Frobnicates the given swizzle."""
    pass

But isn’t this exactly backwards?

Take as an example the following function definition from the Zulip codebase (source):

def user_avatar_path(user_profile: UserProfile) -> str:
    # WARNING: If this method is changed, you may need to do
    # a migration similar to
    # zerver/migrations/0060_move_avatars_to_be_uid_based.py .
    return user_avatar_path_from_ids(
        user_profile.id, user_profile.realm_id)

def is the least important part of this snippet–I know it’s a function. The comment is way more important: if I update the code without reading it, I’ll probably ship a bug.

So what if we flipped these two styles?

def user_avatar_path(user_profile: UserProfile) -> str:
    # WARNING: If this method is changed, you may need to do
    # a migration similar to
    # zerver/migrations/0060_move_avatars_to_be_uid_based.py .
    return user_avatar_path_from_ids(
        user_profile.id, user_profile.realm_id)

Seems better! The keywords fade into the background and I definitely won’t forget to write a migration. I’m not convinced about bolding every comment–it’s a little obtrusive–but I don’t have any better ideas and I’d rather read too many comments than not enough.
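
For anyone who wants to try the idea outside an IDE, here is a minimal sketch that uses only Python's standard library. It is not the PyCharm scheme described above: the function name flipped_highlight and the choice of ANSI bold and dim codes are assumptions made for illustration, and it only handles Python source that tokenizes cleanly.

```python
import io
import keyword
import tokenize

# ANSI escape codes; these render in most terminals (an assumption about yours).
BOLD = "\033[1m"
DIM = "\033[2m"
RESET = "\033[0m"


def flipped_highlight(source: str) -> str:
    """Return `source` with comments bold and keywords dim (hypothetical helper)."""
    lines = source.splitlines(keepends=True)
    # Collect (row, start_col, end_col, style) for each token to restyle.
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            style = BOLD
        elif tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            style = DIM
        else:
            continue
        spans.append((tok.start[0], tok.start[1], tok.end[1], style))
    # Apply right-to-left, bottom-to-top so earlier column offsets stay valid.
    for row, start, end, style in sorted(spans, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:start] + style + line[start:end] + RESET + line[end:]
    return "".join(lines)
```

Printing flipped_highlight(open(path).read()) in a terminal gives a quick feel for the flipped style. Docstrings are left unstyled here, since the tokenizer reports them as ordinary strings.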

I’ve been using this color scheme for a few weeks now and it’s been fine so far! The most noticeable effect is that my source code is way less noisy and distracting. I think I’m paying more attention to comments (and letting them go stale less often), but that’s a lot harder to tell so I can’t say for sure.

Random notes:

maxkwallace

“I tried to search around for any systematic studies…”

Me too. And I also didn’t find anything. The closest I found was:

https://www.cs.cmu.edu/~ckaestne/pdf/ese12.pdf - about background colors for #ifdef statements

https://dl.acm.org/citation.cfm?doid=2846680.2846685 (free PDF available if you search the name of the article) - which discusses syntax highlighting but doesn’t seem to present any actual research.

I interviewed the HCI researcher Ben Shneiderman once in college (for a technical writing assignment). I was only interested in his early work on program comprehensibility– IIRC he did systematic studies showing that (1) indentation helps and (2) at a certain point too many levels of control structure nesting (if, for) within functions is bad (better to have separate methods)– but he was much more keen to talk about his later research on HCI, touchscreens, etc.

The general impression I got was that source code comprehensibility is not a career-making field for an academic (I have no idea why). And there seems to be little active research in this area despite its practical importance.

My own $0.02:

For programmers, keywords like “def”, “return”, etc. are strongly chunked and act as a sort of signpost, so that we don’t read these words as words themselves but rather as structural information that modifies the content they’re presented with. They “feel” a certain way based on their logical function in the language.

I agree that at the level of an individual method or function they are unimportant. But there are different “tasks” that we do when programming, and reading an individual method is only one of them. Two other “tasks” are scanning a file and figuring out the relationship between a set of different methods that are used together. And I think highlighting “def” is justified for these tasks because it provides important structural information at a more macro level in the overall program.

My guess is that when syntax highlighting was first designed, comments were an easy target for deemphasis because they are less “regular” (highlighting them doesn’t help with detecting syntax errors) and, more importantly, they are less relevant for macro-level scanning and tend to be read at the level of individual methods. If it’s a comment about the whole method I think we tend to chunk it into the name of the method itself rather than reading it many times.

Deemphasizing comments always bothered me philosophically but the above are my conclusions as to why it didn’t seem to cause problems for me in practice. I tend to read comments in a different way than other parts of programs, and when I do, I am focused on that comment alone, so it’s not a big deal if it’s greyed out. But everyone is used to this style now so this could be partly post-rationalization? I feel like the optimal thing would be adaptive highlighting that changes based on what task you’re doing, or setting up a keybinding to toggle highlighting for comments, but I haven’t tried this yet.

Ben

I’m interested in your comments about how syntax highlighting helps with macro-level scanning. While writing this comment I actually realized that I almost never do that! When I’m trying to get a high-level overview of code, I instead use a mix of tools like:

  • grep (or its various smarter alternatives)
  • find-as-you-type
  • jump to definition
  • “find occurrences”
  • the editor’s “file structure” sidebar
  • code folding
  • drawing pictures

I also find that indentation is usually fine for figuring out structure, although this is definitely at least in part due to language–the codebase I’ve worked in most is a Python server, so it has (a) significant whitespace, (b) minimal syntactic noise, and (c) almost nothing (<5% of lines) is indented more than 3 levels. Plus, we have a fairly strong convention of including a “high level overview” in each module docstring, which eliminates some of the need for code navigation at all. I could see this working differently in a different language or a codebase with different conventions.
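
A figure like the "<5% of lines" estimate above is easy to check on another codebase. The following is a rough sketch, not the method used to get that number: the function name deep_line_fraction, the 4-space indent width, and the decision to ignore blank lines are all assumptions.

```python
def deep_line_fraction(source: str, indent_width: int = 4, max_level: int = 3) -> float:
    """Fraction of non-blank lines indented more than `max_level` levels.

    Hypothetical helper. Continuation lines count by their leading
    whitespace, so wrapped argument lists can inflate the result.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    deep = 0
    for ln in lines:
        # Treat tabs as indent_width columns before measuring.
        expanded = ln.expandtabs(indent_width)
        leading = len(expanded) - len(expanded.lstrip(" "))
        if leading // indent_width > max_level:
            deep += 1
    return deep / len(lines)
```

Running it over each file's text and averaging gives a rough per-codebase number to compare against the 5% figure.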

Max Wallace

I agree that it’s codebase-dependent. After I read your reply I realized that I did more scanning at my first job than at my second, because the codebase at my first job was not as well structured, and it wasn’t always clear what the relationship between certain methods was. I agree that indentation makes a big difference. I also use all the tools you do, with the exception of the “file structure” sidebar and code folding.

I guess what I was trying to say is that I think syntax highlighting (i.e. a consistent, unique color) for keywords like “def” and “return” helps with letting us process them more like unique symbols rather than words or letters, since it differentiates them from other types of text present in code. But this is all just conjecture :)