Syntax highlighting is backwards

Most code editors color different pieces of your program in different ways. For instance, they’ll make keywords like if bold and bright so that you notice when you’ve misspelled them. They’ll make non-executable parts like comments and documentation fainter so that you know that the computer isn’t seeing that part of the program. Take this example in PyCharm’s colors:

def frobnicate(swizzle):
    """Frobnicates the given swizzle."""
    pass

But isn’t this exactly backwards?

I want the syntax of my language to get out of my way, not jump into the foreground. I can easily recognize keywords, and I’d rather focus on the parts of the code with higher information density—method names, arguments, and so on.
If your code is locally readable, then there are only two times you need comments: (a) high-level overviews and signposts; or (b) to call out some tricky non-local effect of the code. These are the two most important and least straightforward parts of programming: I want them emphasized, not faded out!

Take as an example the following function definition from the Zulip codebase (source):

def user_avatar_path(user_profile: UserProfile) -> str:
    # WARNING: If this method is changed, you may need to do
    # a migration similar to
    # zerver/migrations/0060_move_avatars_to_be_uid_based.py .
    return user_avatar_path_from_ids(
        user_profile.id, user_profile.realm_id)

def is the least important part of this snippet—I know it’s a function. The comment is way more important: if I update the code without reading it, I’ll probably ship a bug.

So what if we flipped these two styles, fading the keywords and bolding the comments?

def user_avatar_path(user_profile: UserProfile) -> str:
    # WARNING: If this method is changed, you may need to do
    # a migration similar to
    # zerver/migrations/0060_move_avatars_to_be_uid_based.py .
    return user_avatar_path_from_ids(
        user_profile.id, user_profile.realm_id)

Seems better! The keywords fade into the background and I definitely won’t forget to write a migration. I’m not convinced about bolding every comment—it’s a little obtrusive—but I don’t have any better ideas and I’d rather read too many comments than not enough.
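To make the flip concrete outside an editor theme, here is a minimal terminal sketch built on Python’s standard tokenize and keyword modules and the ANSI faint/bold attributes. FLIPPED_THEME, classify, and highlight are hypothetical names, not part of any editor, and the docstring detection is a crude heuristic (any string literal that starts a statement counts).

```python
import io
import keyword
import tokenize

# Standard ANSI SGR escape codes for terminals.
RESET = "\033[0m"
BOLD = "\033[1m"
FAINT = "\033[2m"

# Hypothetical "flipped" theme: keywords fade out, comments and docstrings stand out.
FLIPPED_THEME = {"keyword": FAINT, "comment": BOLD, "docstring": BOLD}

# Token types after which a string literal begins a new statement.
_STATEMENT_START = (None, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT)


def classify(tok, prev_type):
    """Return a theme category for one token, or None to leave it unstyled."""
    if tok.type == tokenize.COMMENT:
        return "comment"
    if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
        return "keyword"
    # Crude heuristic: any string that starts a statement is treated as a docstring.
    if tok.type == tokenize.STRING and prev_type in _STATEMENT_START:
        return "docstring"
    return None


def highlight(source, theme=FLIPPED_THEME):
    """Wrap styled tokens of Python `source` in ANSI codes; all other text is untouched."""
    lines = io.StringIO(source).readlines()
    # starts[r - 1] is the absolute offset where (1-indexed) row r begins.
    starts = [0]
    for line in lines:
        starts.append(starts[-1] + len(line))

    def offset(pos):
        row, col = pos
        return starts[row - 1] + col

    spans = []
    prev_type = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        category = classify(tok, prev_type)
        if category in theme:
            spans.append((offset(tok.start), offset(tok.end), theme[category]))
        # Blank lines and comments don't change whether the next string starts a statement.
        if tok.type not in (tokenize.NL, tokenize.COMMENT):
            prev_type = tok.type

    # Splice escape codes around each styled span (spans arrive in source order).
    out, last = [], 0
    for start, end, code in spans:
        out.append(source[last:start])
        out.append(code + source[start:end] + RESET)
        last = end
    out.append(source[last:])
    return "".join(out)
```

In a terminal that supports the faint attribute, printing highlight() of the frobnicate example above shows a dim def and pass next to a bold docstring. Stripping the escape codes gives back the original source exactly, since the sketch only inserts codes and never rewrites tokens.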

I’ve been using this color scheme for a few weeks now and it’s been fine so far! The most noticeable effect is that my source code is way less noisy and distracting. I think I’m paying more attention to comments (and letting them go stale less often), but that’s a lot harder to tell, so I can’t say for sure.

Random notes:

I really do want to find something better than bold for comments.
I was curious why keywords in so many languages are displayed in bold today, so I looked up the history of syntax highlighting on Wikipedia. It sounds like the first IDE to bold keywords may have been MacPascal, which was written for a monochrome display. In that case it makes sense: with a monochrome display and one font, they could either make keywords bold or make everything else bold, and bolding keywords was the less intrusive option. But now that we have 16-million-color displays, maybe it’s time to revisit that.

I started thinking about this after discovering Robert Melton’s No Frils Vim theme, which I noticed also has an option “to turn on high contrast comments rather than the default faded style, for projects where the comments are not deceit and lies.” Maybe the deceit and lies problem is why this isn’t more popular? On the other hand, it seems like fading out comments and docstrings probably makes the staleness problem a lot worse.

It’s also probably a lot easier to do this in codebases with familiar and obvious naming conventions (e.g. classes are CamelCase, functions are verbs, decorators are adjectives, timestamps start with when_…).
I tried to search around for any systematic studies of what kinds of syntax highlighting help, and I couldn’t find a single one. I’d love any pointers.

Comments

maxkwallace

June 2018

“I tried to search around for any systematic studies…”

Me too. And I also didn’t find anything. The closest I found was:

https://www.cs.cmu.edu/~ckaestne/pdf/ese12.pdf - about background colors for #ifdef statements

https://dl.acm.org/citation.cfm?doid=2846680.2846685 (free PDF available if you search the name of the article) - which discusses syntax highlighting but doesn’t seem to present any actual research.

I interviewed the HCI researcher Ben Shneiderman once in college (for a technical writing assignment). I was only interested in his early work on program comprehensibility; IIRC he did systematic studies showing that (1) indentation helps and (2) beyond a certain point, too many levels of control structure nesting (if, for) within a function is bad, and it’s better to split it into separate methods. But he was much more keen to talk about his later research on HCI, touchscreens, etc.

The general impression I got was that source code comprehensibility is not a career-making field for an academic (I have no idea why). And there seems to be little active research in this area despite its practical importance.

My own $0.02:

For programmers, keywords like “def”, “return”, etc. are strongly chunked and act as a sort of signpost, to the effect that we don’t read these words as words themselves but as structural information that modifies the content they’re presented with. They “feel” a certain way based on their logical function in the language.

I agree that at the level of an individual method or function they are unimportant. But there are different “tasks” that we do when programming, and reading an individual method is only one of them. Two other “tasks” are scanning a file and figuring out the relationship between a set of different methods that are used together. And I think highlighting “def” is justified for these tasks because it provides important structural information at a more macro level in the overall program.

My guess is that when syntax highlighting was first designed, comments were an easy target for deemphasis because they are less “regular” (highlighting them doesn’t help with detecting syntax errors) and, more importantly, they are less relevant for macro-level scanning and tend to be read at the level of individual methods. If it’s a comment about the whole method, I think we tend to chunk it into the name of the method itself rather than reading it many times.

Deemphasizing comments always bothered me philosophically, but the above are my conclusions as to why it didn’t seem to cause problems for me in practice. I tend to read comments in a different way than other parts of programs, and when I do, I am focused on that comment alone, so it’s not a big deal if it’s greyed out. But everyone is used to this style now, so this could be partly post-rationalization? I feel like the optimal thing would be adaptive highlighting that changes based on what task you’re doing, or setting up a keybinding to toggle highlighting for comments, but I haven’t tried this yet.
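The keybinding idea above is easy to sketch independently of any editor: keep one theme per task and flip between them. Everything here is hypothetical (Style, THEMES, and HighlightMode are made-up names), and hooking toggle() up to a real keybinding would depend on a particular editor’s plugin API, which isn’t shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    bold: bool = False
    faint: bool = False


PLAIN = Style()

# Hypothetical per-task themes: prominent structure while scanning a file,
# prominent comments while reading one function closely.
THEMES = {
    "scanning": {"keyword": Style(bold=True), "comment": Style(faint=True)},
    "reading": {"keyword": Style(faint=True), "comment": Style(bold=True)},
}


class HighlightMode:
    """Tracks the current task; an editor keybinding would call toggle()."""

    def __init__(self, task="scanning"):
        self.task = task

    def toggle(self):
        """Switch between the two task themes and return the new task name."""
        self.task = "reading" if self.task == "scanning" else "scanning"
        return self.task

    def style_for(self, category):
        """Look up the style for a token category, defaulting to plain text."""
        return THEMES[self.task].get(category, PLAIN)
```

Keeping the themes as plain data means a third task (say, reviewing a diff) would be one more dictionary entry rather than new toggle logic, though toggle() itself only cycles between the two shown.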

Ben

I’m interested in your comments about how syntax highlighting helps with macro-level scanning. While writing this comment I actually realized that I almost never do that! When I’m trying to get a high-level overview of code, I instead use a mix of tools like:

grep (or its various smarter alternatives)
find-as-you-type
jump to definition
“find occurrences”
the editor’s “file structure” sidebar
code folding
drawing pictures

I also find that indentation is usually fine for figuring out structure, although this is definitely at least in part due to language: the codebase I’ve worked in most is a Python server, so it has (a) significant whitespace, (b) minimal syntactic noise, and (c) very few lines (<5%) indented more than 3 levels. Plus, we have a fairly strong convention of including a “high level overview” in each module docstring, which eliminates some of the need for code navigation. I could see this working differently in a different language or a codebase with different conventions.
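A figure like “<5% of lines indented more than 3 levels” is easy to check for a given codebase. Here is a rough sketch, assuming 4-space indentation and counting every non-blank line (continuation lines and lines inside multi-line strings are included, which inflates the count somewhat). count_deep_lines and deep_line_fraction are hypothetical helpers, not part of any existing tool.

```python
from pathlib import Path


def count_deep_lines(source, max_levels=3, indent_width=4):
    """Return (deep, total): non-blank lines indented more than max_levels, and all non-blank lines."""
    deep = total = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines don't count either way
        total += 1
        leading_spaces = len(line) - len(line.lstrip(" "))
        if leading_spaces // indent_width > max_levels:
            deep += 1
    return deep, total


def deep_line_fraction(root, **kwargs):
    """Share of non-blank lines, across all .py files under root, that are indented too deeply."""
    deep = total = 0
    for path in Path(root).rglob("*.py"):
        d, t = count_deep_lines(path.read_text(encoding="utf-8"), **kwargs)
        deep += d
        total += t
    return deep / total if total else 0.0
```

Counts are summed across files before dividing, so large files weigh more than small ones; averaging per-file fractions instead would answer a slightly different question.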

Max Wallace

I agree that it’s codebase-dependent. After I read your reply I realized that I did more scanning at my first job than at my second, because the codebase at my first job was not as well structured, and it wasn’t always clear what the relationship between certain methods was. I agree that indentation makes a big difference. I also use all the tools you do, with the exception of the “file structure” sidebar and code folding.

I guess what I was trying to say is that I think syntax highlighting (i.e. a consistent, unique color) for keywords like “def” and “return” helps with letting us process them more like unique symbols rather than words or letters, since it differentiates them from other types of text present in code. But this is all just conjecture :)

Michael Toomim

November 2019

Don’t forget the great research by Baecker and Marcus from the 1980s on source code formatting:

http://www.cs.kent.edu/~jmaletic/softvis/papers/Baecker1983.pdf

matt lawless

syntax highlighting, meh. turn it off

paul wisehart

March 2020

you’re right about everything. Syntax highlighting is a lie. It’s the man telling you what to think. Throw off the shackles.

gdewilde@gmail.com

I went with this highlighting scheme. It was for research but turned into a bit of a joke. http://opml.go-here.nl/the-internet-view-source.php It really bothered me that I couldn’t read the comments anymore.

co-dh

You may want to take a look at semantic highlighting, or colorForth

Goblin

A somewhat orthogonal take on the shortcomings of today’s syntax highlighting: https://stackoverflow.com/questions/13882241/is-crockford-style-context-coloring-implemented-in-any-code-editor

Alex

Here’s an idea re: bolding comments. Comments could get bolder the more sequential lines there are.

One line of comments would be a little bold.

Three lines of comments would be more bold.

Seven lines of comments would be very bold.

The idea being that the more lines of information the previous developer is trying to convey, the more important it may be.
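This proposal can be prototyped as a pass that finds runs of consecutive full-line comments and gives each run a weight that grows with its length. The sketch returns CSS font-weight values (in CSS, 400 is normal and 700 is bold); the thresholds of 1, 3, and 7 lines come from the examples above, while the weights 600/700/800 are assumptions, and comment_weights is a hypothetical helper. It only looks for lines starting with `#`, so trailing comments are ignored and a `#` at the start of a line inside a multi-line string would be miscounted.

```python
def comment_weights(source, thresholds=((7, 800), (3, 700), (1, 600))):
    """Return one CSS font-weight per line (None for non-comment lines).

    Consecutive full-line comments share a weight chosen by the run's length:
    the first (min_length, weight) pair with run >= min_length wins, so
    thresholds should be ordered from longest to shortest and end at 1.
    """
    lines = source.splitlines()
    weights = [None] * len(lines)
    i = 0
    while i < len(lines):
        if not lines[i].lstrip().startswith("#"):
            i += 1
            continue
        # Find the end of this run of consecutive comment lines.
        j = i
        while j < len(lines) and lines[j].lstrip().startswith("#"):
            j += 1
        run = j - i
        weight = next(w for min_len, w in thresholds if run >= min_len)
        weights[i:j] = [weight] * run
        i = j
    return weights
```

On the Zulip snippet above, the three WARNING lines would all get weight 700. Passing thresholds such as ((7, 500), (3, 600), (1, 800)) flips the curve so that short comments come out boldest.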

eMBee

i’d do just the opposite: make one-line comments full bold, and reduce the weight as lines are added. large comment blocks already stand out by their size. single-line comments, hopefully concise and to the point, are easier to miss.

John Fitzpatrick

The bolding of keywords (as opposed to identifiers) may have started prior to syntax highlighters. I remember texts for Pascal that showed keywords in bold typeface and the rest of the program in plain text.

For learning a language, highlighting the keywords makes sense. Perhaps we carried this idea to on-line syntax highlighting.

rileyjshaw

You may be interested in Literate Theme, a CLI tool I wrote to update .tmTheme files. It emphasizes comments and mutes everything else.

cemery50

December 2019

I fully agree with the need for color as a method to separate areas of concern. It seems like it should be easy to create varying styles based on focus and switch among them.

Bob

July 2020

You want https://books.google.com/books/about/Human_Factors_and_Typography_for_More_Re.html?id=QstWAAAAMAAJ

Desi

December 2023

I ended up implementing something like this in a colourblind-friendly theme for RStudio: https://github.com/DesiQuintans/epergoes

Like you, I find that I leave comments stale less often.
