State of the Terminal

This is a companion article to my talk at NeovimConf 2023.


I have been using Vim/Neovim as my full-time text editor for close to 10 years. I’ve spent a lot of time in the terminal and have become very aware of the many flaws and idiosyncrasies of this bizarre platform. But I also think it gets a lot of things right! And I’m not alone in this belief: terminal based tools are still widely popular even in the presence of many alternatives (the Stack Overflow developer survey has ranked Neovim the “most loved” editor 3 years in a row).

It’s only been in the last couple of years that I’ve begun to dig deep into how terminal emulators, and the applications that run inside of them, really work. I’ve learned that there is a lot of innovation and creative problem solving happening in this space, even though the underlying technology is over half a century old[1].

I’ve also found that many people who use terminal based tools (including shells like Bash and editors like Vim) know very little about terminals themselves, or some of the modern features and capabilities they can support.

In this article, we’ll discuss some of the problems that terminal based applications have historically had to deal with (and what the modern solutions are) as well as some features that modern terminal emulators support that you may not be aware of.

But first, some (very) brief history.

Background & History

Most terminal emulators today can directly trace their roots back to the DEC VT100. The VT100 was not the first video terminal, nor was it the last, but it was the most popular (at the time). And as we’ve learned from history many times since, what becomes popular creates the de facto standard for everything that comes after.

[Figure: DEC VT100. Photo by Jason Scott, CC BY 2.0, via Wikimedia Commons]

Video terminals were an improvement on the teletype machines that preceded them. They could move the cursor around the screen to create interactive interfaces. They could use color, and clear and redraw their displays quickly without feeding out reams of paper.

Different video terminals had their own unique way of doing things using unique, proprietary escape codes (a sequence of bytes beginning with the escape 0x1b character). This made life difficult for applications because they had to know which of these sequences to use. Libraries and helper programs (e.g. termcap) were created to help ameliorate these issues (we still live with the descendant of these early libraries, terminfo).

Eventually, formal standards were created, such as ECMA-48 and ANSI X3.64 (from which the term “ANSI escape codes” derives), which defined a set of standard escape sequences. The DEC VT100 was the first video terminal to support these new standards. Its popularity, combined with the new standards, meant that programs now had a set of known good escape sequences they could reliably use. Its popularity spawned many clones, which in turn supported the same sequences for compatibility with applications.

Graphical window systems eventually replaced hardware video terminals, but users still wanted to use the terminal based programs they were accustomed to (you know how those vi people are). In 1984, work began on a software terminal emulator at MIT. This emulator became part of the X project and was named Xterm. Xterm implemented its own features which did not exist on the video terminals it emulated, such as mouse tracking and a configurable color palette. These features were in turn copied by Xterm clones, until eventually Xterm itself became the new de facto standard.

Terminal Emulator Basics

Terminal based applications write two kinds of data to the terminal emulator: printable text that is displayed to the user, and control codes, which modify the terminal emulator’s state. Control codes are either single bytes in the C0 character set (bytes 0x00 through 0x1f) or sequences of bytes that begin with the escape character (0x1b). These sequences are most commonly referred to as “escape sequences”, and it is these sequences that do the bulk of the heavy lifting in terminal applications.

Most control codes from the C0 character set are not used today, but regardless of experience with terminals or terminal applications, most developers are likely familiar with control codes such as \r (carriage return), which moves the cursor to the beginning of the current line, and \n (line feed), which moves the cursor to the next line.

Escape sequences are varied and numerous, but the vast majority used in practice fall into one of three categories: Control Sequence Introducer (CSI), Device Control String (DCS), and Operating System Command (OSC).

CSI sequences are those which begin with the prefix ESC [ (0x1b 0x5b). Escape sequences in this category are those which reposition the cursor, change the cursor style, clear the screen, set foreground and background colors, and more.

OSC sequences are those which begin with the prefix ESC ] (0x1b 0x5d) and are typically used for things that modify or interact with the user’s environment outside of the terminal emulator itself (hence the name “Operating System Command”). Examples are reading from or writing to the system clipboard, changing the title of the terminal emulator’s window, or sending desktop notifications.

DCS sequences are those which begin with the prefix ESC P (0x1b 0x50). They carry device-specific strings of data; sixel images, for example, are sent as DCS sequences.

Xterm maintains a list of all of the control sequences it supports on its website, which, along with vt100.net, forms an informal pseudo-specification for VT100 emulators. Note that this list may not contain some control sequences used by other, modern terminal emulators for features which Xterm does not support (e.g. the Kitty keyboard protocol, which we’ll discuss later).

Escape sequences are actually quite easy to use, and you can even do it straight from your shell. Try running the following command (if your shell’s printf does not recognize \e, substitute \033):

printf '\e[1;32mHello \e[0;4;31mworld!\n\e[0m'

This command will print the text “Hello world!”, with “Hello” in green, bold text and “world!” in red, underlined text.

The escape sequences used here are of the form CSI <parameters> m, which is so common it has its own name: Select Graphic Rendition (SGR). The SGR escape sequence sets display attributes (such as bold, underline, and foreground and background colors) for all text printed after it. The first escape sequence in the example \e[1;32m enables the bold attribute (1) and sets the foreground color to green (32). The second escape sequence \e[0;4;31m first clears any existing styles (0), then enables the underline attribute (4), and then sets the foreground text color to red (31). Finally, the last escape sequence \e[0m resets all styles back to their defaults.
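For anyone who would rather build these sequences in a program than type them by hand, here is a minimal Python sketch. The helper name sgr is my own invention for illustration (it is not from any library); it simply joins the numeric parameters with semicolons and wraps them in CSI ... m, reproducing the three sequences from the printf example.

```python
CSI = "\x1b["  # Control Sequence Introducer: ESC followed by "["


def sgr(*params: int) -> str:
    """Build a Select Graphic Rendition sequence: CSI <params> m.

    Illustrative helper, not part of any library.
    """
    return CSI + ";".join(str(p) for p in params) + "m"


# Same bytes as the printf example: bold green "Hello", underlined red "world!"
message = sgr(1, 32) + "Hello " + sgr(0, 4, 31) + "world!\n" + sgr(0)

if __name__ == "__main__":
    print(message, end="")
```

Keeping the builder separate from the printing makes it easy to check the exact bytes without needing a terminal at all.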

Another use case for simple CSI sequences is redrawing text on the screen on an already existing line (e.g. for a progress bar or text that updates itself over time). Hint: look at \r, CSI A, and CSI K.
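One possible answer to the hint, sketched in Python: each update writes \r to return to the start of the line, then CSI K to erase whatever was there, then the new text. The function names and the bar format are made up for this example. CSI A (cursor up) is not needed for a single line, but a multi-line display would use it to climb back to its first line before redrawing.

```python
import sys
import time

CSI = "\x1b["


def progress_frame(done: int, total: int, width: int = 20) -> str:
    """Return the text that redraws a progress bar in place on the current line.

    Hypothetical helper for illustration only.
    """
    filled = width * done // total
    bar = "#" * filled + "-" * (width - filled)
    # \r moves the cursor to column 0; CSI K erases from the cursor to end of line
    return f"\r{CSI}K[{bar}] {done}/{total}"


def run(total: int = 10, delay: float = 0.05) -> None:
    """Animate the bar by writing one frame per step, then finish with a newline."""
    for done in range(total + 1):
        sys.stdout.write(progress_frame(done, total))
        sys.stdout.flush()  # without a flush the frames may be buffered together
        time.sleep(delay)
    sys.stdout.write("\n")


if __name__ == "__main__":
    run()
```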

Most escape sequences are sent from the application to the terminal emulator, but occasionally the terminal emulator sends escape sequences to the application. Usually this is done in response to a query from the application (for instance, to determine if a certain mode is set).
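To make the query and response idea concrete, here is a hedged Python sketch of one such exchange: the DEC private mode query (DECRQM), written as CSI ? <mode> $ p, which terminals answer with CSI ? <mode> ; <status> $ y. The function names are my own. Actually reading the reply requires putting the tty into raw mode, which I have left out, so the parser below works on a reply string you have already read.

```python
import re
from typing import Optional, Tuple

CSI = "\x1b["

# Reply format for a DEC private mode report: CSI ? <mode> ; <status> $ y
_MODE_REPORT = re.compile(r"\x1b\[\?(\d+);(\d+)\$y")

# Status values a terminal can report for a mode
STATUS = {0: "not recognized", 1: "set", 2: "reset",
          3: "permanently set", 4: "permanently reset"}


def mode_query(mode: int) -> str:
    """Text an application writes to ask whether a DEC private mode is set.

    Illustrative helper, not part of any library.
    """
    return f"{CSI}?{mode}$p"


def parse_mode_report(reply: str) -> Optional[Tuple[int, str]]:
    """Return (mode, status) from a terminal's reply, or None if it is not a report."""
    match = _MODE_REPORT.search(reply)
    if match is None:
        return None
    mode, status = int(match.group(1)), int(match.group(2))
    return mode, STATUS.get(status, "unknown")
```

For example, mode 2004 is bracketed paste: an application would write mode_query(2004) and hand whatever comes back to parse_mode_report.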

Problems & Solutions

Terminal emulators are descended from old, legacy technologies, which brings with it its fair share of problems. Many of these problems have been (mostly) solved, or at least ameliorated, while others are still active areas of innovation and research.

Key Encoding

Terminal emulators and terminal applications communicate through a stream of bytes. When a user presses a key the terminal sends the byte representation of the character associated with that key. The old video terminals only supported ASCII so this was, generally, fairly straightforward.

Modifier keys like Ctrl and Alt complicate this situation. Alt modified keys are encoded by prefixing the character with an Esc. But this has a problem: including an extra Esc byte for the Alt modifier introduces ambiguity between Alt modified key presses and two separate key presses. When an application sees Esc C, should it interpret it as Alt-C or did the user press Esc and then press C? Applications usually solve this by measuring the amount of time between Esc and the next character. If the time is less than some defined interval, it is considered an Alt modified key press (Vim uses the ttimeoutlen option, tmux uses the escape-time option).
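The timing heuristic can be sketched in a few lines of Python. This is a simplified model, not how Vim or tmux implement it: the input is a list of (byte, milliseconds since the previous byte) pairs, and the function name and the 50 ms default are assumptions of mine. A real application would also use a timer to report a lone Esc once the timeout expires, rather than waiting for the next byte to arrive.

```python
ESC = 0x1B


def decode_keys(events, timeout_ms=50):
    """Decode (byte, ms_since_previous_byte) pairs into key names.

    An ESC followed by another byte within timeout_ms is treated as Alt-<key>;
    otherwise the ESC is reported as a key press of its own.
    Hypothetical helper for illustration only.
    """
    keys = []
    pending_esc = False
    for byte, gap_ms in events:
        if pending_esc:
            pending_esc = False
            if gap_ms < timeout_ms and byte != ESC:
                keys.append("Alt-" + chr(byte))
                continue
            keys.append("Esc")  # the next byte came too late: ESC stood alone
        if byte == ESC:
            pending_esc = True
        else:
            keys.append(chr(byte))
    if pending_esc:
        keys.append("Esc")  # input ended with an unpaired ESC
    return keys
```

The same two bytes decode differently depending only on the gap between them, which is exactly the ambiguity described above.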

Ctrl modified keys are an even bigger problem. When Ctrl is used as a modifier, the shifted[2] version of the key has the 7th bit masked off (for example, C is 0x43 and after masking the 7th bit, 0x40, the byte becomes 0x03). This means that not only can the Shift modifier not be used in conjunction with Ctrl, but that certain Ctrl modified keys are completely indistinguishable from other control codes.

For instance, when you press the Return key the terminal emulator sends the byte \r (0x0d) to the application. But if you press Ctrl-M then the terminal emulator also sends the byte 0x0d to the application (M is 0x4d in ASCII, so when the 7th bit is masked out, it becomes 0x0d). From the application’s perspective, there is literally no way to distinguish these two events.
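The arithmetic is easy to check for yourself. Here is a small Python sketch, restricted to ASCII letters for simplicity (the function name is made up for this example):

```python
def legacy_ctrl_byte(letter: str) -> int:
    """Byte a terminal sends for Ctrl-<letter> under the legacy encoding.

    Assumes an ASCII letter: take the uppercase (shifted) code and clear
    the 0x40 bit. Illustrative helper, not part of any library.
    """
    return ord(letter.upper()) & ~0x40


# These Ctrl combinations collide with ordinary control characters
collisions = {
    "Ctrl-M": (legacy_ctrl_byte("M"), ord("\r")),  # carriage return (Return key)
    "Ctrl-I": (legacy_ctrl_byte("I"), ord("\t")),  # horizontal tab (Tab key)
    "Ctrl-J": (legacy_ctrl_byte("J"), ord("\n")),  # line feed
}

if __name__ == "__main__":
    for name, (ctrl_byte, control_char) in collisions.items():
        print(f"{name}: 0x{ctrl_byte:02x} vs 0x{control_char:02x}")
```

Because the letter is uppercased first, Ctrl-m and Ctrl-Shift-M produce the same byte, which is why Shift is lost as well.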

For a long time this meant that certain modified keys like Ctrl-I, Ctrl-J, and Ctrl-M could not be used in terminal applications like Vim. There have been a few attempts to solve this problem: the first came from Xterm in 2006 through the modifyOtherKeys option. Paul Evans (author of libvterm and libtickit) introduced an alternate key encoding using the CSI u escape sequence in an essay which is sometimes colloquially referred to as “fixterms”. The CSI u encoding proposed by Evans was extended by Kovid Goyal, the author of the kitty terminal emulator, in what has become known as the kitty keyboard protocol.

What all of these solutions have in common is that key presses are sent to the terminal application encoded as escape sequences. This eliminates any ambiguity for modified keys and enables certain modifier combinations (such as Ctrl + Shift) that are not possible using “legacy” encoding. The CSI u encoding proposed by Evans and adapted by kitty encodes a modified key press like Ctrl-M as \e[109;5u (109 is the code point of m, and 5 is 1 plus the Ctrl modifier value of 4). The encoding of unmodified key presses like Return depends on which “level” of the kitty keyboard protocol is enabled. Applications can opt in to different levels to ease adoption (for instance, Neovim uses only the first level, “Disambiguate escape keys”). See the kitty documentation for more details.
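Here is a rough Python sketch of the basic CSI u form, CSI <code point> ; <1 + modifiers> u, using the modifier values Shift = 1, Alt = 2, and Ctrl = 4. The function is a simplification of my own: the full kitty protocol has more fields and more modifiers, and (as noted above) whether an unmodified key is sent this way at all depends on the enabled level.

```python
SHIFT, ALT, CTRL = 1, 2, 4  # modifier bit values used by the CSI u encoding


def csi_u(key: str, modifiers: int = 0) -> str:
    """Encode a key press as CSI <code point> ; <1 + modifiers> u.

    Simplified illustration; not a complete implementation of the protocol.
    """
    codepoint = ord(key.lower())  # the unshifted key identifies the key press
    if modifiers == 0:
        return f"\x1b[{codepoint}u"
    return f"\x1b[{codepoint};{1 + modifiers}u"


if __name__ == "__main__":
    # repr() shows the escape bytes instead of sending them to the terminal
    print(repr(csi_u("m", CTRL)))          # Ctrl-M
    print(repr(csi_u("m", CTRL | SHIFT)))  # Ctrl-Shift-M, impossible in legacy encoding
    print(repr(csi_u("\r", CTRL)))         # Ctrl-Return, now distinct from Ctrl-M
```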

Sending key presses as escape sequences requires that terminal applications are able to recognize and parse those sequences, so it is not something that “just works” out of the box. However, the kitty keyboard protocol has been widely adopted by both modern terminal emulators and terminal applications. Terminals which support the kitty keyboard protocol (to some degree) include Wezterm, Alacritty, kitty, foot, Ghostty, and iTerm2. Applications which support the kitty keyboard protocol (to some degree) include Vim, Neovim, Helix, kakoune, and nushell. This means that when using one of these applications in one of these terminals, all of the key encoding problems discussed above (as well as some others which were not discussed…) are solved.

Decorations

Xterm has supported 256 user specified colors since 1999. These colors could be changed at runtime using an escape sequence (OSC 4), which can be used to great effect (see “8 Bit & ‘8 Bitish’ Graphics-Outside the Box” by Mark Ferrari for an incredible demonstration, or install notcurses and run notcurses-demo j in your terminal).
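As a small illustration, the Python sketch below builds the OSC 4 sequence that redefines a single palette entry, using the form OSC 4 ; <index> ; <color spec> followed by a string terminator, with the color written in the X11 rgb:RR/GG/BB notation. The function name is my own, and this only constructs the string; what happens when you write it depends on your terminal.

```python
ST = "\x1b\\"  # String Terminator: ESC followed by a backslash


def set_palette_color(index: int, red: int, green: int, blue: int) -> str:
    """Build OSC 4 ; <index> ; rgb:RR/GG/BB ST to redefine one palette entry.

    Illustrative helper, not part of any library.
    """
    if not 0 <= index <= 255:
        raise ValueError("palette index must be in 0..255")
    spec = f"rgb:{red:02x}/{green:02x}/{blue:02x}"
    return f"\x1b]4;{index};{spec}{ST}"


if __name__ == "__main__":
    # repr() shows the bytes without actually changing your palette
    print(repr(set_palette_color(1, 255, 128, 0)))
```

Palette animation of the kind shown in Ferrari's talk amounts to sending sequences like this repeatedly while the text on screen stays untouched.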

Within the last decade or so, 24 bit color (sometimes referred to as “truecolor” or “RGB color”) has become widely supported by terminal emulators which allows terminal applications to use whatever arbitrary colors they want. This provides terminal UIs a much greater degree of flexibility and creative freedom.
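24 bit colors are selected with an extended form of SGR: 38;2;R;G;B for the foreground and 48;2;R;G;B for the background. The sketch below, with helper names of my own choosing, uses that to color each character of a string along a gradient, something that is awkward to do with a fixed palette.

```python
RESET = "\x1b[0m"


def fg_rgb(red: int, green: int, blue: int) -> str:
    """SGR 38;2;R;G;B selects a 24 bit foreground color."""
    return f"\x1b[38;2;{red};{green};{blue}m"


def bg_rgb(red: int, green: int, blue: int) -> str:
    """SGR 48;2;R;G;B selects a 24 bit background color."""
    return f"\x1b[48;2;{red};{green};{blue}m"


def gradient(text: str, start: tuple, end: tuple) -> str:
    """Color each character along a linear blend between two RGB tuples.

    Hypothetical helper for illustration only.
    """
    steps = max(len(text) - 1, 1)
    out = []
    for i, char in enumerate(text):
        rgb = [s + (e - s) * i // steps for s, e in zip(start, end)]
        out.append(fg_rgb(*rgb) + char)
    return "".join(out) + RESET


if __name__ == "__main__":
    print(gradient("24 bit color in the terminal", (255, 0, 128), (0, 200, 255)))
```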

Modern terminals also support other kinds of “rich” text markup, such as strikethrough and various types of underlines. For instance, text editors like Vim and Neovim can add a red squiggly line under misspelled words (as seen in many graphical rich text editors).

[Figure: Examples of markup styles supported by modern terminal emulators]
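These styles are also plain SGR sequences. In the Python sketch below, strikethrough is SGR 9, the underline style is chosen with a colon separated sub-parameter (4:3 is the curly underline), and SGR 58 sets the underline color separately from the text color. The helper names are mine, and support for the underline extensions varies between terminals, so treat this as a sketch rather than something that works everywhere.

```python
RESET = "\x1b[0m"
STRIKETHROUGH = "\x1b[9m"

# Underline styles use a colon separated sub-parameter: SGR 4:<style>
UNDERLINE_STYLES = {"single": 1, "double": 2, "curly": 3, "dotted": 4, "dashed": 5}


def underline(style: str = "single") -> str:
    """Select an underline style by name."""
    return f"\x1b[4:{UNDERLINE_STYLES[style]}m"


def underline_color(red: int, green: int, blue: int) -> str:
    """SGR 58 sets the underline color independently of the text color."""
    return f"\x1b[58:2::{red}:{green}:{blue}m"


def misspelled(word: str) -> str:
    """Red curly underline, as an editor might render a spelling error.

    Hypothetical helper for illustration only.
    """
    return underline("curly") + underline_color(255, 0, 0) + word + RESET


if __name__ == "__main__":
    print("I made a " + misspelled("mistaek") + " here.")
    print(STRIKETHROUGH + "struck through" + RESET)
```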

It is also possible to display images and even videos inline inside of terminal emulators. There are (at least) three different ways to do this (sixels, the iTerm2 image protocol, and the kitty graphics protocol) and support among terminal emulators varies. Unfortunately this means that terminal applications are in a bit of an awkward situation, as they must either implement support for all of the image protocols, or only support a subset of terminals. For this reason, use of images in terminal applications is still relatively uncommon.

It is important to note that advances in terminal based UIs are not only due to the efforts of terminal emulators, but also to the creativity and talent of terminal application and library authors. For example, see some of the fantastic work that charm.sh has done creating delightful, interactive terminal based user interfaces that rival (and in some cases, surpass!) graphical UIs for similar tools.

Capability Determination

Terminal emulators do not all support the same features. In some cases, the same feature is implemented in different ways. Terminal applications need some way to know which features the terminal they’re running in supports and how to properly use those features.

Today this is primarily done using a distributed database of “terminfo” files. The terminal emulator uses the $TERM environment variable to communicate to terminal applications which terminfo file to use to look up which capabilities the terminal supports.

This has a multitude of problems, however. The terminfo database is part of the ncurses library, and different operating systems and distributions package different versions of ncurses. This was a problem for tmux users on macOS for many years because the version of ncurses packaged with macOS was so old that it did not even include the tmux-256color terminfo entry at all!

This is also a problem for newer terminals which have not yet been added to the ncurses terminfo database. Terminal emulators can (and often do) ship their own terminfo entries which are used by applications running on the same system as the terminal emulator itself. But when connecting to a remote system (e.g. with SSH), the terminfo database on the remote system will not have the terminfo entry, and the user is met with cryptic warnings like WARNING: terminal is not fully functional and with applications that do not function properly.

To circumvent this issue, many terminals use xterm-256color as their $TERM value, essentially claiming to be Xterm even though they are not, piggybacking on Xterm’s ubiquity. This creates a vicious cycle, as terminal applications often hardcode special cases for xterm-256color, which incentivizes terminals to claim to be xterm-256color, which incentivizes applications to special case xterm-256color, which… and so on. The problem is exacerbated by common (bad) advice to users facing problems with terminal applications to simply override $TERM to be xterm-256color (the Xterm FAQ itself warns against this).

Unfortunately there are no easy fixes for these problems, but there is hope. The vast majority of escape sequences used by applications today are common across most (if not all) modern terminal emulators. This makes terminfo less necessary since applications can usually safely assume that a given escape sequence will “just work”.

In addition, terminal emulators increasingly support applications querying support for certain capabilities. For instance, applications can query the terminal for support of the kitty keyboard protocol mentioned above and only enable it if the terminal responds that it is supported. A nice property of escape sequence queries is they still work even over remote login connections like SSH.
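For the kitty keyboard protocol, the approach described in the kitty documentation is to send the flags query (CSI ? u) immediately followed by a primary device attributes request (CSI c), which virtually every terminal answers. If the device attributes reply arrives without a flags reply before it, the protocol is not supported. The Python sketch below shows only the decision logic on a reply string that has already been read; the names are my own, and the raw-mode tty handling needed to read the reply is omitted.

```python
import re

# Written by the application: flags query, then primary device attributes (DA1)
QUERY = "\x1b[?u" + "\x1b[c"

# A supporting terminal answers the flags query with CSI ? <flags> u
_FLAGS_REPLY = re.compile(r"\x1b\[\?(\d+)u")


def kitty_keyboard_supported(reply: str) -> bool:
    """True if the terminal answered the flags query.

    `reply` is everything read up to and including the DA1 answer.
    Illustrative helper, not part of any library.
    """
    return _FLAGS_REPLY.search(reply) is not None
```

Because both the query and the reply travel over the same byte stream as everything else, the check behaves identically on a local terminal and across an SSH connection.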

Some new TUI libraries, such as vaxis, are designed specifically to avoid using terminfo at all and exclusively use queries to determine feature capabilities. As more applications, libraries, and terminal emulators move in this direction, terminfo will become increasingly unnecessary.

System Integration

One of the many advantages of software terminal emulators over hardware video terminals is that they are one piece of a larger, integrated computing system. Modern terminal emulators support many escape sequences to interact with their broader environment. These sequences are generally known as Operating System Commands (OSCs) and are often referred to by the number which appears after the OSC prefix.

Some of the more popular OSC sequences are OSC 2 for setting the title of the terminal emulator’s window (used frequently by shells and text editors), OSC 8 for creating clickable hyperlinks, OSC 9 for sending desktop notifications, and OSC 52 for interacting with the system clipboard.

You can test these sequences out for yourself. Try running the following in your shell:

printf '\e]9;This is a notification!\a'

If your terminal emulator supports OSC 9, you will see a desktop notification appear with the text “This is a notification!”. Some terminals or operating systems may not display a notification for the focused application; in that case, add a sleep 2 before the printf command and quickly change focus to another window.
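The same thing from a program, as a minimal Python sketch. Every OSC sequence has the shape OSC <number> ; <payload> <terminator>, so one small builder (named osc here, my own choice) covers both the window title (OSC 2) and the notification (OSC 9).

```python
import sys

OSC = "\x1b]"  # Operating System Command introducer: ESC followed by "]"
BEL = "\x07"   # terminator used in the printf example (\a)


def osc(number: int, payload: str) -> str:
    """Build OSC <number> ; <payload> BEL.

    Illustrative helper, not part of any library.
    """
    return f"{OSC}{number};{payload}{BEL}"


def set_title(title: str) -> str:
    """OSC 2 sets the title of the terminal emulator's window."""
    return osc(2, title)


def notify(message: str) -> str:
    """OSC 9 asks the terminal to show a desktop notification."""
    return osc(9, message)


if __name__ == "__main__":
    sys.stdout.write(set_title("State of the Terminal"))
    sys.stdout.write(notify("This is a notification!"))
    sys.stdout.flush()
```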

Terminals which support OSC 8 can create clickable hyperlinks. For instance, try running the below command:

printf '\e]8;;https://www.youtube.com/watch?v=dQw4w9WgXcQ\aClick me for an awesome video!\n\e]8;;\a'

You will see the text “Click me for an awesome video!”. If your terminal emulator supports OSC 8, the text will be clickable (perhaps requiring a modifier key like Shift or Command to be held) and might be styled with an underline or some other visual affordance to indicate that the text is a hyperlink. Clicking on the text will open your web browser to the (perfectly innocuous) embedded URL.
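A hyperlink is just text bracketed by two OSC 8 sequences: one carrying the URL to open the link, and one with an empty URL to close it. Here is a small Python sketch (the function name is mine, and example.com is only a placeholder URL). It uses the two-byte string terminator ESC \ rather than the \a in the printf command; terminals generally accept either.

```python
ST = "\x1b\\"  # String Terminator: ESC followed by a backslash


def hyperlink(url: str, text: str) -> str:
    """Wrap text in OSC 8 ; ; <url> ST ... OSC 8 ; ; ST to make it clickable.

    The empty field between the two semicolons is for optional parameters.
    Illustrative helper, not part of any library.
    """
    return f"\x1b]8;;{url}{ST}{text}\x1b]8;;{ST}"


if __name__ == "__main__":
    print("Read the " + hyperlink("https://example.com", "documentation") + ".")
```

Terminals that do not support OSC 8 typically ignore the sequences and show only the plain text, so this tends to degrade gracefully.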

A long standing issue for terminal based text editors like Vim is clipboard management in remote sessions. A strength of Vim is that it can be run just as easily in a remote SSH session as it can locally; however, the remote SSH session is not able to communicate with the clipboard on your local system, so it is not possible to copy text inside of Vim on the remote session to your clipboard.

Vim addresses this by (optionally) linking against X11 and allowing users to forward their X connection to the remote server, allowing Vim on the remote server to copy text to the X clipboard on the local system. And while this does work, it has its own problems (users must use a version of Vim compiled against X11, with the optional +clipboard feature enabled, and use X11 as their display server, and remember to forward the X connection to the remote system).

A better solution is to copy data to the clipboard through the terminal emulator directly. An application running in the terminal can use the OSC 52 escape sequence to write a Base64 encoded string to the terminal emulator. The terminal then decodes the string and copies the data into the system clipboard. The terminal emulator does not know or care whether the application that sent the sequence is running remotely or not, which means this works on any system with zero dependencies.
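In code, the copy side is short. This Python sketch (the function name is my own) Base64 encodes the text and wraps it in OSC 52 ; c ; <data>, where c selects the clipboard. Writing the result to standard output is all it takes, whether the program runs locally or on the far end of an SSH session.

```python
import base64
import sys


def osc52_copy(text: str) -> str:
    """Build OSC 52 ; c ; <base64 data> BEL, asking the terminal to copy text.

    Illustrative helper, not part of any library.
    """
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # "c" selects the clipboard as the destination
    return f"\x1b]52;c;{payload}\x07"


if __name__ == "__main__":
    sys.stdout.write(osc52_copy("copied through the terminal"))
    sys.stdout.flush()
```

Some terminals limit the size of the payload they accept, so very large copies may be truncated or ignored.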

Pasting (reading) from the clipboard has serious security implications, because any program in the terminal (even ones on remote servers) can request the clipboard contents of the user’s system. For this reason, most terminal emulators disable reading from the clipboard by default, or require the user to explicitly allow it with a prompt.

Neovim recently added builtin support for using OSC 52 and it will be enabled for users by default (if the terminal emulator supports it) in the forthcoming 0.10 release.

Conclusion

While it’s true that terminals, as an application platform, are idiosyncratic and quirky, their portability, ubiquity, and relative ease of use (for application authors) make them increasingly popular for many developers, even in the face of an increasing number of alternatives.

This article is not exhaustive, but it is not meant to be. There are other challenges that both terminal emulator and terminal application authors face that are not discussed here, as well as other areas of innovation and creative exploration. Some examples: better grapheme clustering, synchronized output to avoid “flickering” in redraw-heavy UIs, and custom shaders to create arbitrary visual effects.

Terminal emulators are not static: they continue to evolve and innovate to solve users’ problems and improve users’ experience. The underlying technology is old: downright ancient by the standards of modern tech. But, instead of a flaw, I consider this a strength: it gives me confidence that while individual terminal emulators may come and go, the underlying platform will endure.

References & Further Reading


  1. This depends on your exact definition of “underlying technology”. Here, I’m referring to the use of teletype machines (TTYs) connected to a digital computer. But teletypes themselves were created well over a century ago, and there are still traces of their design in the inner workings of terminal emulators.

  2. Technically, it uses whichever version of the key, shifted or unshifted, falls in the range [0x20, 0x5f]. For alphabetic characters, this is the shifted (uppercase) letter, but for keys like [ or - it uses the unshifted variant.
