State of the Terminal
This is a companion article to my talk at Neovimconf 2023.
I have been using Vim/Neovim as my full time text editor for close to 10 years. I’ve spent a lot of time in the terminal and have become very aware of the many flaws and idiosyncrasies of this bizarre platform. But I also think it gets a lot of things right! And I’m not alone in this belief: terminal based tools are still widely popular even in the presence of many alternatives (the StackOverflow developer survey shows that Neovim is the “most loved” editor 3 years in a row).
It’s only been in the last couple of years that I’ve begun to dig into how terminal emulators, and the applications that run inside of them, really work. I’ve learned that there is a lot of innovation and creative problem solving happening in this space, even though the underlying technology is over half a century old [1].
I’ve also found that many people who use terminal based tools (including shells like Bash and editors like Vim) know very little about terminals themselves, or some of the modern features and capabilities they can support.
In this article, we’ll discuss some of the problems that terminal based applications have historically had to deal with (and what the modern solutions are) as well as some features that modern terminal emulators support that you may not be aware of.
But first, some (very) brief history.
Background & History
Most terminal emulators today can directly trace their roots back to the DEC VT100. The VT100 was not the first video terminal, nor was it the last, but it was the most popular (at the time). And as we’ve learned from history many times since, what becomes popular creates the de facto standard for everything that comes after.
Video terminals were an improvement on the teletype machines that preceded them. They could move the cursor around the screen to create interactive interfaces. They could use color, and clear and redraw their displays quickly without feeding out reams of paper.
Different video terminals each had their own way of doing things, using proprietary escape codes (a sequence of bytes beginning with the escape character, 0x1b). This made life difficult for applications because they had to know which of these sequences to use. Libraries and helper programs (e.g. termcap) were created to help ameliorate these issues (we still live with the descendant of these early libraries, terminfo).
Eventually, formal standards were created, such as ECMA-48 and ANSI X3.64 (from which the term “ANSI escape codes” derives), which defined a set of standard escape sequences. The DEC VT100 was the first video terminal to support these new standards. Its popularity, combined with the new standards, meant that programs now had a set of known good escape sequences they could reliably use. It also spawned many clones, which in turn supported the same sequences for compatibility with applications.
Graphical window systems eventually replaced hardware video terminals, but users still wanted to use the terminal based programs they were accustomed to (you know how those vi people are). In 1984, work began on a software terminal emulator at MIT. This emulator became part of the X project and was named Xterm. Xterm implemented its own features which did not exist on the video terminals it emulated, such as mouse tracking and a configurable color palette. These features were in turn copied by Xterm clones, until eventually Xterm itself became the new de facto standard.
Terminal Emulator Basics
Terminal based applications write two kinds of data to the terminal emulator: printable text that is displayed to the user, and control codes, which modify the terminal emulator’s state. Control codes are either single bytes in the C0 character set (bytes 0x00 through 0x1f) or sequences of bytes that begin with the escape character (0x1b). These sequences are most commonly referred to as “escape sequences”, and it is these sequences that do the bulk of the heavy lifting in terminal applications.
Most control codes from the C0 character set are not used today, but regardless of experience with terminals or terminal applications, most developers are likely familiar with control codes such as \r (carriage return), which moves the cursor to the beginning of the current line, and \n (line feed), which moves the cursor to the next line.
Escape sequences are varied and numerous, but the vast majority used in practice fall into one of three categories: Control Sequence Introducer (CSI), Device Control String (DCS), and Operating System Command (OSC).
CSI sequences are those which begin with the prefix ESC [ (0x1b 0x5b). Escape sequences in this category are those which reposition the cursor, change the cursor style, clear the screen, set foreground and background colors, and more.
OSC sequences are those which begin with the prefix ESC ] and are typically used for things that modify or interact with the user’s environment outside of the terminal emulator itself (hence the name “Operating System Command”). Examples are reading from or writing to the system clipboard, changing the title of the terminal emulator’s window, or sending desktop notifications.
Xterm maintains a list of all of the control sequences it supports on its website, which, along with vt100.net, forms an informal pseudo-specification for VT100 emulators. Note that this list may not contain some control sequences used by other, modern terminal emulators for features which Xterm does not support (e.g. the Kitty keyboard protocol, which we’ll discuss later).
Escape sequences are actually quite easy to use, and you can even do it straight from your shell. Try running the following command from any shell:
printf '\e[1;32mHello \e[0;4;31mworld!\n\e[0m'
This command will print the text “Hello world!”, with “Hello” in green, bold text and “world!” in red, underlined text.
The escape sequences used here are of the form CSI <parameters> m, which is so common it has its own name: Select Graphic Rendition (SGR). An SGR escape sequence sets the colors and text attributes (bold, underline, and so on) that apply to all text printed after it. The first escape sequence in the example, \e[1;32m, enables the bold attribute (1) and sets the foreground color to green (32). The second escape sequence, \e[0;4;31m, first clears any existing styles (0), then enables the underline attribute (4), and finally sets the foreground text color to red (31). The last escape sequence, \e[0m, resets all styles back to their defaults.
Another use case for simple CSI sequences is redrawing text on the screen on an already existing line (e.g. for a progress bar or text that updates itself over time). Hint: look at \r, CSI A, and CSI K.
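As a rough sketch of that hint, the function below redraws a progress bar on a single line. It uses only \r (return to column 0) and CSI K (erase from the cursor to the end of the line); CSI A (cursor up) would only be needed to redraw more than one line. The name draw_progress and the 20 column width are choices made for this example, and \033 is the octal spelling of \e, which works in more shells.

```shell
#!/bin/sh
# draw_progress PERCENT
# Hypothetical helper: redraws a 20 column progress bar in place.
draw_progress() {
    percent=$1
    width=20
    filled=$(( percent * width / 100 ))
    bar=''
    i=0
    while [ "$i" -lt "$width" ]; do
        if [ "$i" -lt "$filled" ]; then
            bar="${bar}#"
        else
            bar="${bar}-"
        fi
        i=$(( i + 1 ))
    done
    # \r moves the cursor to the start of the line, \033[K erases the old
    # contents, and then the new bar is printed over them (no newline).
    printf '\r\033[K[%s] %3d%%' "$bar" "$percent"
}

# Example usage (uncomment to watch the bar update once per second):
# for p in 0 25 50 75 100; do draw_progress "$p"; sleep 1; done; printf '\n'
```

Because the function never prints a newline, every call overwrites the previous bar instead of scrolling the screen.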
Most escape sequences are sent from the application to the terminal emulator, but occasionally the terminal emulator sends escape sequences to the application. Usually this is done in response to a query from the application (for instance, to determine if a certain mode is set).
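One query that most VT100 style terminals answer is the cursor position report: the application sends CSI 6 n and the terminal replies with CSI <row> ; <column> R. The sketch below is an illustration only. The helper names are made up for this example, the stty settings assume a POSIX style tty, and a single read is assumed to return the whole reply, which is usually but not always true.

```shell
#!/bin/sh
# parse_cursor_report REPLY
# Hypothetical helper: extracts "row col" from a reply of the form
# ESC [ <row> ; <col> R.
parse_cursor_report() {
    report=${1#*\[}        # drop everything up to and including "["
    report=${report%R*}    # drop the trailing "R"
    printf '%s %s\n' "${report%%;*}" "${report#*;}"
}

# query_cursor_position
# Hypothetical helper: asks the terminal where the cursor is.
# Needs a real terminal on /dev/tty, so it is not called here.
query_cursor_position() {
    saved=$(stty -g </dev/tty)
    # Raw mode so the reply is not echoed or line buffered; give up
    # after half a second if the terminal does not answer.
    stty raw -echo min 0 time 5 </dev/tty
    printf '\033[6n' >/dev/tty
    reply=$(dd bs=32 count=1 2>/dev/null </dev/tty)
    stty "$saved" </dev/tty
    parse_cursor_report "$reply"
}

# Example usage, from an interactive shell:
# query_cursor_position
```

The important detail is the direction of travel: the query goes out on the same stream as ordinary output, and the answer comes back on the same stream as ordinary key presses.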
Problems & Solutions
Terminal emulators are descended from old, legacy technologies, which brings with it its fair share of problems. Many of these problems have been (mostly) solved, or at least ameliorated, while others are still active areas of innovation and research.
Key Encoding
Terminal emulators and terminal applications communicate through a stream of bytes. When a user presses a key the terminal sends the byte representation of the character associated with that key. The old video terminals only supported ASCII so this was, generally, fairly straightforward.
Modifier keys like Ctrl and Alt complicate this situation. Alt modified keys are encoded by prefixing the character with an Esc. But this has a problem: including an extra Esc byte for the Alt modifier introduces ambiguity between Alt modified key presses and two separate key presses. When an application sees Esc C, should it interpret it as Alt-C or did the user press Esc and then press C? Applications usually solve this by measuring the amount of time between Esc and the next character. If the time is less than some defined interval, it is considered an Alt modified key press (Vim uses the ttimeoutlen option, tmux uses the escape-time option).
Ctrl modified keys are an even bigger problem. When Ctrl is used as a modifier, the shifted [2] version of the key has the 7th bit masked off (for example, C is 0x43 and after masking the 7th bit the byte becomes 0x03). This means that not only can the Shift modifier not be used in conjunction with Ctrl, but that certain Ctrl modified keys are completely indistinguishable from other control codes.
For instance, when you press the Return key the terminal emulator sends the byte \r (0x0d) to the application. But if you press Ctrl-M then the terminal emulator also sends the byte 0x0d to the application (M is 0x4d in ASCII, so when the 7th bit is masked out, it becomes 0x0d). From the application’s perspective, there is literally no way to distinguish these two events.
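The masking arithmetic is easy to check from a shell. This is a small sketch, not code from any terminal emulator; ctrl_byte is a hypothetical helper name, and it expects a single uppercase ASCII letter.

```shell
#!/bin/sh
# ctrl_byte KEY
# Hypothetical helper: prints the byte a legacy terminal sends for
# Ctrl plus the given uppercase ASCII key.
ctrl_byte() {
    # A leading quote makes printf print the character's numeric code.
    code=$(printf '%d' "'$1")
    # Clear the 7th bit (value 0x40), as described above.
    printf '0x%02x\n' $(( code & ~0x40 ))
}

# Example usage:
# ctrl_byte C
# ctrl_byte M
```

Running ctrl_byte C prints 0x03 and ctrl_byte M prints 0x0d, the same byte as \r.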
For a long time this meant that certain modified keys like Ctrl-I, Ctrl-J, and Ctrl-M could not be used in terminal applications like Vim independently of Tab, line feed, and Return. There have been a few attempts to solve this problem: the first came from Xterm in 2006 through the modifyOtherKeys option. Paul Evans (author of libvterm and libtickit) introduced an alternate key encoding using the CSI u escape sequence in an essay which is sometimes colloquially referred to as “fixterms”. The CSI u encoding proposed by Evans was extended by Kovid Goyal, the author of the kitty terminal emulator, in what has become known as the kitty keyboard protocol.
What all of these solutions have in common is that key presses are sent to the terminal application encoded as escape sequences. This eliminates any ambiguity for modified keys and enables certain modifier combinations (such as Ctrl + Shift) that are not possible using “legacy” encoding. The CSI u encoding proposed by Evans and adapted by kitty encodes a modified key press like Ctrl-M as \e[109;5u. The encoding of unmodified key presses like Return depends on which “level” of the kitty keyboard protocol is enabled. Applications can opt in to different levels to ease adoption (for instance, Neovim uses only the first level, “Disambiguate escape codes”). See the kitty documentation for more details.
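To see where \e[109;5u comes from: 109 is the code point of the lowercase letter m, and 5 is 1 plus a bit field of modifiers in which Shift is 1, Alt is 2, and Ctrl is 4. The sketch below builds such a sequence. The helper name csi_u is made up for this example, and it covers only these three modifiers and plain ASCII keys, not the full protocol.

```shell
#!/bin/sh
# csi_u KEY [shift] [alt] [ctrl]
# Hypothetical helper: prints the CSI u encoding of a lowercase ASCII
# key pressed together with the named modifiers.
csi_u() {
    key=$1
    shift
    bits=0
    for mod in "$@"; do
        case $mod in
            shift) bits=$(( bits | 1 )) ;;
            alt)   bits=$(( bits | 2 )) ;;
            ctrl)  bits=$(( bits | 4 )) ;;
        esac
    done
    code=$(printf '%d' "'$key")
    # The modifier parameter is 1 plus the bit field.
    printf '\033[%d;%du' "$code" $(( bits + 1 ))
}

# Example usage (od makes the escape byte visible):
# csi_u m ctrl | od -c
```

With this encoding, Ctrl-M (ESC [ 109 ; 5 u) and Ctrl-Shift-M (ESC [ 109 ; 6 u) are distinct from each other and from a bare \r.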
Sending key presses as escape sequences requires that terminal applications are able to recognize and parse those sequences, so it is not something that “just works” out of the box. However, the kitty keyboard protocol has been widely adopted by both modern terminal emulators and terminal applications. Terminals which support the kitty keyboard protocol (to some degree) include Wezterm, Alacritty, kitty, foot, Ghostty, and iTerm2. Applications which support the kitty keyboard protocol (to some degree) include Vim, Neovim, Helix, kakoune, and nushell. This means that when using one of these applications in one of these terminals, all of the key encoding problems discussed above (as well as some others which were not discussed…) are solved.
Decorations
Xterm has supported 256 user specified colors since 1999. These colors could be changed at runtime using an escape sequence (OSC 4), which can be used to great effect (see “8 Bit & ‘8 Bitish’ Graphics-Outside the Box” by Mark Ferrari for an incredible demonstration, or install notcurses and run notcurses-demo j in your terminal).
Within the last decade or so, 24 bit color (sometimes referred to as “truecolor” or “RGB color”) has become widely supported by terminal emulators which allows terminal applications to use whatever arbitrary colors they want. This provides terminal UIs a much greater degree of flexibility and creative freedom.
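In terminals that support it, 24 bit color is just another SGR sequence: 38;2;R;G;B sets the foreground and 48;2;R;G;B sets the background. A minimal sketch, where fg_rgb and bg_rgb are names invented for this example:

```shell
#!/bin/sh
# fg_rgb R G B / bg_rgb R G B
# Hypothetical helpers: emit the SGR sequence for a 24 bit foreground
# or background color. Each component is 0 to 255.
fg_rgb() {
    printf '\033[38;2;%d;%d;%dm' "$1" "$2" "$3"
}

bg_rgb() {
    printf '\033[48;2;%d;%d;%dm' "$1" "$2" "$3"
}

# Example usage: orange text on a dark blue background, then reset.
# fg_rgb 255 128 0; bg_rgb 0 0 64; printf 'Hello'; printf '\033[0m\n'
```

Terminals without truecolor support may ignore the sequence or approximate the color from their palette, so applications typically still offer a fallback.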
Modern terminals also support other kinds of “rich” text markup, such as strikethrough and various types of underlines. For instance, text editors like Vim and Neovim can add a red squiggly line under misspelled words (as seen in many graphical rich text editors).
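The squiggly underline is also an SGR sequence. In the terminals that implement the extended underline styles, 4:3 selects a curly underline, 58 sets the underline color independently of the text color, and 59 resets it. Support varies by terminal, so treat this as a sketch to try in your own emulator; the helper name undercurl_red is invented for this example.

```shell
#!/bin/sh
# undercurl_red TEXT
# Hypothetical helper: prints TEXT with a red curly underline, then
# restores the default underline color and turns the underline off.
undercurl_red() {
    # 4:3 = curly underline, 58:2::R:G:B = underline color,
    # 59 = default underline color, 4:0 = no underline.
    printf '\033[4:3m\033[58:2::255:0:0m%s\033[59m\033[4:0m' "$1"
}

# Example usage:
# undercurl_red 'mispelled'; printf '\n'
```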
It is also possible to display images and even videos inline inside of terminal emulators. There are (at least) three different ways to do this (sixels, the iTerm2 image protocol, and the kitty graphics protocol) and support among terminal emulators varies. Unfortunately this means that terminal applications are in a bit of an awkward situation, as they must either implement support for all of the image protocols, or only support a subset of terminals. For this reason, use of images in terminal applications is still relatively uncommon.
It is important to note that advances in terminal based UIs are not only due to the efforts of terminal emulators, but also to the creativity and talent of terminal application and library authors. For example, see some of the fantastic work that charm.sh has done creating delightful, interactive terminal based user interfaces that rival (and in some cases, surpass!) graphical UIs for similar tools.
Capability Determination
Terminal emulators do not all support the same features. In some cases, the same feature is implemented in different ways. Terminal applications need some way to know which features the terminal they’re running in supports and how to properly use those features.
Today this is primarily done using a distributed database of “terminfo” files. The terminal emulator uses the $TERM environment variable to communicate to terminal applications which terminfo file to use to look up which capabilities the terminal supports.
This has a multitude of problems, however. The terminfo database is part of the ncurses library, and different operating systems and distributions package different versions of ncurses. This was a problem for tmux users on macOS for many years because the version of ncurses packaged with macOS was so old that it did not even include the tmux-256color terminfo entry at all!
This is also a problem for newer terminals which have not yet been added to the ncurses terminfo database. Terminal emulators can (and often do) ship their own terminfo entries which are used by applications running on the same system as the terminal emulator itself. But when connecting to a remote system (e.g. with SSH), the terminfo database on the remote system will not have the terminfo entry, and the user is met with cryptic warnings like WARNING: terminal is not fully functional and with applications that do not function properly.
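You can check whether a system knows about your terminal with infocmp, which ships with ncurses. The sketch below wraps that check in a helper (has_terminfo is a name made up for this example). The commented command at the end is one common way to copy a local entry to a remote machine; remote-host is a placeholder.

```shell
#!/bin/sh
# has_terminfo NAME
# Hypothetical helper: succeeds if this system has a terminfo entry
# called NAME. Relies on infocmp from ncurses being installed.
has_terminfo() {
    infocmp "$1" >/dev/null 2>&1
}

if has_terminfo "${TERM:-dumb}"; then
    echo "terminfo entry for ${TERM:-dumb} found"
else
    echo "no terminfo entry for ${TERM:-dumb}"
fi

# To install the local entry on a remote system ("remote-host" is a
# placeholder), dump it with infocmp and compile it there with tic:
# infocmp -x "$TERM" | ssh remote-host 'tic -x -'
```

Run the check on both the local and the remote machine; if only the remote one reports a missing entry, you have found the cause of the warning.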
To circumvent this issue, many terminals use xterm-256color as their $TERM value, essentially claiming to be Xterm even though they are not, piggybacking on Xterm’s ubiquity. This creates a vicious cycle, as terminal applications often hardcode special cases for xterm-256color, which incentivizes terminals to claim to be xterm-256color, which incentivizes applications to special case xterm-256color, which… and so on. The problem is exacerbated by common (bad) advice to users facing problems with terminal applications to simply override $TERM to be xterm-256color (the Xterm FAQ itself warns against this).
Unfortunately there are no easy fixes for these problems, but there is hope. The vast majority of escape sequences used by applications today are common across most (if not all) modern terminal emulators. This makes terminfo less necessary since applications can usually safely assume that a given escape sequence will “just work”.
In addition, terminal emulators increasingly support applications querying support for certain capabilities. For instance, applications can query the terminal for support of the kitty keyboard protocol mentioned above and only enable it if the terminal responds that it is supported. A nice property of escape sequence queries is they still work even over remote login connections like SSH.
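For the kitty keyboard protocol, the documented query is CSI ? u followed by a primary device attributes request (CSI c). Nearly every terminal answers the second request, so if that answer arrives without a preceding CSI ? <flags> u report, the protocol is not supported. The sketch below is illustrative: the helper names are invented, the stty settings assume a POSIX style tty, and a single read is assumed to capture at least the first reply.

```shell
#!/bin/sh
# reply_has_kitty_keyboard REPLY
# Hypothetical helper: succeeds if the first "CSI ?" report in REPLY
# has the form ESC [ ? <flags> u.
reply_has_kitty_keyboard() {
    esc=$(printf '\033')
    rest=${1#*"$esc"\[\?}              # text after the first "ESC [ ?"
    [ "$rest" != "$1" ] || return 1    # no "ESC [ ?" in the reply
    flags=${rest%%[!0-9]*}             # leading run of digits
    [ -n "$flags" ] || return 1
    case ${rest#"$flags"} in
        u*) return 0 ;;
        *)  return 1 ;;
    esac
}

# query_kitty_keyboard
# Hypothetical helper: sends the query to the real terminal.
# Needs /dev/tty, so it is not called here.
query_kitty_keyboard() {
    saved=$(stty -g </dev/tty)
    stty raw -echo min 0 time 5 </dev/tty
    printf '\033[?u\033[c' >/dev/tty
    reply=$(dd bs=64 count=1 2>/dev/null </dev/tty)
    stty "$saved" </dev/tty
    reply_has_kitty_keyboard "$reply"
}

# Example usage, from an interactive shell:
# if query_kitty_keyboard; then echo supported; else echo unsupported; fi
```

Since both the query and the answer travel over the terminal's byte stream, the same function works unchanged inside an SSH session.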
Some new TUI libraries, such as vaxis, are designed specifically to avoid using terminfo at all and exclusively use queries to determine feature capabilities. As more applications, libraries, and terminal emulators move in this direction, terminfo will become increasingly unnecessary.
System Integration
One of the many advantages of software terminal emulators over hardware video terminals is that they are one piece of a larger, integrated computing system. Modern terminal emulators support many escape sequences to interact with their broader environment. These sequences are generally known as Operating System Commands (OSCs) and are often referred to by the number which appears after the OSC prefix.
Some of the more popular OSC sequences are OSC 2 for setting the title of the terminal emulator’s window (used frequently by shells and text editors), OSC 8 for creating clickable hyperlinks, OSC 9 for sending desktop notifications, and OSC 52 for interacting with the system clipboard.
You can test these sequences out for yourself. Try running the following in your shell:
printf '\e]9;This is a notification!\a'
If your terminal emulator supports OSC 9, you will see a desktop notification appear with the text “This is a notification!”. Some terminals or operating systems may not display a notification for the focused application; in that case, add a sleep 2 before the printf command and quickly change focus to another window.
Terminals which support OSC 8 can create clickable hyperlinks. For instance, try running the below command:
printf '\e]8;;https://www.youtube.com/watch?v=dQw4w9WgXcQ\aClick me for an awesome video!\n\e]8;;\a'
You will see the text “Click me for an awesome video!”. If your terminal emulator supports OSC 8, the text will be clickable (perhaps requiring a modifier key like Shift or Command to be held) and might be styled with an underline or some other visual affordance to indicate that the text is a hyperlink. Clicking on the text will open your web browser to the (perfectly innocuous) embedded URL.
A long standing issue for terminal based text editors like Vim is clipboard management in remote sessions. A strength of Vim is that it can be run just as easily in a remote SSH session as it can locally; however, the remote SSH session is not able to communicate with the clipboard on your local system, so it is not possible to copy text inside of Vim on the remote session to your clipboard.
Vim addresses this by (optionally) linking against X11 and allowing users to forward their X connection to the remote server, allowing Vim on the remote server to copy text to the X clipboard on the local system. And while this does work, it has its own problems (users must use a version of Vim compiled against X11, with the optional +clipboard feature enabled, and use X11 as their display server, and remember to forward the X connection to the remote system).
A better solution is to copy data to the clipboard through the terminal emulator directly. An application running in the terminal can use the OSC 52 escape sequence to write a Base64 encoded string to the terminal emulator. The terminal then decodes the string and copies the data into the system clipboard. The terminal emulator does not know or care whether the application that sent the sequence is running remotely or not, which means this works on any system with zero dependencies.
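Writing to the clipboard this way takes one line of shell. In the sketch below, osc52_copy is a name made up for this example; the c parameter selects the regular system clipboard, and the base64 utility is assumed to be installed.

```shell
#!/bin/sh
# osc52_copy TEXT
# Hypothetical helper: asks the terminal emulator to place TEXT on the
# system clipboard using OSC 52.
osc52_copy() {
    # Some base64 implementations wrap long output, so strip newlines.
    data=$(printf '%s' "$1" | base64 | tr -d '\n')
    printf '\033]52;c;%s\007' "$data"
}

# Example usage (works the same locally and over SSH):
# osc52_copy 'copied from the terminal'
```

If nothing lands on your clipboard, the terminal may not support OSC 52 or may have it disabled; a multiplexer such as tmux sitting in between may also need to be configured to let the sequence through.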
Pasting (reading) from the clipboard has serious security implications, because any program in the terminal (even ones on remote servers) can request the clipboard contents of the user’s system. For this reason, most terminal emulators disable reading from the clipboard by default, or require the user to explicitly allow it with a prompt.
Neovim recently added builtin support for using OSC 52 and it will be enabled for users by default (if the terminal emulator supports it) in the forthcoming 0.10 release.
Conclusion
While it’s true that terminals, as an application platform, are idiosyncratic and quirky, their portability, ubiquity, and relative ease of use (for application authors) make them increasingly popular for many developers, even in the face of a growing number of alternatives.
This article is not exhaustive, but it is not meant to be. There are other challenges that both terminal emulator and terminal application authors face that are not discussed here, as well as other areas of innovation and creative exploration. Some examples: better grapheme clustering, synchronized output to avoid “flickering” in redraw-heavy UIs, and custom shaders to create arbitrary visual effects.
Terminal emulators are not static: they continue to evolve and innovate to solve users’ problems and improve users’ experience. The underlying technology is old: downright ancient by the standards of modern tech. But, instead of a flaw, I consider this a strength: it gives me confidence that while individual terminal emulators may come and go, the underlying platform will endure.
References & Further Reading
- The TTY demystified
- What happens when you press a key in your terminal?
- A history of the tty
- Understanding ASCII (and terminals)
- Comprehensive keyboard handling in terminals
- Fix Keyboard Input on Terminals - Please
- Grapheme Clusters and Terminal Emulators
[1] This depends on your exact definition of “underlying technology”. Here, I’m referring to the use of teletype machines (TTYs) connected to a digital computer. But teletypes themselves were created well over a century ago, and there are still traces of their design in the inner workings of terminal emulators.
[2] Technically, it uses whichever version of the key, shifted or unshifted, falls in the range [0x40, 0x5f]. For alphabetic characters, this is the shifted (uppercase) letter, but for keys like [ or ] it uses the unshifted variant.