Serial interfaces and RS-232

"Serial interface" is actually a collective term for all transmission types that transfer data serially (one after the other), but here we are dealing with the RS-232 standard. The standard was created in the early 1960s and was used, for example, for display terminals, and became very popular in the 1980s with IBM PCs for many devices such as mice. Even today, some PCs still have the 9-pin connector and hardware that is compatible with the interfaces installed in the IBM PC, even though RS-232 was actually made superfluous by the Universal Serial Bus (USB). On the hardware and software side, RS-232 is a simple way of communicating with remote devices. For example, it helps to develop operating systems on embedded systems that do not have a screen and allows simple data exchange between systems.

We already have an output option with the CGA text mode, but serial output is more flexible in some respects. With serial output we can transfer an almost unlimited amount of data to our development PC, whereas the CGA screen holds only a very limited number of characters. However, serial output does not let us place characters directly, and colors can only be assigned indirectly.

The serial interface is typically provided by a UART chip, which decodes incoming data and encodes data to be sent. The UART chip can often buffer only individual bytes, which is why flow control makes sense, although the protocol does not require it. The standard D-sub connector has 9 pins; besides a GND line, some of them allow simple flow control. For our purposes, only Rx and Tx, i.e. the receive and transmit lines, are of interest.

The serial interface is used to transmit octets, although there are often modes for transmitting fewer bits (for example 7), which are then evaluated to form a character. A character or octet on the line begins with a start bit, which lets the receiving end synchronize and indicates that a new datum is beginning. Depending on the configuration, this is followed by 5 to 8 bits of user data, which are evaluated by the UART controller. An optional parity bit may follow, and the stop bit ends the symbol. The transmission speed is specified in bits per second, the so-called baud rate. It must be set identically for the receiver and transmitter.

This results in the following settings:

Baud rate: The number of bits transmitted per second (often 115200).
Number of data bits: The size of a symbol (5-8 bits).
Number of stop bits: Stop bits (1-2) mark the end of a symbol and let the receiver check that it is still synchronized.
Parity bit: An optional bit that can be used to check whether the symbol arrived without errors.

Nowadays, a mode of 8N1 is common (8 data bits, no parity bit, 1 stop bit).
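The framing described above determines how many bits actually travel over the line per symbol. The following is a minimal sketch of that arithmetic; the names FrameFormat, bits_per_frame and symbols_per_second are made up for illustration and are not part of any existing interface.

```cpp
// Hypothetical helper type describing one line configuration such as 8N1.
struct FrameFormat {
    unsigned data_bits;  // 5 to 8
    bool parity;         // true if a parity bit is transmitted
    unsigned stop_bits;  // 1 or 2 (1.5 stop bits are not modelled here)
};

// Bits on the wire per symbol: start bit + data bits + optional parity + stop bits.
constexpr unsigned bits_per_frame(FrameFormat f) {
    return 1 + f.data_bits + (f.parity ? 1 : 0) + f.stop_bits;
}

// Upper bound on the number of symbols per second at a given baud rate.
constexpr unsigned symbols_per_second(unsigned baud, FrameFormat f) {
    return baud / bits_per_frame(f);
}

// 8 data bits, no parity, 1 stop bit.
constexpr FrameFormat k8N1{8, false, 1};
```

With 8N1, each octet therefore costs 10 bits on the line, so 115200 baud corresponds to at most 11520 octets per second.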

Implementation in the IBM PC

The PC supports up to four serial ports, the so-called COM ports. The UART chip is configured via I/O ports. COM1 is located at I/O port 0x3f8 and COM2 at 0x2f8. The other registers required for control follow directly after the base, e.g. the line control register of COM1 can be addressed with I/O port 0x3f8+3, i.e. 0x3fb.

Two of the registers are switched by a bit in the line control register (the so-called DLAB or divisor latch access bit). If this bit is set, the registers at offset 0 and 1 are read and written as a divisor for the baud rate (LSB at +0, MSB at +1). If the bit is not set, the data register, which we need for sending and receiving data, can be accessed at +0, and the interrupt enable register is located at +1.

Offset and DLAB setting	Register
+0, DLAB: 0	Data Register
+1, DLAB: 0	Interrupt Enable Register
+0, DLAB: 1	Divisor Latch LSB
+1, DLAB: 1	Divisor Latch MSB
+2	Interrupt Identification and FIFO Control Register
+3	Line Control Register
+4	Modem Control Register
+5	Line Status Register
+6	Modem Status Register
+7	Scratch Register
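The register layout can be written down as constants so that port numbers do not have to be computed by hand. This is only a sketch: the enumerator names and the helper port_of are illustrative choices, while the base ports and offsets are the ones listed above.

```cpp
#include <cstdint>

// Base I/O ports as given in the text.
constexpr std::uint16_t COM1 = 0x3f8;
constexpr std::uint16_t COM2 = 0x2f8;

// Register offsets relative to the base port (names are illustrative).
enum RegisterOffset : std::uint16_t {
    DATA = 0,              // only with DLAB = 0
    INTERRUPT_ENABLE = 1,  // only with DLAB = 0
    DIVISOR_LOW = 0,       // only with DLAB = 1
    DIVISOR_HIGH = 1,      // only with DLAB = 1
    INTERRUPT_IDENT = 2,   // shared with FIFO control
    LINE_CONTROL = 3,
    MODEM_CONTROL = 4,
    LINE_STATUS = 5,
    MODEM_STATUS = 6,
    SCRATCH = 7,
};

// I/O port of a register for a given COM port.
constexpr std::uint16_t port_of(std::uint16_t base, RegisterOffset reg) {
    return static_cast<std::uint16_t>(base + reg);
}

// The line control register of COM1 is at 0x3f8 + 3 = 0x3fb.
static_assert(port_of(COM1, LINE_CONTROL) == 0x3fb, "unexpected port number");
```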

Registers

A detailed breakdown of the registers can be found in the data sheet of the 8250A UART chip. For our purposes, only the Line Control Register and the Line Status Register are important.

Register	Bit 7	6	5	4	3	2	1	0
Line Control	DLAB	Set Break	Stick Parity	Even Parity Select (EPS)	Parity Enable (PEN)	Stop Bits (STB)	Word Length Select 1 (WLS1)	Word Length Select 0 (WLS0)
Line Status	0	Transmitter Empty (TEMT)	Transmitter Holding Register Empty (THRE)	Break Interrupt	Framing Error	Parity Error	Overrun Error	Data Ready

Baud rate

The UART has a built-in clock of 115200 ticks per second. To set the baud rate, we divide this clock by an integer divisor: for the full 115200 baud we use a divisor of 1, for 57600 baud a divisor of 2, for 38400 baud a divisor of 3, and so on. To write the divisor, the top bit of the line control register (i.e. DLAB) must be set. Then we can write the LSB of the divisor to offset +0 and the MSB to +1. Finally, DLAB is cleared again.
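The sequence for programming the divisor could look roughly like the sketch below. Everything named here is an assumption made for illustration: FakeUart is a hypothetical stand-in that mimics only the DLAB switching, so that the example also runs in user space. In a kernel, its in and out members would be replaced by real port I/O on the COM base address.

```cpp
#include <cstdint>

constexpr std::uint32_t UART_CLOCK = 115200;  // ticks per second, as stated above

// Divisor for a desired baud rate (integer division; baud must not be 0).
constexpr std::uint16_t divisor_for(std::uint32_t baud) {
    return static_cast<std::uint16_t>(UART_CLOCK / baud);
}

// Hypothetical stand-in for the chip: models offsets +0, +1 and +3 only.
struct FakeUart {
    std::uint8_t lcr = 0, data = 0, ier = 0, dll = 0, dlm = 0;

    bool dlab() const { return (lcr & 0x80) != 0; }

    void out(unsigned offset, std::uint8_t value) {
        if (offset == 0) {
            if (dlab()) dll = value; else data = value;
        } else if (offset == 1) {
            if (dlab()) dlm = value; else ier = value;
        } else if (offset == 3) {
            lcr = value;
        }
    }

    std::uint8_t in(unsigned offset) const {
        return offset == 3 ? lcr : 0;
    }
};

// Sets DLAB, writes the divisor (LSB at +0, MSB at +1), then clears DLAB again.
template <typename Uart>
void set_baud_rate(Uart& uart, std::uint32_t baud) {
    const std::uint16_t divisor = divisor_for(baud);
    const std::uint8_t lcr = uart.in(3);
    uart.out(3, static_cast<std::uint8_t>(lcr | 0x80));
    uart.out(0, static_cast<std::uint8_t>(divisor & 0xff));
    uart.out(1, static_cast<std::uint8_t>(divisor >> 8));
    uart.out(3, static_cast<std::uint8_t>(lcr & 0x7f));
}
```

Reading the line control register first and restoring it afterwards keeps the previously configured data bits, parity and stop bits intact.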

Data bits

The number of data bits in a symbol is variable. Usually 8N1 is used, i.e. 8 data bits, which are selected via the WLS0 and WLS1 bits in the Line Control Register.

Character length	(1) WLS1	(0) WLS0
5	0	0
6	0	1
7	1	0
8	1	1
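The table follows a simple pattern: the two WLS bits hold the character length minus 5. A small sketch of this mapping, with word_length_bits being a name chosen only for this example:

```cpp
#include <cstdint>

// Returns the WLS1/WLS0 field (bits 1 and 0 of the Line Control Register)
// for 5 to 8 data bits. Values outside this range are not checked here.
constexpr std::uint8_t word_length_bits(unsigned data_bits) {
    return static_cast<std::uint8_t>((data_bits - 5) & 0x03);
}

static_assert(word_length_bits(8) == 0x03, "8 data bits set WLS1 and WLS0");
```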

Parity bit - error detection

The parity enable bit (PEN) specifies whether a parity bit is to be sent and checked on reception. The parity bit is used to detect transmission errors at bit level. It can be seen as a check on the number of transmitted ones: if a single bit changes its value in transit, the parity bit no longer matches and an error is detected.

The 8250A UART chip supports several parity modes. The parity bit can be fixed so that it is always sent and checked as 0 (SPACE) or 1 (MARK). The EVEN and ODD modes ensure that the symbol together with the parity bit contains an even or an odd number of ones, respectively. The usual 8N1 setting disables the parity bit.

Meaning	(5) Stick Parity	(4) EPS	(3) PEN
NONE - no parity bit	-	-	0
ODD	0	0	1
EVEN	0	1	1
MARK - always 1	1	0	1
SPACE - always 0	1	1	1
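To make the parity modes more concrete, the sketch below encodes the rows of the table as bit masks for the Line Control Register (PEN is bit 3, EPS bit 4, Stick Parity bit 5) and computes the parity bit a transmitter would append in EVEN or ODD mode. The UART does this computation in hardware; the function parity_bit and the enumeration Parity are hypothetical and serve only as illustration.

```cpp
#include <cstdint>

// Parity modes, encoded as their bit pattern in the Line Control Register.
enum class Parity : std::uint8_t {
    NONE  = 0x00,  // PEN = 0
    ODD   = 0x08,  // PEN
    EVEN  = 0x18,  // EPS | PEN
    MARK  = 0x28,  // Stick | PEN
    SPACE = 0x38,  // Stick | EPS | PEN
};

// Parity bit for one symbol in EVEN (even == true) or ODD mode.
// EVEN: symbol plus parity bit contain an even number of ones.
// ODD:  symbol plus parity bit contain an odd number of ones.
inline bool parity_bit(std::uint8_t symbol, bool even) {
    bool odd_ones = false;  // true if the symbol has an odd number of ones
    for (; symbol != 0; symbol >>= 1) {
        if (symbol & 1) odd_ones = !odd_ones;
    }
    return even ? odd_ones : !odd_ones;
}
```

For example, the character 'A' (0x41) contains two ones, so its parity bit is 0 in EVEN mode and 1 in ODD mode.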

Stop bits

The stop bits are used by the receiver to ensure that the sender acts in phase, i.e. uses the same (or sufficiently similar) timing. With 5 data bits, it is possible to send 1 or 1.5 stop bits, otherwise 1 or 2 stop bits.

Meaning	(2) STB
1 stop bit	0
1.5 or 2 stop bits	1
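Word length, stop bits and parity all live in the same Line Control Register, so a complete configuration is a single byte. A sketch of how such a byte could be assembled, assuming a hypothetical helper line_control whose parity argument is the already shifted field of bits 5 to 3:

```cpp
#include <cstdint>

constexpr std::uint8_t STB = 0x04;  // bit 2: 1.5 or 2 stop bits

// Builds a Line Control Register value with DLAB cleared.
// parity_field holds bits 5..3 (0x00 = none, 0x08 = odd, 0x18 = even,
// 0x28 = mark, 0x38 = space); other bits of it are masked out.
constexpr std::uint8_t line_control(unsigned data_bits,
                                    std::uint8_t parity_field,
                                    bool long_stop) {
    return static_cast<std::uint8_t>(((data_bits - 5) & 0x03)
                                     | (long_stop ? STB : 0x00)
                                     | (parity_field & 0x38));
}

// 8N1: 8 data bits, no parity, 1 stop bit.
static_assert(line_control(8, 0x00, false) == 0x03, "8N1 should be 0x03");
```

With long_stop set, the chip sends 1.5 stop bits for 5 data bits and 2 stop bits otherwise, as described above.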

Line Status

With the Line Status Register it is possible to query the status of the UART chip. For us, only bits 0 and 6 are important: they indicate whether data has been received (Data Ready) and whether the chip is ready to send data (Transmitter Empty). As a first step, these bits should be queried by polling.
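Polling means reading the Line Status Register in a loop until the relevant bit is set. The sketch below shows the idea for sending and receiving a single byte. LoopbackUart is a hypothetical stand-in that hands every written byte straight back to the reader, so the example runs without hardware; in a kernel, in and out would perform real port I/O relative to the COM base address.

```cpp
#include <cstdint>

constexpr std::uint8_t DATA_READY        = 0x01;  // Line Status bit 0
constexpr std::uint8_t TRANSMITTER_EMPTY = 0x40;  // Line Status bit 6

// Hypothetical stand-in: offset 0 is the data register, offset 5 the line status.
struct LoopbackUart {
    std::uint8_t data = 0;
    bool has_data = false;

    std::uint8_t in(unsigned offset) {
        if (offset == 5) {
            return static_cast<std::uint8_t>(
                TRANSMITTER_EMPTY | (has_data ? DATA_READY : 0));
        }
        if (offset == 0) {
            has_data = false;  // reading the data register consumes the byte
            return data;
        }
        return 0;
    }

    void out(unsigned offset, std::uint8_t value) {
        if (offset == 0) {
            data = value;
            has_data = true;
        }
    }
};

// Waits until the transmitter is empty, then writes one byte.
template <typename Uart>
void write_byte(Uart& uart, std::uint8_t value) {
    while ((uart.in(5) & TRANSMITTER_EMPTY) == 0) {
        // busy wait
    }
    uart.out(0, value);
}

// Waits until a byte has been received, then reads it.
template <typename Uart>
std::uint8_t read_byte(Uart& uart) {
    while ((uart.in(5) & DATA_READY) == 0) {
        // busy wait
    }
    return uart.in(0);
}
```

Busy waiting is the simplest approach; it blocks the CPU while waiting, which is acceptable for a first implementation.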

ANSI Escape Sequences

The serial interface only sends characters, but we can also have color or other highlighting if ANSI escape sequences are implemented on both sides. The usual terminal emulators under Linux implement the escape sequences, so it is up to you to also send these control characters in StuBS.

There are other control characters that we already know, e.g. \n and \r, which move the cursor to the next line and to the beginning of the line, respectively. We are interested here in everything that begins with the escape control character, i.e. ^[, 0x1b or octal 33. For many commands, this control character is followed by the control sequence introducer (in this case [), a series of parameters separated by semicolons, and a final character that specifies the command to be executed with the transmitted parameters. For example, the character string \033[2J is used to clear the entire screen. The cursor can be positioned in the same way, with the row and column passed as parameters, e.g. \033[1;1H for the top left corner.
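The structure of such a sequence (escape character, [, parameters separated by semicolons, command character) can be sketched with two small helpers. The function names are made up for this example, and std::string is used only for brevity; it assumes a hosted environment and is typically not available inside a kernel, where the characters would be written to the serial port one by one.

```cpp
#include <string>

// Control sequence introducer: the escape character (octal 33) followed by '['.
const std::string CSI = "\033[";

// Erase in display with parameter 2: clears the entire screen.
inline std::string clear_screen() {
    return CSI + "2J";
}

// Cursor position: row and column are 1-based, separated by a semicolon.
inline std::string move_cursor(unsigned row, unsigned column) {
    return CSI + std::to_string(row) + ";" + std::to_string(column) + "H";
}
```

Sending clear_screen() followed by move_cursor(1, 1) leaves the terminal empty with the cursor in the top left corner.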

Colors can be set using the command m, where \033[0m resets all previous settings to the default. As in the CGA display, the color settings have a foreground and a background color. Characters can also be displayed in bold, underlined or blinking. The settings are retained until they are overwritten or reset.

Foreground	Background	Color palette	Foreground	Background	Color palette
30	40	Black	90	100	Dark grey
31	41	Red	91	101	Light red
32	42	Green	92	102	Light green
33	43	Yellow	93	103	Light yellow
34	44	Blue	94	104	Light blue
35	45	Magenta	95	105	Light magenta
36	46	Cyan	96	106	Light cyan
37	47	Grey	97	107	White
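The table is regular: the foreground codes are 30 plus a color index, the background codes 40 plus the same index, and the light variants add another 60. The sketch below uses this to build a color sequence. Color, set_color and reset_attributes are hypothetical names, and std::string again assumes a hosted environment rather than kernel code.

```cpp
#include <string>

// Color indices in the order of the table above.
enum class Color { BLACK = 0, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, GREY };

// Builds the sequence that selects a foreground and a background color.
// light_foreground selects the 90..97 range instead of 30..37.
inline std::string set_color(Color foreground, Color background,
                             bool light_foreground = false) {
    const int fg = (light_foreground ? 90 : 30) + static_cast<int>(foreground);
    const int bg = 40 + static_cast<int>(background);
    return "\033[" + std::to_string(fg) + ";" + std::to_string(bg) + "m";
}

// Resets all attributes to the terminal's defaults.
inline std::string reset_attributes() {
    return "\033[0m";
}
```

Text sent after set_color(Color::GREEN, Color::BLACK) appears green on black until reset_attributes() or another color sequence is sent.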

Further information

Serial interface at OSDev.org and Lowlevel.eu
ANSI/VT100 Terminal Control Escape Sequences
