Module rstubs::assignments::b1_syscalls

§BST - A1: System Calls

The first goal of the BST lecture is to introduce vertical and horizontal isolation to protect the kernel from applications and also applications from each other. In this first assignment, we focus on vertical isolation (horizontal isolation will be the topic of the next assignment).

Vertical isolation protects the kernel from user code by preventing that code from executing certain privileged instructions (like cli/sti). For this, CPUs provide multiple execution modes: a kernel mode (ring 0) with the full instruction set and a user mode (ring 3) where some instructions are not available. In RStuBS, we have to implement this mode switch (ring switch) manually.

§Running Applications on Ring 3

Currently, RStuBS boots in ring 0 (kernel mode) and also starts all applications in this ring. As we want vertical isolation, the applications must be moved to ring 3 (user mode). Only certain privileged functions, the system calls, should be available to the user. For a system call, we switch to kernel mode, check inputs and permissions, and execute the desired task.

We generally have three different ring switches:

  1. The initial switch to ring 3 for an application
  2. The switch to ring 0 for syscalls (and other interrupts)
  3. The switch back to ring 3 after a syscall (or interrupt)

Switch (1) and (2) have to be implemented manually, and switch (3) is done automatically by the CPU as part of iret.

§Expanding the Global Descriptor Table

To enable ring 3, we have to add three entries to the GDT. The first two entries define the code and data/stack segments for the user mode. For now, we give it access to all the memory (this will be improved in the next assignment).
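As a minimal sketch of the two user segment entries, the following packs a flat 4 GiB segment with descriptor privilege level 3 into the 8-byte descriptor format. The function and constant names are hypothetical and not part of RStuBS; the existing GDT code may already offer its own builder.

```rust
/// Packs a 32-bit base, a 20-bit limit, the access byte, and the 4-bit flags
/// into the 8-byte x86 segment descriptor format.
pub const fn descriptor(base: u32, limit: u32, access: u8, flags: u8) -> u64 {
    let base = base as u64;
    let limit = limit as u64;
    (limit & 0xFFFF)                       // limit 15:0
        | ((base & 0x00FF_FFFF) << 16)     // base 23:0
        | ((access as u64) << 40)          // type, S, DPL, P
        | (((limit >> 16) & 0xF) << 48)    // limit 19:16
        | (((flags as u64) & 0xF) << 52)   // AVL, L, D/B, G
        | (((base >> 24) & 0xFF) << 56)    // base 31:24
}

/// Access byte: present, DPL 3, code/data descriptor, execute/read code.
pub const USER_CODE_ACCESS: u8 = 0b1111_1010;
/// Access byte: present, DPL 3, code/data descriptor, read/write data.
pub const USER_DATA_ACCESS: u8 = 0b1111_0010;
/// Flags: 4 KiB granularity and 32-bit default operand size.
pub const FLAGS_4K_32BIT: u8 = 0b1100;

/// Flat user segments that cover the whole 4 GiB address space.
pub const USER_CODE: u64 = descriptor(0, 0xF_FFFF, USER_CODE_ACCESS, FLAGS_4K_32BIT);
pub const USER_DATA: u64 = descriptor(0, 0xF_FFFF, USER_DATA_ACCESS, FLAGS_4K_32BIT);
```

The only difference to the usual flat kernel segments is the DPL field (bits 45 and 46 of the descriptor), which is 3 instead of 0.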

The third entry, the task state segment descriptor, is needed for switch (2), from user to kernel for interrupt handling. Usually, the CPU automatically switches to ring 0 if an interrupt is triggered. We have, however, one problem: the stack. We generally do not want our interrupt handlers to rely on the user stack. Thus, the stack pointer (esp) must be switched to a separate kernel stack before the interrupt handler runs and restored to the user stack afterward. Configuring this interrupt stack requires the use of x86 hardware tasks as a workaround. These are configured with a TaskStateSegment (TSS), whose descriptor must be added to the Global Descriptor Table. The TSS specifies the kernel stack pointer and stack segment that the CPU loads when switching from a user program to ring 0.

To summarize:

  • Add two segment descriptors for the user code and user data/stack
  • Create a new TSS descriptor
  • Implement the layout of the TaskStateSegment (we only need ss0 and esp0)

Note: Details of these descriptors are in the IA-32 Developer Manual, specifically in sections “Segment Descriptors” and “Task Management Data Structures”.
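The TSS layout and its descriptor could look roughly like the sketch below. It assumes that all fields except esp0 and ss0 can be lumped into a single reserved array, and the names new and tss_descriptor are hypothetical. Setting the I/O map base to the size of the structure is one common way to signal that no I/O permission bitmap exists.

```rust
use core::mem::size_of;

/// 32-bit task state segment as read by the hardware (104 bytes).
/// Only `esp0` and `ss0` matter for the ring 3 to ring 0 stack switch.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
pub struct TaskStateSegment {
    pub prev_task_link: u32,
    pub esp0: u32,
    pub ss0: u32,
    /// esp1, ss1, esp2, ss2, cr3, eip, eflags, eight general-purpose
    /// registers, six segment registers, and the LDT selector.
    _unused: [u32; 22],
    pub trap: u16,
    pub io_map_base: u16,
}

impl TaskStateSegment {
    /// Creates a TSS that points to the given kernel stack.
    pub fn new(esp0: u32, ss0: u16) -> Self {
        Self {
            esp0,
            ss0: ss0 as u32,
            // A base at or beyond the segment limit means "no I/O bitmap".
            io_map_base: size_of::<Self>() as u16,
            ..Self::default()
        }
    }
}

/// Encodes the GDT descriptor for a TSS located at `base`.
pub fn tss_descriptor(base: u32) -> u64 {
    // Present, DPL 0, system descriptor, type 9 (available 32-bit TSS).
    const ACCESS: u64 = 0x89;
    let base = base as u64;
    let limit = (size_of::<TaskStateSegment>() - 1) as u64;
    (limit & 0xFFFF)
        | ((base & 0x00FF_FFFF) << 16)
        | (ACCESS << 40)
        | (((limit >> 16) & 0xF) << 48)
        | (((base >> 24) & 0xFF) << 56)
}
```

After the descriptor is in the GDT, its selector still has to be loaded into the task register with ltr before the first interrupt from ring 3 arrives.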

§Kernel Stack Implementation

Now, how do we use this TaskStateSegment?

Remember, we must ensure that each application runs its ring 0 code on a unique kernel stack. The Thread already contains a kernel stack pointer, so nothing has to change here. However, during a context switch (in Scheduler::dispatch), we have to store the kernel stack of the next app in the TaskStateSegment (esp0 and ss0) so that the next interrupt uses the correct kernel stack.
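The update during a context switch might be sketched as follows. Tss and Thread here are reduced, hypothetical stand-ins for the real types, and the kernel data selector (GDT index 2) is an assumption about the GDT layout. The sketch also assumes that esp0 should point to the top of the thread's kernel stack, since that stack is empty again whenever the thread runs in ring 3.

```rust
/// Reduced view of the TSS with only the fields the ring switch reads.
#[derive(Debug, Default)]
pub struct Tss {
    pub esp0: u32,
    pub ss0: u16,
}

/// Hypothetical stand-in for the parts of `Thread` that matter here.
pub struct Thread {
    /// Top of this thread's private kernel stack.
    pub kernel_stack_top: u32,
}

/// Assumed selector of the kernel data/stack segment (GDT index 2, RPL 0).
pub const KERNEL_DATA_SELECTOR: u16 = 2 << 3;

/// Called on every context switch, before the next thread resumes, so that
/// the next interrupt from ring 3 lands on that thread's kernel stack.
pub fn prepare_kernel_stack(tss: &mut Tss, next: &Thread) {
    tss.esp0 = next.kernel_stack_top;
    tss.ss0 = KERNEL_DATA_SELECTOR;
}
```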

§Switching to the User for the First Time

Generally, x86 has multiple ways to perform the initial switch from kernel to user mode. One option is to fake a return from an interrupt, which automatically switches rings.

In RStuBS this requires a few things:

  • We need an extra user stack for each thread
  • Each user stack has to be passed into the Thread::init function and then into Thread::kernel_kickoff (via the kernel stack).
  • There, we switch rings from 0 to 3 by calling switch_to_user.
  • Naturally, you must also implement the switch_to_user function, including the stack preparation and iretd. Don’t forget to also set the correct ds, es, fs, and gs segments.

Hint: For details on this setup, refer to the “Exception and Interrupt Handling” section in the Intel Manual.
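To illustrate the stack preparation, the sketch below builds the five words that iretd pops when it returns to an outer ring. The GDT indices of the user segments and all names are assumptions. The actual switch_to_user additionally has to push these values onto the kernel stack, load ds, es, fs, and gs with the user data selector, and execute iretd in assembly.

```rust
/// Assumed GDT indices of the user segments (hypothetical layout).
pub const USER_CODE_INDEX: u16 = 3;
pub const USER_DATA_INDEX: u16 = 4;

/// Builds a segment selector from a GDT index and a requested privilege level.
pub const fn selector(index: u16, rpl: u16) -> u16 {
    (index << 3) | (rpl & 0b11)
}

/// EFLAGS bit 1 is reserved and always set.
pub const EFLAGS_RESERVED: u32 = 1 << 1;
/// EFLAGS bit 9 (IF) enables maskable interrupts.
pub const EFLAGS_IF: u32 = 1 << 9;

/// Returns the five words `iretd` pops on a return to ring 3, in pop order
/// (index 0 is the word at the lowest address, where esp points).
pub fn user_iret_frame(entry: u32, user_stack_top: u32) -> [u32; 5] {
    [
        entry,                               // eip: first user instruction
        selector(USER_CODE_INDEX, 3) as u32, // cs with RPL 3
        EFLAGS_RESERVED | EFLAGS_IF,         // eflags: interrupts enabled
        user_stack_top,                      // esp: user stack
        selector(USER_DATA_INDEX, 3) as u32, // ss with RPL 3
    ]
}
```

Because the requested privilege level in cs is 3, the CPU treats the iretd as a return to a less privileged ring and also pops esp and ss.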

§Creating a System Call Interface

The second half of the first assignment is to design a system call interface to provide privileged operations.

§Managing Exceptions

Triggering a trap via the int instruction is usually a privileged operation, not allowed for ring 3 code, and would result in a general protection fault. However, interrupt descriptor table entries for custom vectors (like 0x80) can be configured to allow being triggered from the user by setting their descriptor privilege level (DPL) to 3.
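A sketch of such an IDT entry is shown below: a 32-bit interrupt gate whose DPL decides which rings may trigger it with int. The function name is hypothetical, and the existing IDT code in RStuBS may already provide an equivalent.

```rust
/// Encodes a 32-bit interrupt gate for the handler at address `handler`.
/// `dpl` is the least privileged ring that may trigger the gate via `int`.
pub const fn interrupt_gate(handler: u32, code_selector: u16, dpl: u8) -> u64 {
    let handler = handler as u64;
    // Present (bit 7), DPL (bits 6:5), 32-bit interrupt gate type (0xE).
    let type_attr = 0x80 | (((dpl as u64) & 0b11) << 5) | 0x0E;
    (handler & 0xFFFF)                  // handler offset 15:0
        | ((code_selector as u64) << 16) // kernel code segment selector
        | (type_attr << 40)
        | ((handler >> 16) << 48)        // handler offset 31:16
}
```

With dpl set to 0, an int 0x80 from ring 3 raises a general protection fault. With dpl set to 3, it enters the handler in the code segment named by code_selector.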

Since we have to access the previous registers (InterruptContext), which contain the syscall parameters, we cannot use the compiler-generated interrupt handlers (extern "x86-interrupt"). Instead, we have to write our own trampoline function that saves the complete interrupt context and makes it accessible to the handler. We need some assembly code to save and restore the registers for this (implement syscall_trampoline and InterruptContext).

Note: Check the Intel Manual’s “Exception and Interrupt Handling” section for more information.
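One possible layout for InterruptContext is sketched below. It assumes that the trampoline saves the general-purpose registers with a single pushad and passes the resulting stack pointer to the handler. The field names are hypothetical, and segment registers are left out for brevity.

```rust
/// Stack contents seen by the syscall handler, from the lowest address up.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct InterruptContext {
    // Pushed by the trampoline with `pushad` (edi is pushed last).
    pub edi: u32,
    pub esi: u32,
    pub ebp: u32,
    /// Value of esp before `pushad`; `popad` skips this slot.
    pub esp_dummy: u32,
    pub ebx: u32,
    pub edx: u32,
    pub ecx: u32,
    pub eax: u32,
    // Pushed by the CPU on the transition from ring 3 to ring 0.
    pub eip: u32,
    pub cs: u32,
    pub eflags: u32,
    pub user_esp: u32,
    pub user_ss: u32,
}
```

Since the handler receives a mutable reference to this structure, writing to eax before the trampoline executes popad and iretd is one way to hand a return value back to the user.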

§Parameter Handling Mechanism

Switching to the kernel stack when handling a system call trap makes parameter passing on the stack (like for regular function calls) impossible. Instead, RStuBS should be designed to pass system call parameters in registers. Therefore, the user syscall stub has to put the parameters in the correct registers before executing int 0x80. Accordingly, the kernel handling function must do the opposite, placing the register contents on the kernel stack. The syscall handler then has to select the actual implementation based on the syscall number, execute it, and return to the user.
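The selection step could be sketched like this. The register assignment (number in eax, arguments in ebx, ecx, and edx), the syscall numbers, and the sys_* functions are all assumptions made for illustration; the kernel functions are placeholders without real behavior.

```rust
/// Register values saved by the trampoline, reduced to what the dispatcher reads.
#[derive(Debug, Default, Clone, Copy)]
pub struct SyscallRegs {
    pub eax: usize, // syscall number
    pub ebx: usize, // first argument
    pub ecx: usize, // second argument
    pub edx: usize, // third argument
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    Id,
    Argument,
}

/// Hypothetical syscall numbers.
pub const SYS_SLEEP: usize = 2;
pub const SYS_SEM_INIT: usize = 3;
/// Hypothetical upper bound for semaphore ids.
const MAX_SEMAPHORES: usize = 8;

// Placeholder kernel implementations.
fn sys_sleep(_ms: usize) -> Result<usize, Error> {
    Ok(0)
}
fn sys_sem_init(id: usize, _value: usize) -> Result<usize, Error> {
    // Inputs from the user are validated before they are used.
    if id < MAX_SEMAPHORES { Ok(0) } else { Err(Error::Argument) }
}

/// Selects the implementation based on the syscall number.
pub fn dispatch(regs: &SyscallRegs) -> Result<usize, Error> {
    match regs.eax {
        SYS_SLEEP => sys_sleep(regs.ebx),
        SYS_SEM_INIT => sys_sem_init(regs.ebx, regs.ecx),
        _ => Err(Error::Id),
    }
}
```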

Return values should also be passed in registers. There are multiple ways to return and distinguish an error from a correct return value. The errors can be put in an extra register, or the carry flag can be used to distinguish an error code from a normal return value.

Note that we return Results, so your wrapper should convert from/to the integers in the registers accordingly.
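As one possible convention, the sketch below uses the extra-register variant: one register carries the value and a second one carries an error code, where 0 means success. The encode and decode helpers are hypothetical, and the carry-flag variant would work just as well.

```rust
/// The errors that syscalls can return (subset used for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(usize)]
pub enum Error {
    Id,
    Argument,
    Init,
    NotFound,
}

/// Kernel side: turns a result into (value register, error register).
/// An error register of 0 means success.
pub fn encode(result: Result<usize, Error>) -> (usize, usize) {
    match result {
        Ok(value) => (value, 0),
        Err(error) => (0, error as usize + 1),
    }
}

/// User side: rebuilds the result from the two registers.
pub fn decode(value: usize, error: usize) -> Result<usize, Error> {
    match error {
        0 => Ok(value),
        1 => Err(Error::Id),
        2 => Err(Error::Argument),
        3 => Err(Error::Init),
        4 => Err(Error::NotFound),
        // Unknown codes are mapped to a generic error in this sketch.
        _ => Err(Error::Argument),
    }
}
```

Syscalls that return Result<(), Error> can reuse the same helpers by ignoring the value register.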

System Calls to Implement:

/// The errors that syscalls can return
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(usize)]
pub enum Error {
    /// Invalid syscall id
    Id,
    /// Invalid syscall argument
    Argument,
    /// Resource cannot be initialized, it most likely was already initialized
    Init,
    /// Resource cannot be found
    NotFound,
    // ... (you can create your own errors)
}

fn write(fd: usize, buf: &[u8], pos: Option<(u8, u8)>) -> Result<usize, Error>;
fn read(fd: usize, buf: &mut [u8]) -> Result<usize, Error>;
fn sleep(ms: usize) -> Result<(), Error>;
fn sem_init(id: usize, value: usize) -> Result<(), Error>;
fn sem_destroy(id: usize) -> Result<(), Error>;
fn sem_wait(id: usize) -> Result<(), Error>;
fn sem_signal(id: usize) -> Result<(), Error>;

Recommendation: Create a println! macro and a BufWriter that calls write internally. To make your life easier, we provide an example below:

/// Buffered writer that collects output until its buffer is full.
/// It then flushes the buffer with the write syscall.
pub struct BufWriter {
    fd: usize,
    buf: [u8; Self::LEN],
    idx: usize,
    pos: Option<(u8, u8)>,
}
impl BufWriter {
    pub const LEN: usize = 64;
    /// Create a new writer for the given file and optional screen position on the CGA.
    pub fn new(fd: usize, pos: Option<(u8, u8)>) -> Self {
        Self { fd, buf: [0; Self::LEN], idx: 0, pos }
    }
    /// Call the write syscall and empty the buffer.
    pub fn flush(&mut self) {
        write(self.fd, &self.buf[..self.idx], self.pos).unwrap();
        self.idx = 0;
    }
}
impl Drop for BufWriter {
    fn drop(&mut self) {
        self.flush();
    }
}
impl core::fmt::Write for BufWriter {
    fn write_str(&mut self, s: &str) -> core::fmt::Result {
        let mut s = s.as_bytes();
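        while !s.is_empty() {
            // Copy as many bytes as still fit into the buffer.
            let remainder = (Self::LEN - self.idx).min(s.len());
            self.buf[self.idx..self.idx + remainder].copy_from_slice(&s[..remainder]);
            self.idx += remainder;
            s = &s[remainder..];
            // Flush only when the buffer is full, while idx still counts the
            // buffered bytes; wrapping idx to 0 first would drop them.
            if self.idx == Self::LEN {
                self.flush();
            }
        }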
        Ok(())
    }
}

#[macro_export]
macro_rules! println {
    // CGA at position x and y: println!(1, 2: "my text {}", x);
    ($x:tt, $y:tt : $($arg:tt)*) => ({
        use core::fmt::Write;
        let _ = writeln!($crate::BufWriter::new(0, Some(($x, $y))), $($arg)*);
    });
    // CGA Debug: println!(dbg: "my text {}", x);
    (dbg: $($arg:tt)*) => ({
        use core::fmt::Write;
        let _ = writeln!($crate::BufWriter::new(2, None), $($arg)*);
    });
    // Serial console: println!("my text {}", x);
    ($($arg:tt)*) => ({
        use core::fmt::Write;
        let _ = writeln!($crate::BufWriter::new(1, None), $($arg)*);
    });
}

§Checklist

  • All segment descriptors are defined correctly
  • The kickoff function fakes the interrupt return onto the user stack
  • Interrupts use the correct (kernel) stack
  • A specific interrupt vector is reserved for syscalls
  • Registers are saved before the syscall (clobbers)
  • The kernel syscall handler converts registers back to arguments
  • Test: Does cli in an app create a trap?