Module rstubs::assignments::b3_ipc_cow

source ·
Expand description

§BST - A3: IPC, fork, and COW

In the last assignment, you implemented paging for memory isolation. Consequently, applications can no longer access each other’s memory. Unfortunately, this also prevents them from sending any data to other applications.

We will change that in this assignment with safe and explicit interprocess communication. Additionally, we introduce the fork syscall, allowing applications to create new copies of themselves. The final task will be optimizing both IPC and fork with copy-on-write (COW).

§Interprocess Communication

Making this communication safe requires going through the kernel, which can reliably check the sent data and recipients. In RStuBS, the IPC mechanism follows the send-receive-reply procedure:

Sender        OS      Receiver
   |          '          '
   x--------->|          '
   ' send     x--------->|
   '          ' dispatch |
   '          '          |
   '          |<---------x
   '          |  receive '
   '          |          '
   '          x--------->|
   '          ' return   |
   '          '          |
   '          |<---------x
   |<---------x    reply '
   |          '          '

Note that receive could also happen before send.

The send function blocks until the corresponding reply was executed, and similarly the receive function blocks until a send for this thread was issued. The reply function wakes up the sender does not block.

This mechanism allows sending data in both directions. The send function specifies two buffers: one for the message to be sent to the recipient and a second mutable buffer where the response should be stored. Accordingly, the receive function must specify an appropriate mutable buffer in which the message is then copied. And lastly, the reply also takes another buffer for the response, which is copied into the provided response buffer from the send function.

We introduce four new syscalls that implement this send-receive-reply procedure.

/// Return the unique id of this thread.
pub fn pid() -> usize;
/// Send the `message` to the `pid`, and wait for the `response`.
pub fn send(pid: usize, message: &[u8], response: &mut [u8]) -> Result<(), Error>;
/// Wait for an incoming `message`. Returns the sender thread id.
pub fn receive(message: &mut [u8]) -> Result<usize, Error>;
/// Reply with the `response` to the sender `pid` after a previous receive.
pub fn reply(pid: usize, response: &[u8]) -> Result<(), Error>;
  • We recommend implementing this IPC first without COW and instead with plain old copies.
  • The initial synchronization between send->receive or receive->send could be implemented with semaphores.
  • This IPC mechanism can be implemented without any kernel allocations.

§Forking Processes

The fork syscall clones an application, including its address space and current instruction pointer. It returns twice: at the parent process, that called fork in the first place, and at the newly created child process. Other than UNIX, we do not create a process hierarchy in RStuBS. Instead, fork just returns the pid of the other process: for the newly created process, this is the pid of the old process and vice versa.

Parent (PID 1)
      |
 fork +--__     Child (PID 2)
      '    ‾‾--__
      o          ‾‾--o
ret=2 |        ret=1 |
      |              |

And here is the corresponding syscall:

pub fn fork() -> usize;
  • The new thread has to be initialized so that the syscall return value is the parent’s pid. This requires modifying switch_to_user to set the correct register and fake a syscall return.
  • Fork takes the current EIP and ESP from the saved interrupt context.
  • You might not want to copy the entire thread struct and kernel stack. Why?

§Copy-on-Write (COW)

In the second half of this assignment, we want to reduce the number of (costly) copies for both IPC and fork. A common strategy to do so is the copy-on-write mechanism, which is implemented by most general-purpose OSs (Linux, Darwin, Windows). This mechanism shares the underlying data pages between processes and copies them lazily when a process modifies them for the first time. This hides most of the initial copy delay, and it turns out that, especially for fork, most of the memory does not have to be copied at all.

The concrete implementation looks like this:

  • All currently copied pages should be mapped read-only in both the source and the target address spaces.
  • Now, each write access triggers a page fault (int 14).
  • The corresponding handler must resolve the COW:
    • If this is the last mapping to this page, just make it writable.
    • If there are other mappings, allocate a new page, copy the content over, and map the new page. Do not forget to invalidate the corresponding TLB entry.
  • We need a reference counter for these COW pages to keep track of the number of mappings they are part of. You can use a static array, a B-Tree map, or shadow page tables for this.
  • You can also implement kernel page faults.

There are two ways to invalidate TLB entries: the first option is to flush all entries by writing to the CR3 register, and the alternative is the invlpg instruction that invalidates a single entry.

§Shadow Page Tables

Usually, page tables implement the mapping from virtual to physical address (including access rights). However, this tree data structure can also be used for other purposes. They can store different data instead of physical addresses, like reference counters. In this case, the page tables would not be given to the MMU. They are purely used as a map data structure (like a hash map). As a nice benefit, you probably could reuse the logic for creating and traversing them from your existing page directory/table abstractions.

Typical Page Directory:

phys, rights <-MMU-- PD[virt]

Shadow Page Directory:

counter <-CPU-- PD'[virt]

§Stress Test

The following application is a stress test for your COW implementation. It checks the general functionality, including a few edge cases.

#![no_std]
#![no_main]

#[repr(align(4096))]
struct PageAlign<T>(T);
static mut S_BUF: PageAlign<[u8; 4097]> = PageAlign([0; 4097]);
static mut R_BUF: PageAlign<[u8; 4097]> = PageAlign([0; 4097]);

#[no_mangle]
pub extern "C" fn start() -> ! {
    syscall::fork();
    syscall::fork();
    let pid = syscall::pid();
    let other = syscall::fork();

    let s_buf = unsafe { &mut S_BUF.0 };
    let r_buf = unsafe { &mut R_BUF.0 };
    if pid == other {
        // child
        s_buf[0] = 3;
        s_buf[4096] = pid as u8;
        syscall::send(other, s_buf, r_buf).unwrap();

        let a = char::from(b'A' + r_buf[0]);
        let b = char::from(b'A' + 3 + pid as u8);
        syscall::println!("Reply: {a}{b}");
    } else {
        // parent
        let sender = syscall::receive(r_buf).unwrap();
        r_buf[0] += r_buf[4096];
        syscall::reply(sender, r_buf).unwrap();
    }
    syscall::exit();
}

/// This function is called on panic.
#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
    syscall::println!(dbg: "{info}");
    syscall::exit();
}

Here, we split our application into four pairs. Within each pair, the child sends a message (including its pid) to the parent, who waits for this message, modifies it, and sends it back to the child.

This test should print four lines of two identical letters similar to the following:

REPLY: DD
REPLY: GG
REPLY: EE
REPLY: FF

The order of lines could differ depending on the timing.