Expand description
§BST - A3: IPC, fork, and COW
In the last assignment, you implemented paging for memory isolation. Consequently, applications can no longer access each other’s memory. Unfortunately, this also prevents them from sending any data to other applications.
We will change that in this assignment with safe and explicit interprocess communication. Additionally, we introduce the fork syscall, allowing applications to create new copies of themselves. The final task will be optimizing both IPC and fork with copy-on-write (COW).
§Interprocess Communication
Making this communication safe requires going through the kernel, which can reliably check the sent data and recipients.
In RStuBS, the IPC mechanism follows the send-receive-reply
procedure:
The
send
function blocks until the correspondingreply
was executed, and similarly thereceive
function blocks until asend
for this process was issued. Thereply
function wakes up the sender does not block.
This mechanism allows sending data in both directions.
The send function specifies two buffers: one for the message
to be sent to the recipient and a second mutable buffer where the response
should be stored.
Accordingly, the receive function must specify an appropriate mutable buffer in which the message
is then copied.
And lastly, the reply also takes another buffer for the response
, which is copied into the provided response
buffer from the send function.
We introduce four new syscalls that implement this send-receive-reply
procedure.
/// Return the unique id of this process.
pub fn pid() -> usize;
/// Send the `message` to the `pid`, and wait for the `response`.
pub fn send(pid: usize, message: &[u8], response: &mut [u8]) -> Result<(), Error>;
/// Wait for an incoming `message`. Returns the sender process id.
pub fn receive(message: &mut [u8]) -> Result<usize, Error>;
/// Reply with the `response` to the sender `pid` after a previous receive.
pub fn reply(pid: usize, response: &[u8]) -> Result<(), Error>;
- We recommend implementing this IPC first without COW and instead with plain old copies.
- The initial synchronization between send->receive or receive->send could be implemented with semaphores.
- This IPC mechanism can be implemented without any kernel allocations.
§Forking Processes
The fork syscall clones an application, including its address space and current instruction pointer. It returns twice: at the parent process, that called fork in the first place, and at the newly created child process. Other than UNIX, we do not create a process hierarchy in RStuBS. Instead, fork just returns the pid of the other process: for the newly created process, this is the pid of the old process and vice versa.
![][bst3-fork-cf]
And here is the corresponding syscall:
pub fn fork() -> usize;
- The new process has to be initialized so that the syscall return value is the originals’s pid. For this we have to fake a return from
fork
with the appropriate pid. How? - Fork takes the current EIP and ESP from the saved interrupt context.
- You might not want to copy the entire process struct and kernel stack. Why?
§Testing
At this point, we would recommend executing the stress test down below. It should already work without COW.
§Copy-on-Write (COW)
In the second half of this assignment, we want to reduce the number of (costly) copies for both IPC and fork. A common strategy to do so is the copy-on-write mechanism, which is implemented by most general-purpose OSs (Linux, Darwin, Windows). This mechanism shares the underlying data pages between processes and copies them lazily when a process modifies them for the first time. This hides most of the initial copy delay, and it turns out that, especially for fork, most of the memory does not have to be copied at all. Note that you can only COW whole pages. IPC buffers are not required to be page aligned/sized, so you might still have to copy parts at the start/end of the buffers.
The concrete implementation looks like this:
- All currently copied pages should be mapped read-only in both the source and the target address spaces.
- Now, each write access triggers a page fault (int 14). The corresponding handler must resolve the COW:
- If this is the last mapping to this page, just make it writable.
- If there are other mappings, allocate a new page, copy the content over, and map the new page. Do not forget to invalidate the corresponding TLB entry.
- We need a reference counter for these COW pages to keep track of the number of mappings they are part of. You can use a static array, a B-Tree map, or shadow page tables for this.
- You can also implement kernel page faults.
There are two ways to invalidate TLB entries: the first option is to flush all entries by writing to the CR3 register, and the alternative is the invlpg
instruction that invalidates a single entry.
§Reference Counters
There are a lot of data structure that can be used for storing the reference counters for each (COWed) page frame. Here we discuss a few examples, you can choose whatever you like. Generally we want to have a mapping from the physical page address to the corresponding reference counter.
fn ref_get(phys: Physical) -> usize;
fn ref_inc(phys: Physical);
fn ref_dec(phys: Physical);
Shadow Page Tables: Usually, page tables implement the mapping from virtual to physical address (including access rights). However, this tree data structure can also be used for other purposes. It can store different data instead of physical addresses, like reference counters. In this case, the page tables would not be given to the MMU. They are purely used as a map data structure (like a hash map). As a nice benefit, you probably could reuse the logic for creating and traversing them from your existing page directory/table abstractions. Also note, that you would use the physical address to walk the shadow page tables, not the virtual.
Typical Page Directory:
phys, rights <-MMU-- PD[virt]
Shadow Page Directory:
counter <-CPU-- ShadowPD[phys]
Struct Page:
Linux and other OSs commonly use a large array of entries for every physical page frame static FRAMES: [Frame; 1<<20]
.
These arrays can be quickly indexed with physical_address / 4096
and store reference counters and other OS specific flags.
The downside of this approach is, that these entries quickly become filled with a lot of unrelated data and difficult to maintain.
Here you could use an AtomicUsize
as reference counter type, which does not have require a lock.
Mapping Data Structure:
The Rust standard library (alloc
) contains a BTreeMap
, which you could also use.
This requires a #[global_allocator]
and has to be locked (e.g. by the guard).
§Stress Test
The following application is a stress test for your COW implementation. It checks the general functionality, including a few edge cases.
#![no_std]
#![no_main]
static mut S_BUF: [u8; 4097] = [0; 4097];
static mut R_BUF: [u8; 4097] = [0; 4097];
#[no_mangle]
pub extern "C" fn start() -> ! {
syscall::fork();
syscall::fork();
let pid = syscall::pid();
let other = syscall::fork();
let s_buf = unsafe { &mut S_BUF.0 };
let r_buf = unsafe { &mut R_BUF.0 };
if pid == other {
// child
s_buf[0] = 3;
s_buf[4096] = pid as u8;
syscall::send(other, s_buf, r_buf).unwrap();
let a = char::from(b'A' + r_buf[0]);
let b = char::from(b'A' + 3 + pid as u8);
syscall::println!("Reply: {a}{b}");
} else {
// parent
let sender = syscall::receive(r_buf).unwrap();
r_buf[0] += r_buf[4096];
syscall::reply(sender, r_buf).unwrap();
}
syscall::exit();
}
/// This function is called on panic.
#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
syscall::println!(dbg: "{info}");
syscall::exit();
}
Here, we split our application into four pairs. Within each pair, the child sends a message (including its pid) to the parent, who waits for this message, modifies it, and sends it back to the child.
This test should print four lines of two identical letters similar to the following:
REPLY: DD
REPLY: GG
REPLY: EE
REPLY: FF
The order of lines could differ depending on the timing.
Finally, demonstrate the correctness of your implementation by printing the number of allocated page frames in the user and kernel area respectively:
- Before starting the scheduler in
main
- When first starting the idle thread (all apps have terminated)