Let’s Build an Own Operating System (PrimitiveOS)

Sachin Tharaka
Sep 6, 2021

Part 7- Paging an OS

Welcome Back!

This is my journey through making a new operating system named PrimitiveOS. This is the 7th article in the series, and after reading it you should have a proper idea of how paging works in an OS.

Before continuing, please read the previous parts if you haven’t already done so. In the last article, I wrote about integrating user mode into an OS.

What is paging

In Operating Systems, Paging is a storage mechanism used to retrieve processes from the secondary storage into the main memory in the form of pages.

The main idea behind paging is to divide each process into pages. The main memory is likewise divided into frames.

One page of the process is stored in one frame of the memory. The pages can be stored at different locations in memory, but the priority is always to find contiguous frames or holes.

Pages of the process are brought into the main memory only when they are required; otherwise they reside in secondary storage.

Why Paging

The OS stores and retrieves data from secondary storage devices for use in main memory, and paging is one such memory management scheme. The OS retrieves data from storage media in same-sized blocks called pages. Paging allows the physical address space of a process to be noncontiguous. Without paging, the whole program had to fit into storage contiguously.

Paging deals with the external fragmentation problem. Because the physical address space of a process is allowed to be noncontiguous, the process can be allocated physical memory wherever free frames are available.

  • Logical Address or Virtual Address (represented in bits): An address generated by the CPU
  • Logical Address Space or Virtual Address Space( represented in words or bytes): The set of all logical addresses generated by a program
  • Physical Address (represented in bits): An address actually available on the memory unit
  • Physical Address Space (represented in words or bytes): The set of all physical addresses corresponding to the logical addresses

Example:

  • If Logical Address = 31 bits, then Logical Address Space = 2³¹ words = 2 G words (1 G = 2³⁰)
  • If Logical Address Space = 128 M words = 2⁷ * 2²⁰ words, then Logical Address = log₂ 2²⁷ = 27 bits
  • If Physical Address = 22 bits, then Physical Address Space = 2²² words = 4 M words (1 M = 2²⁰)
  • If Physical Address Space = 16 M words = 2⁴ * 2²⁰ words, then Physical Address = log₂ 2²⁴ = 24 bits
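The arithmetic in these examples can be checked with a small C sketch. The function names `address_space_words` and `address_bits_for` are made up for this illustration and are not part of PrimitiveOS; the second one assumes the address space size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Number of addressable words for a given address width in bits. */
static uint64_t address_space_words(unsigned address_bits)
{
    assert(address_bits < 64);              /* keep the shift well defined */
    return (uint64_t)1 << address_bits;
}

/* Address width in bits for an address space of space_words words.
 * Assumption: space_words is a power of two. */
static unsigned address_bits_for(uint64_t space_words)
{
    assert(space_words != 0 && (space_words & (space_words - 1)) == 0);
    unsigned bits = 0;
    while ((space_words >> bits) != 1) {
        bits++;                             /* count how often we can halve */
    }
    return bits;
}
```

For instance, `address_bits_for(128ULL << 20)` corresponds to the 128 M words case above.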

The mapping from virtual to physical addresses is done by the memory management unit (MMU), which is a hardware device. This mapping is known as the paging technique.

  • The Physical Address Space is conceptually divided into a number of fixed-size blocks, called frames.
  • The Logical Address Space is also split into fixed-size blocks, called pages.
  • Page Size = Frame Size
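To make the page and frame idea concrete, here is a minimal sketch of how a logical address could be translated. It assumes a 4 KiB page size and a simplified single-level table that maps a page number directly to a frame number; the real x86 MMU uses a two-level structure, and the `translate` function is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096u   /* assumed page size = frame size */
#define PAGE_SHIFT  12u     /* log2(PAGE_SIZE) */

/* Translate a logical address using a simplified table in which
 * frame_of_page[p] holds the frame number that page p is stored in. */
static uint32_t translate(const uint32_t *frame_of_page, uint32_t num_pages,
                          uint32_t logical)
{
    uint32_t page   = logical >> PAGE_SHIFT;        /* upper bits: page number */
    uint32_t offset = logical & (PAGE_SIZE - 1);    /* lower bits: offset in page */

    assert(page < num_pages);                       /* outside the table: invalid */
    return (frame_of_page[page] << PAGE_SHIFT) | offset;
}
```

Because page size equals frame size, the offset is carried over unchanged and only the page number is replaced by a frame number.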

Identity Paging

The simplest kind of paging is when we map each virtual address onto the same physical address, called identity paging. This can be done at compile time by creating a page directory where each entry points to its corresponding 4 MB frame. It can of course also be done at run-time by using ordinary assembly code instructions.
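As a rough sketch of the run-time variant, the following C function fills a page directory so that every entry identity maps its own 4 MiB region. The flag names and `fill_identity_directory` are chosen for this example only. In a real kernel the directory must also be aligned to 4 KiB, and 4 MiB entries only work once PSE is enabled.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_PRESENT   0x001u  /* bit 0: entry is valid */
#define PDE_WRITABLE  0x002u  /* bit 1: read/write allowed */
#define PDE_PAGE_4MB  0x080u  /* bit 7 (PS): entry maps a 4 MiB page */
#define PDE_COUNT     1024u   /* 1024 entries * 4 MiB = 4 GiB */

/* Identity map the whole 4 GiB address space with 4 MiB pages:
 * entry i covers virtual addresses starting at i * 4 MiB and points
 * at the physical frame with the same start address. */
static void fill_identity_directory(uint32_t *dir)
{
    assert(dir != 0);
    for (uint32_t i = 0; i < PDE_COUNT; i++) {
        dir[i] = (i << 22) | PDE_PAGE_4MB | PDE_WRITABLE | PDE_PRESENT;
    }
}
```

The address of the filled directory is what would then be loaded into CR3.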

Enabling

Enabling paging is actually very simple. All that is needed is to load CR3 with the address of the page directory and to set the paging (PG) and protection (PE) bits of CR0. Note: setting the paging flag when the protection flag is clear causes a general protection exception.

mov eax, page_directory
mov cr3, eax

mov eax, cr0
or eax, 0x80000001
mov cr0, eax

If you want to set pages as read-only for both userspace and supervisor, replace 0x80000001 above with 0x80010001, which also sets the WP bit.
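The two magic numbers above are just combinations of individual CR0 bits. The sketch below spells them out in C; it only computes the value and does not touch the real register, and the names `CR0_PE`, `CR0_WP`, `CR0_PG` and `cr0_with_paging` are chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE  (1u << 0)    /* protection enable (protected mode) */
#define CR0_WP  (1u << 16)   /* write protect: supervisor honours read-only pages */
#define CR0_PG  (1u << 31)   /* paging */

/* Return the CR0 value to write back in order to turn paging on.
 * Other bits already set in cr0 are preserved. */
static uint32_t cr0_with_paging(uint32_t cr0, int write_protect)
{
    cr0 |= CR0_PG | CR0_PE;          /* PG without PE would fault */
    if (write_protect) {
        cr0 |= CR0_WP;
    }
    assert((cr0 & CR0_PG) && (cr0 & CR0_PE));
    return cr0;
}
```

Starting from a CR0 value of 0, this yields 0x80000001 without WP and 0x80010001 with WP, matching the constants used in the assembly.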

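To enable PSE (4 MiB pages), bit 4 of CR4 must be set with the following code. If the page directory uses 4 MiB entries, do this before setting the PG bit in CR0.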

mov eax, cr4
or eax, 0x00000010
mov cr4, eax

Now paging is enabled.

Physical Address Extension

All Intel processors since the Pentium Pro (with the exception of the Pentium M at 400 MHz) and all AMD processors since the Athlon series implement the Physical Address Extension (PAE). This feature allows you to access up to 64 GiB (2³⁶ bytes) of RAM. You can check for this feature using CPUID. Once checked, you can activate it by setting bit 5 in CR4. Once active, the CR3 register points to a table of four 64-bit entries. Each of these points to a page directory of 4096 bytes (like in normal paging), divided into 512 64-bit entries. Each directory entry points to a 4096-byte page table, which is divided into 512 64-bit page entries.
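The table sizes above determine how a 32-bit virtual address is split under PAE: 2 bits select one of the four top-level entries, 9 bits select one of 512 page directory entries, 9 bits select one of 512 page table entries, and 12 bits are the offset within the 4096-byte page. A small sketch of that split, with struct and function names invented for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Indices a 32-bit virtual address is split into under PAE (4 KiB pages). */
struct pae_indices {
    uint32_t pdpt;    /* bits 31..30: one of 4 top-level entries */
    uint32_t pd;      /* bits 29..21: one of 512 page directory entries */
    uint32_t pt;      /* bits 20..12: one of 512 page table entries */
    uint32_t offset;  /* bits 11..0:  byte offset inside the page */
};

static struct pae_indices pae_split(uint32_t vaddr)
{
    struct pae_indices ix;
    ix.pdpt   = (vaddr >> 30) & 0x3u;
    ix.pd     = (vaddr >> 21) & 0x1FFu;
    ix.pt     = (vaddr >> 12) & 0x1FFu;
    ix.offset = vaddr & 0xFFFu;

    /* The four fields together must cover every bit of the address. */
    assert(((ix.pdpt << 30) | (ix.pd << 21) | (ix.pt << 12) | ix.offset) == vaddr);
    return ix;
}
```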

More…

An instruction that is useful when updating a page directory table (PDT) or page table (PT) is invlpg. It invalidates the Translation Lookaside Buffer (TLB) entry for a virtual address. The TLB is a cache of translated addresses that maps virtual addresses to their corresponding physical addresses. Executing invlpg is only required when changing a PDE or PTE that was previously mapped to something else. If the PDE or PTE had previously been marked as not present (bit 0 was set to 0), executing invlpg is unnecessary. Changing the value of cr3 will cause all entries in the TLB to be invalidated.

An example of invalidating a TLB entry is shown below:

; invalidate any TLB references to virtual address 0
invlpg [0]

Paging and the Kernel

This section will describe how paging affects the OS kernel. We encourage you to run your OS using identity paging before trying to implement a more advanced paging setup, since it can be hard to debug a malfunctioning page table that is set up via assembly code.

Paging Tricks

When the present bit in a PDE or PTE is cleared, the processor always raises a page fault exception when the entry is used, regardless of the address it holds. This means the remaining bits of a not-present PTE or PDE are free for the OS to use, for example to record where a page saved on mass storage is located. Set the present bit to 0 and store the page’s location in the paging file in the entry, so that the page can be quickly loaded when the fault occurs.
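One possible way to apply this trick is sketched below. The encoding (swap slot number stored in bits 1 to 31, present bit in bit 0) and all function names are assumptions made for this example; an OS is free to choose any layout as long as bit 0 stays clear.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u   /* bit 0 of a PTE */

/* Build a not-present PTE that remembers where the page lives in the
 * paging file. The hardware ignores bits 1..31 while bit 0 is clear. */
static uint32_t pte_make_swapped(uint32_t swap_slot)
{
    assert(swap_slot < (1u << 31));   /* must fit in bits 1..31 */
    return swap_slot << 1;            /* present bit stays 0 */
}

static int pte_is_present(uint32_t pte)
{
    return (pte & PTE_PRESENT) != 0;
}

/* Called from a (hypothetical) page fault handler to find the page on disk. */
static uint32_t pte_swap_slot(uint32_t pte)
{
    assert(!pte_is_present(pte));
    return pte >> 1;
}
```

A page fault handler could read the slot with `pte_swap_slot`, load the page into a free frame, and then write a normal present PTE in its place.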

Reasons to Not Identity Map the Kernel

If the kernel is placed at the beginning of the virtual address space — that is, the virtual address space (0x00000000, "size of kernel") maps to the location of the kernel in memory - there will be issues when linking the user mode process code. Normally, during linking, the linker assumes that the code will be loaded into the memory position 0x00000000. Therefore, when resolving absolute references, 0x00000000 will be the base address for calculating the exact position. But if the kernel is mapped onto the virtual address space (0x00000000, "size of kernel"), the user mode process cannot be loaded at virtual address 0x00000000 - it must be placed somewhere else. Therefore, the assumption from the linker that the user mode process is loaded into memory at position 0x00000000 is wrong. This can be corrected by using a linker script which tells the linker to assume a different starting address, but that is a very cumbersome solution for the users of the operating system.

This also assumes that we want the kernel to be part of the user mode process’ address space. As we will see later, this is a nice feature, since during system calls we don’t have to change any paging structures to get access to the kernel’s code and data. The kernel pages will of course require privilege level 0 for access, to prevent a user process from reading or writing kernel memory.

Into Hands-On

We have to edit the following files:

  • Makefile
  • kmain
  • link.ld

Files that are needed for paging:

  • paging.c/h
  • common.c/h
  • kheap.c/h
  • paging_enable.s

First, we can create our new files.

Paging.c file

Paging.h file

Likewise, we can add new files and we can modify our existing files as we want.

You can refer to the following repository for that.

https://github.com/Sachin-Tharaka/PrimitiveOS (Branch- vertual_memory_paging)

The Virtual Address for the Kernel

Preferably, the kernel should be placed at a very high virtual memory address, for example 0xC0000000 (3 GB). The user mode process is not likely to be 3 GB large, which is now the only way that it can conflict with the kernel. When the kernel uses virtual addresses at 3 GB and above, it is called a higher-half kernel. 0xC0000000 is just an example; the kernel can be placed at any address higher than 0 to get the same benefits. Choosing the correct address depends on how much virtual memory should be available for the kernel (it is easiest if all memory above the kernel virtual address belongs to the kernel) and how much virtual memory should be available for the process.
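With 4 MiB regions per page directory entry, the choice of 0xC0000000 translates into simple index arithmetic. The sketch below is illustrative only; `KERNEL_VIRTUAL_BASE`, `pde_index` and `is_kernel_address` are names assumed for this example and are not taken from the repository.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_VIRTUAL_BASE 0xC0000000u   /* example base from the text (3 GB) */

/* The base should start on a 4 MiB boundary so it begins a fresh PDE. */
static_assert((KERNEL_VIRTUAL_BASE & 0x3FFFFFu) == 0,
              "kernel base must be 4 MiB aligned");

/* Index of the page directory entry that covers a virtual address. */
static uint32_t pde_index(uint32_t vaddr)
{
    return vaddr >> 22;   /* each entry covers 4 MiB = 2^22 bytes */
}

/* Everything at or above the base belongs to the kernel. */
static int is_kernel_address(uint32_t vaddr)
{
    return vaddr >= KERNEL_VIRTUAL_BASE;
}
```

With this base, page directory entries 768 to 1023 cover the kernel’s 1 GB and entries 0 to 767 are left for the user mode process.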

If the user mode process is larger than 3 GB, some pages will need to be swapped out by the kernel. Swapping pages is not covered in this article.

Virtual Memory Through Paging

Paging enables two things that are good for virtual memory. First, it allows for fine-grained access control to memory. You can mark pages as read-only, read-write, only for PL0 etc. Second, it creates the illusion of contiguous memory. User mode processes, and the kernel, can access memory as if it were contiguous, and the contiguous memory can be extended without the need to move data around in memory. We can also allow the user mode programs access to all memory below 3 GB, but unless they actually use it, we don’t have to assign page frames to the pages. This allows processes to have code located near 0x00000000 and the stack at just below 0xC0000000, and still not require more than two actual pages.
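The access control part can be pictured with the three low flag bits of a page table entry. The following is a deliberately simplified model: it looks at a single PTE only (real hardware also combines the flags of the PDE), it assumes CR0.WP is set so that supervisor writes respect read-only pages, and `access_allowed` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT   0x1u   /* bit 0: page is mapped to a frame */
#define PTE_WRITABLE  0x2u   /* bit 1: writes allowed */
#define PTE_USER      0x4u   /* bit 2: accessible from user mode (PL3) */

/* Return 1 if the access would succeed, 0 if it would page fault.
 * is_user: 1 for user mode, 0 for PL0; is_write: 1 for a write. */
static int access_allowed(uint32_t pte, int is_user, int is_write)
{
    assert(is_user == 0 || is_user == 1);
    assert(is_write == 0 || is_write == 1);

    if (!(pte & PTE_PRESENT)) {
        return 0;                         /* no frame assigned yet */
    }
    if (is_user && !(pte & PTE_USER)) {
        return 0;                         /* PL0-only page */
    }
    if (is_write && !(pte & PTE_WRITABLE)) {
        return 0;                         /* read-only page */
    }
    return 1;
}
```

A page with only the present and writable bits set is a kernel page in this model: PL0 code can read and write it, while any user mode access faults.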

Thank you for reading. We will meet soon.

#staysafe #stayconnected

References:

  • https://littleosbook.github.io/
  • https://intermezzos.github.io/
  • http://os.phil-opp.com/

For any issue with making files, you can follow my repository

https://github.com/Sachin-Tharaka/PrimitiveOS

Sachin Tharaka

Software Engineering, University of Kelaniya, Sri Lanka