Let’s Build an Own Operating System (PrimitiveOS)

8 min read · Aug 13, 2021

Part 4- Segmentation

Welcome Back!

This is my journey through making a new operating system named PrimitiveOS. This is the fourth article in the series, and after reading it you should have a clear idea of how segmentation works and how to set it up on x86.

Before going further, please read the previous parts if you haven’t already done so. In the last article, I wrote about how to display text on the console and how to write data to the serial port. You can read that here.

Let’s Build an Own Operating System (PrimitiveOS)

Part 3- Implement inputs and outputs

tharakasachin98.medium.com

Segmentation

In Operating Systems, Segmentation is a memory management technique in which the memory is divided into variable size parts. Each part is known as a segment that can be allocated to a process.

The details about each segment are stored in a table called a segment table. The segment table is stored in one (or many) of the segments.

The segment table mainly contains two pieces of information about each segment:

1. Base: the base address of the segment.

2. Limit: the length of the segment.

Segmentation in x86 means accessing the memory through segments. Segments are portions of the address space, possibly overlapping, specified by a base address and a limit. To address a byte in segmented memory you use a 48-bit logical address: 16 bits that specify the segment and 32 bits that specify what offset within that segment you want. The offset is checked against the segment’s limit and added to the base address of the segment (see the figure below). If everything works out fine (including access-rights checks, ignored for now) the result is a linear address. When paging is disabled, the linear address space is mapped 1:1 onto the physical address space, and the physical memory can be accessed. (See the chapter “Paging” for how to enable paging.)
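The translation described above can be sketched in a few lines of C. This is only a simplified model meant to run on an ordinary machine, not kernel code: the struct segment type and the logical_to_linear function are hypothetical names, access-rights checks are left out, and the limit is assumed to be the highest valid offset inside the segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified segment: only base and limit, no access rights. */
struct segment {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* highest valid offset inside the segment (assumption) */
};

/*
 * Translate (segment, offset) into a linear address.
 * Returns false if the offset lies beyond the segment's limit,
 * which is where a real CPU would raise a fault instead.
 */
static bool logical_to_linear(const struct segment *seg, uint32_t offset,
                              uint32_t *linear)
{
    assert(seg != NULL && linear != NULL);
    if (offset > seg->limit) {
        return false;
    }
    *linear = seg->base + offset;  /* unsigned arithmetic wraps modulo 2^32 */
    return true;
}
```

For a segment with base 0x1000 and limit 0xFFF, offset 0x10 translates to the linear address 0x1010, while offset 0x1000 is rejected because it is past the limit.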

Why is Segmentation required?

Paging, the other common memory management technique, is closer to the operating system’s view of memory than to the user’s. It divides every process into fixed-size pages, even though a process may contain related functions that would ideally be loaded together.

The operating system doesn’t care about the user’s view of the process. It may split a single function across different pages, and those pages may or may not be loaded into memory at the same time, which decreases the efficiency of the system.

Segmentation instead divides the process into segments, each containing related content: for example, the main function can be placed in one segment and the library functions in another.

Figure: Translation of logical addresses to linear addresses.

To enable segmentation you need to set up a table that describes each segment — a segment descriptor table. In x86, there are two types of descriptor tables: the Global Descriptor Table (GDT) and Local Descriptor Tables (LDT). An LDT is set up and managed by user-space processes, and all processes have their own LDT. LDTs can be used if a more complex segmentation model is desired — we won’t use it. The GDT is shared by everyone — it’s global.

Advantages of Segmentation

No internal fragmentation
The average Segment Size is larger than the actual page size.
Less overhead
It is easier to relocate segments than the entire address space.
The segment table is of lesser size as compared to the page table in paging.

Disadvantages

It can have external fragmentation.
It is difficult to allocate contiguous memory to variable-sized partitions.
Costly memory management algorithms.

Now we can go through the main topics.

#1. Accessing Memory

Most of the time when accessing memory there is no need to explicitly specify the segment to use. The processor has six 16-bit segment registers: cs, ss, ds, es, gs and fs. The register cs is the code segment register and specifies the segment to use when fetching instructions. The register ss is used whenever accessing the stack (through the stack pointer esp), and ds is used for other data accesses. The OS is free to use the registers es, gs and fs however it wants.

Below is an example showing implicit use of the segment registers:

func:
        mov eax, [esp+4]
        mov ebx, [eax]
        add ebx, 8
        mov [eax], ebx
        ret

The above example can be compared with the following one that makes explicit use of the segment registers:

func:
        mov eax, [ss:esp+4]
        mov ebx, [ds:eax]
        add ebx, 8
        mov [ds:eax], ebx
        ret

You don’t need to use ss for storing the stack segment selector, or ds for the data segment selector. You could store the stack segment selector in ds and vice versa. However, in order to use the implicit style shown above, you must store the segment selectors in their intended registers.

#2. The Global Descriptor Table (GDT)

A GDT/LDT is an array of 8-byte segment descriptors. The first descriptor in the GDT is always a null descriptor and can never be used to access memory. At least two segment descriptors (plus the null descriptor) are needed for the GDT because the descriptor contains more information than just the base and limit fields. The two most relevant fields for us are the Type field and the Descriptor Privilege Level (DPL) field.
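To make the 8-byte layout concrete, here is a rough sketch of how one descriptor could be packed into a 64-bit value. The function name encode_descriptor is hypothetical, and the split of the base and limit across the descriptor follows the layout in the Intel manual. The access byte and flag values used in the usage note (0x9A, 0x92 and 0xC) are not from this article; they are the values commonly used for flat 4 GiB kernel code and data segments.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack one 8-byte segment descriptor (hypothetical helper).
 *   base   : 32-bit linear base address
 *   limit  : 20-bit limit (bytes, or 4 KiB units when the G flag is set)
 *   access : access byte (present bit, DPL, Type)
 *   flags  : upper four flag bits (G, D/B, L, AVL)
 */
static uint64_t encode_descriptor(uint32_t base, uint32_t limit,
                                  uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF);
    assert(flags <= 0xF);

    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* limit 0..15  -> bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* base 0..23   -> bits 16..39 */
    d |= (uint64_t)access << 40;                 /* access byte  -> bits 40..47 */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit 16..19 -> bits 48..51 */
    d |= (uint64_t)flags << 52;                  /* flags        -> bits 52..55 */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;  /* base 24..31  -> bits 56..63 */
    return d;
}
```

With base 0, limit 0xFFFFF, flags 0xC and access byte 0x9A, this yields 0x00CF9A000000FFFF, and with access byte 0x92 it yields 0x00CF92000000FFFF. The base and limit are scattered across the descriptor for historical reasons, which is why a helper like this is less error-prone than writing the bytes by hand.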

Table 3–1 in chapter 3 of the Intel manual [33] specifies the values for the Type field. The table shows that the Type field can’t be both writable and executable at the same time. Therefore, two segments are needed: one segment for executing code to put in cs (Type is Execute-only or Execute-Read) and one segment for reading and writing data (Type is Read/Write) to put in the other segment registers.

The DPL specifies the privilege levels required to use the segment. x86 allows for four privilege levels (PL), 0 to 3, where PL0 is the most privileged. In most operating systems (e.g. Linux and Windows), only PL0 and PL3 are used. However, some operating systems, such as MINIX, make use of all levels. The kernel should be able to do anything, therefore it uses segments with DPL set to 0 (also called kernel mode). The current privilege level (CPL) is determined by the segment selector in cs.
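The Type and DPL fields live together in one byte of the descriptor, usually called the access byte. The sketch below shows one way to assemble it for ordinary code and data segments. The name make_access_byte and its parameters are made up for illustration, and the present bit and the "code or data" bit are always set here, which is an assumption that fits the kernel segments discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Build the access byte of a code or data segment descriptor
 * (hypothetical helper).
 *   dpl        : descriptor privilege level, 0 (most privileged) to 3
 *   executable : true for a code segment, false for a data segment
 *   read_write : readable (code segment) or writable (data segment)
 */
static uint8_t make_access_byte(uint8_t dpl, bool executable, bool read_write)
{
    assert(dpl <= 3);
    return (uint8_t)(0x80                        /* bit 7: segment is present   */
                     | (dpl << 5)                /* bits 5..6: DPL              */
                     | 0x10                      /* bit 4: code or data segment */
                     | (executable ? 0x08 : 0)   /* bit 3: executable           */
                     | (read_write ? 0x02 : 0)); /* bit 1: readable / writable  */
}
```

A kernel code segment (DPL 0, executable, readable) comes out as 0x9A and a kernel data segment (DPL 0, writable) as 0x92. The same segments with DPL 3 would be 0xFA and 0xF2.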

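The segments needed are described in the table below.

Index | Offset | Name                | Type         | DPL
0     | 0x00   | null descriptor     |              |
1     | 0x08   | kernel code segment | Execute-Read | PL0
2     | 0x10   | kernel data segment | Read/Write   | PL0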

#3. Loading the GDT

Loading the GDT into the processor is done with the lgdt assembly code instruction, which takes the address of a struct that specifies the start and size of the GDT. It is easiest to encode this information using a “packed struct” as shown in the following example:

struct gdt {
    unsigned short size;    /* limit: size of the table in bytes, minus 1 */
    unsigned int address;   /* linear address of the first descriptor */
} __attribute__((packed));

If the eax register contains the address of such a struct, then the GDT can be loaded with the assembly code shown below:

lgdt [eax]

It might be easier if you make this instruction available from C, the same way as was done with the assembly code instructions in and out.
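Before lgdt can be called, the struct has to be filled in correctly. In 32-bit mode the instruction reads a 16-bit limit followed by a 32-bit base address, six bytes in total, and the limit is the size of the table in bytes minus one. The sketch below checks that layout on an ordinary machine. The names gdt_pointer and make_gdt_pointer are hypothetical, and the code does not execute lgdt itself, since that instruction only works in privileged mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Operand of lgdt in 32-bit mode: 16-bit limit first, then 32-bit base. */
struct gdt_pointer {
    uint16_t size;     /* size of the table in bytes, minus 1 */
    uint32_t address;  /* linear address of the first descriptor */
} __attribute__((packed));

/* Fill in the pointer for a table of descriptor_count 8-byte descriptors. */
static struct gdt_pointer make_gdt_pointer(uint32_t table_address,
                                           size_t descriptor_count)
{
    assert(descriptor_count >= 1 && descriptor_count <= 8192);
    struct gdt_pointer p;
    p.size = (uint16_t)(descriptor_count * 8 - 1);
    p.address = table_address;
    return p;
}
```

For a table with three descriptors (null, code and data) the size field is 23. Without the packed attribute the compiler would normally insert two padding bytes after size, and lgdt would read the wrong base address.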

After the GDT has been loaded the segment registers need to be loaded with their corresponding segment selectors. The content of a segment selector is described in the figure and table below:

Bit:     | 15                                3 | 2  | 1 0 |
Content: | offset (index)                      | ti | rpl |

The layout of segment selectors

The offset of the segment selector is added to the start of the GDT to get the address of the segment descriptor. Each descriptor is 8 bytes and the null descriptor sits at offset 0x00, so the offset is 0x08 for the first usable descriptor (the code segment) and 0x10 for the second (the data segment). The Requested Privilege Level (RPL) should be 0 since the kernel of the OS should execute in privilege level 0.
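The selector layout can be checked with a small C sketch. The helper names make_selector and selector_index are hypothetical. The bit positions follow the layout shown above: the index in bits 3 to 15, the table indicator (ti) in bit 2, where 0 selects the GDT, and the RPL in bits 0 and 1.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a descriptor index, table indicator and RPL into a selector. */
static uint16_t make_selector(uint16_t index, uint8_t ti, uint8_t rpl)
{
    assert(index < 8192);  /* the index field is 13 bits wide */
    assert(ti <= 1);       /* 0 = GDT, 1 = LDT */
    assert(rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

/* Recover the descriptor index from a selector. */
static uint16_t selector_index(uint16_t selector)
{
    return (uint16_t)(selector >> 3);
}
```

With ti and rpl both 0, index 1 gives 0x08 and index 2 gives 0x10, which are the kernel selectors used in this article. Because the low three bits are zero in that case, the selector value is the same as the byte offset of the descriptor in the GDT.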

Loading the segment selector registers is easy for the data registers. A segment register can't be loaded with an immediate value, so put the offset in a general-purpose register first and copy it from there:

    mov ax, 0x10
    mov ds, ax
    mov ss, ax
    mov es, ax
    .
    .
    .

To load cs we have to do a “far jump”:

    ; code here uses the previous cs
    jmp 0x08:flush_cs   ; specify cs when jumping to flush_cs

flush_cs:
    ; now we've changed cs to 0x08

A far jump is a jump where we explicitly specify the full 48-bit logical address: the segment selector to use and the absolute address to jump to. It will first set cs to 0x08 and then jump to flush_cs using its absolute address.

Into the hands-on

Here is the assembly code for a file called “gdt.s”. Put it in a newly created directory named “segmentation” inside your project folder (in my case, the Assignment folder). This code is what sets up segmented access to memory.

Step 1: Open your text editor, type this, and save it as gdt.s:

Loading the GDT into the processor is done with the “lgdt” assembly code instruction, which takes the address of a struct that specifies the start and size of the GDT. To do that you need to declare the functions and structures in a file called “memory_segments.h”, which is in the “segmentation” directory.

Step 2: Open your text editor, type this, and save it as memory_segments.h:

Step 3: Define those functions in a file called “memory_segments.c”, which is also in the “segmentation” directory. Open your text editor, type this, and save it as memory_segments.c:

Step 4: Configure the “Makefile” like this to build and run your OS:

Step 5: Call the “segments_install_gdt” function in the “kmain.c” file.

Step 6: Open your terminal and run the command make run.