The Kernel
The kernel is the core of the OS. It runs with full, unrestricted access to hardware: it can read and write any memory address, program the CPU directly, and control every attached device. The kernel is responsible for:
- Process scheduling (deciding which process runs on the CPU and when)
- Memory management (mapping virtual addresses to physical RAM, enforcing that processes can’t read each other’s memory)
- Device I/O (talking to disks, NICs, GPUs through device drivers)
- System call dispatch (receiving requests from user-space programs and executing them safely)
User Space
User space is where your applications live — shells, browsers, servers, databases. User-space processes run in restricted mode (Ring 3). The CPU hardware enforces this: if a user-space process attempts a privileged instruction, the CPU generates a trap, hands control to the OS, and the OS terminates the offending process with a segmentation fault or similar error. User-space programs communicate with the kernel exclusively through system calls — a controlled interface where the process passes arguments to a numbered call (e.g., on x86-64 Linux, syscall 0 for read, syscall 1 for write) and the kernel executes the hardware operation on its behalf, then returns the result. The CPU switches from user mode to kernel mode for the duration of the call, then switches back. This mode switching is expensive, which is one reason OS designers minimize unnecessary system calls.
Because every kernel request funnels through this single interface, auditing tools such as auditd can track exactly which process attempted to open which file, and whether it was allowed.
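As a sketch of what that looks like in practice (requires root and a running auditd service; the key name here is our own choice):

```shell
# Watch rule: "-w" sets a filesystem watch, "-p r" matches read access,
# "-k" tags matching events with a searchable key.
auditctl -w /etc/shadow -p r -k shadow-read

# Later, list every recorded access attempt tagged with that key,
# including the PID, UID, and whether the open succeeded.
ausearch -k shadow-read
```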
Privilege Rings in Practice
x86 hardware supports four privilege rings (0–3), but in practice modern OSes use only two:

| Ring | Name | Who uses it |
|---|---|---|
| 0 | Kernel Mode | OS kernel, device drivers |
| 3 | User Mode | All user applications |
Hypervisors
A hypervisor (Virtual Machine Monitor) sits between physical hardware and virtual machines, managing resource allocation and enforcing isolation between guests. The security model of every VM on a host depends entirely on the hypervisor’s integrity. There are two types:
- Type 1 — Bare-Metal Hypervisor
- Type 2 — Hosted Hypervisor
A Type 1 hypervisor runs directly on the physical hardware with no host OS underneath. It is the first software that loads when the machine boots.

How it works: The hypervisor owns the hardware and presents a virtualized view to each guest OS. Guest kernels run in Ring 0 of their virtual environment, but the hypervisor intercepts any instruction that would actually touch real hardware.

Security profile: Smaller attack surface because there is no host OS to compromise. A vulnerability in the hypervisor itself is still critical, but there are fewer software layers overall.

Performance: Lower overhead — guest VMs communicate more directly with hardware through the thin hypervisor layer.

Examples: VMware ESXi, Microsoft Hyper-V Server, KVM (built into the Linux kernel), Xen

Typical use: Data centers, cloud providers (AWS, Azure, GCP all run Type 1 hypervisors under their instances)

A Type 2 hypervisor, by contrast, runs as an ordinary application on top of a conventional host OS (examples: VirtualBox, VMware Workstation, Parallels Desktop). It is simpler to set up but inherits the attack surface of the entire host OS and carries higher overhead, which is why it is typical on developer desktops rather than in data centers.
VM Escape
VM escape exploits a vulnerability in the hypervisor itself. The attack flow looks like this:
- Attacker controls code inside a guest VM
- Code exploits a bug in the hypervisor (e.g., in the emulated device driver, shared clipboard, or drag-and-drop feature)
- Attacker gains code execution in the hypervisor process
- Attacker now controls the host — and by extension, all other VMs
Mitigations:
- Keep hypervisors patched — hypervisor CVEs are critical-severity by default
- Disable unnecessary virtual hardware (shared folders, clipboard sharing, USB passthrough) in production VMs
- Run untrusted workloads on physically separate hardware when the risk justifies it
Side-Channel Attacks
Even without breaking the hypervisor, VMs sharing a physical host share physical resources — CPU caches, memory buses, power rails. An attacker can exploit these shared resources to extract information without ever crossing the software isolation boundary. A side-channel attack recovers secrets not by breaking cryptography directly, but by measuring indirect signals: timing, cache behavior, or power consumption of shared hardware.

Classic example — cache timing: Two VMs run on the same CPU. VM A is performing AES encryption. VM B belongs to the attacker. VM B repeatedly accesses memory to probe the CPU’s L3 cache and measures how long each access takes. Accesses that are slower indicate cache lines that VM A has evicted — this timing difference leaks information about VM A’s memory access patterns and, eventually, its encryption key.

Meltdown and Spectre (2018) are the most famous side-channel attacks. They exploited speculative execution — a CPU performance optimization where the processor executes instructions ahead of time before knowing if they’re needed. By triggering speculative execution of a privileged memory read and then measuring cache timing, Meltdown let a user-space (Ring 3) process read kernel memory (Ring 0) without the OS ever detecting the access. The vulnerabilities affected nearly every modern CPU and required mitigations at the OS, hypervisor, and CPU microcode levels simultaneously.

Mitigations for side-channel attacks include:
- Kernel Page-Table Isolation (KPTI) — separates kernel and user page tables to limit speculative information leakage
- Disabling hyper-threading in high-security environments (sibling hyperthreads share L1 cache)
- Core scheduling — ensuring that co-located VMs belonging to different tenants do not share a physical CPU core