The most fundamental security boundary in any modern operating system is the line between kernel space and user space. Every security control — file permissions, process isolation, access control — depends on this boundary holding. When it breaks, everything above it breaks too.

The Kernel

The kernel is the core of the OS. It runs with full, unrestricted access to hardware: it can read and write any memory address, program the CPU directly, and control every attached device. The kernel is responsible for:
  • Process scheduling (deciding which process runs on the CPU and when)
  • Memory management (mapping virtual addresses to physical RAM, enforcing that processes can’t read each other’s memory)
  • Device I/O (talking to disks, NICs, GPUs through device drivers)
  • System call dispatch (receiving requests from user-space programs and executing them safely)
The kernel runs in privileged mode (Ring 0 on x86 hardware). In this mode the CPU permits every instruction, including ones that manipulate memory maps and disable interrupts. No user application ever runs at this level.

User Space

User space is where your applications live: shells, browsers, servers, databases. User-space processes run in restricted mode (Ring 3). The CPU hardware enforces this: if a user-space process attempts a privileged instruction, the CPU generates a trap, hands control to the OS, and the OS terminates the offending process with a segmentation fault or similar error.

User-space programs communicate with the kernel exclusively through system calls, a controlled interface where the process passes arguments to a numbered call (e.g., on x86-64 Linux, syscall 0 for read, syscall 1 for write) and the kernel executes the hardware operation on its behalf, then returns the result. The CPU switches from user mode to kernel mode for the duration of the call, then switches back. This mode switching is expensive, which is one reason OS designers minimize unnecessary system calls.
┌─────────────────────────────────────┐
│   KERNEL MODE (Ring 0)              │
│   - Full hardware access            │
│   - Can modify memory maps          │
│   - Device drivers live here        │
└──────────────┬──────────────────────┘
               │  system call (trap)
┌──────────────┴──────────────────────┐
│   USER MODE (Ring 3)                │
│   - Your apps run here              │
│   - Cannot touch hardware directly  │
│   - Must ask kernel via syscalls    │
└─────────────────────────────────────┘
User-space processes have zero direct hardware access. All hardware interaction is mediated by the kernel. This design also makes the system call interface a natural audit point: kernel-level audit frameworks like Linux’s auditd can hook syscall dispatch and, when configured to, record exactly which process attempted to open which file, and whether it was allowed.

Privilege Rings in Practice

x86 hardware supports four privilege rings (0–3), but in practice modern OSes use only two:
  Ring   Name          Who uses it
  0      Kernel Mode   OS kernel, device drivers
  3      User Mode     All user applications
Rings 1 and 2 were originally intended for OS services and device drivers but are unused by mainstream OSes. Hypervisors introduced a new level below Ring 0 (sometimes called Ring -1) to manage guest operating systems — this is how a Type 1 hypervisor can run a guest OS that itself runs in Ring 0 without giving the guest true hardware control.

Hypervisors

A hypervisor (Virtual Machine Monitor) sits between physical hardware and virtual machines, managing resource allocation and enforcing isolation between guests. The security model of every VM on a host depends entirely on the hypervisor’s integrity.
A Type 1 hypervisor runs directly on the physical hardware with no host OS underneath. It is the first software that loads when the machine boots.
How it works: The hypervisor owns the hardware and presents a virtualized view to each guest OS. Guest kernels run in Ring 0 of their virtual environment, but the hypervisor intercepts any instruction that would actually touch real hardware.
Security profile: Smaller attack surface because there is no host OS to compromise. A vulnerability in the hypervisor itself is still critical, but there are fewer software layers overall.
Performance: Lower overhead; guest VMs communicate more directly with hardware through the thin hypervisor layer.
Examples: VMware ESXi, Microsoft Hyper-V Server, KVM (built into the Linux kernel), Xen
Typical use: Data centers, cloud providers (AWS, Azure, GCP all run Type 1 hypervisors under their instances)

VM Escape

VM escape is one of the most severe attacks in virtualization security. A successful VM escape lets an attacker who controls code inside one VM break out of its isolation and gain control of the hypervisor or host, which means they can then access every other VM on the same physical machine. In shared infrastructure (cloud providers, hosting companies), VM escape could let one customer’s malicious code reach another customer’s data.
VM escape exploits a vulnerability in the hypervisor itself. The attack flow looks like this:
  1. Attacker controls code inside a guest VM
  2. Code exploits a bug in the hypervisor (e.g., in the emulated device driver, shared clipboard, or drag-and-drop feature)
  3. Attacker gains code execution in the hypervisor process
  4. Attacker now controls the host — and by extension, all other VMs
Real-world VM escape vulnerabilities have been found in VirtualBox (shared folders, 3D acceleration), VMware (drag-and-drop, SVGA), and QEMU (VGA emulation). The VENOM vulnerability (CVE-2015-3456) in QEMU’s floppy drive emulation allowed guest-to-host escape. To reduce VM escape risk:
  • Keep hypervisors patched — hypervisor CVEs are critical-severity by default
  • Disable unnecessary virtual hardware (shared folders, clipboard sharing, USB passthrough) in production VMs
  • Run untrusted workloads on physically separate hardware when the risk justifies it
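As a concrete illustration of the second mitigation, VirtualBox exposes per-VM switches for exactly the convenience features implicated in past escapes. The VM name below is hypothetical, and option spellings vary across VirtualBox versions, so treat this as a sketch rather than a copy-paste recipe:

```shell
# "prod-vm" is a placeholder VM name.
VBoxManage modifyvm "prod-vm" --clipboard-mode disabled   # no host/guest shared clipboard
VBoxManage modifyvm "prod-vm" --draganddrop disabled      # no drag-and-drop channel
VBoxManage modifyvm "prod-vm" --usb off                   # no USB passthrough
```

Each disabled feature removes one guest-reachable code path in the virtualization layer, shrinking the attack surface available for step 2 of the escape flow above.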

Side-Channel Attacks

Even without breaking the hypervisor, VMs sharing a physical host share physical resources: CPU caches, memory buses, power rails. An attacker can exploit these shared resources to extract information without ever crossing the software isolation boundary. A side-channel attack recovers secrets not by breaking cryptography directly, but by measuring indirect signals: timing, cache behavior, or power consumption of shared hardware.

Classic example, cache timing: Two VMs run on the same CPU. VM A is performing AES encryption. VM B belongs to the attacker. VM B repeatedly accesses memory to probe the CPU’s L3 cache and measures how long each access takes. Accesses that are slower indicate cache lines that VM A has evicted; this timing difference leaks information about VM A’s memory access patterns and, eventually, its encryption key.

Meltdown and Spectre (2018) are the most famous side-channel attacks. Both exploit speculative execution, a CPU performance optimization where the processor executes instructions ahead of time, before knowing if they’re needed. Meltdown triggers speculative execution of a privileged memory read and then measures cache timing, letting a user-space (Ring 3) process read kernel memory (Ring 0) without the OS ever detecting the access; Spectre abuses branch prediction to leak memory across process and sandbox boundaries. The vulnerabilities affected nearly every modern CPU and required mitigations at the OS, hypervisor, and CPU microcode levels simultaneously. Mitigations for side-channel attacks include:
  • Kernel Page-Table Isolation (KPTI) — separates kernel and user page tables to limit speculative information leakage
  • Disabling hyper-threading in high-security environments (sibling hyperthreads share L1 cache)
  • Core scheduling — ensuring that co-located VMs belonging to different tenants do not share a physical CPU core