virtio-blk Block Device Driver

Overview

The kernel includes a PCI virtio-blk driver that provides read/write access to a QEMU virtual disk. The driver is implemented using the virtio-drivers crate (v0.13) and integrates with the existing actor/driver framework.

The driver is started automatically at boot if a virtio-blk PCI device is found. It is accessible from the shell via the blk commands.


Architecture

QEMU virtio-blk device (PCIe, Q35 ECAM)
        │
        │  PciTransport (virtio-drivers)
        ▼
  VirtIOBlk<KernelHal, PciTransport>   ← virtio protocol implementation
        │
  spin::Mutex (actor + ISR safe)
        │
  VirtioBlkActor                        ← actor framework wrapper
        │
  Mailbox<ActorMsg<VirtioBlkMsg, VirtioBlkInfo>>
        │
  Shell / other actors                  ← consumers

Components

devices/src/virtio/mod.rs — HAL and transport

KernelHal

Implements the virtio_drivers::Hal unsafe trait, bridging the virtio-drivers crate into the kernel memory model:

| Method | Implementation |
|---|---|
| dma_alloc(pages) | Allocates contiguous physical frames via MemoryServices::alloc_dma_pages; returns (paddr, virt) where virt is in the linear physical-memory window (phys_mem_offset + paddr). Pages are zeroed. |
| dma_dealloc | No-op. The frame allocator has no free operation; allocations are leaked (acceptable for MVP). |
| mmio_phys_to_virt(paddr, size) | Calls MemoryServices::map_mmio_region to ensure the physical range is mapped, then returns the linear-window virtual address. |
| share(buffer) | Performs a page-table walk via MemoryServices::translate_virt to find the physical address of any buffer (heap or DMA window). A plain vaddr - phys_mem_offset would be wrong for heap buffers. |
| unshare | No-op on x86 (cache-coherent). |

ECAM / PciRoot

The Q35 machine exposes a PCIe Extended Configuration Access Mechanism (ECAM) region at physical address 0xB000_0000 (1 MiB, covering bus 0).

Physical 0xB000_0000  →  Virtual phys_mem_offset + 0xB000_0000

The mapping is created once during libkernel_main by calling MemoryServices::map_mmio_region. The resulting virtual base is stored in the ECAM_VIRT_BASE atomic and used by create_pci_root() which constructs a PciRoot<MmioCam<'static>> for the virtio-drivers transport layer. (In virtio-drivers 0.13, PciRoot is generic over a ConfigurationAccess implementation; MmioCam wraps the raw MMIO pointer with a Cam::Ecam mode.)
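
As a rough illustration of why 1 MiB covers exactly one bus, the standard PCIe ECAM layout gives every function a 4 KiB configuration window, addressed by bus (bits 20 and up), device (bits 15 to 19) and function (bits 12 to 14). The sketch below computes those offsets on plain integers. The names ecam_offset and ecam_virt_addr are hypothetical helpers for this example and are not part of the kernel or of virtio-drivers.

```rust
/// Size of one function's configuration space under ECAM (4 KiB).
const ECAM_FUNCTION_SIZE: u64 = 4096;

/// Byte offset of a config register inside the ECAM window.
/// Hypothetical helper: bus << 20 | device << 15 | function << 12 | register.
fn ecam_offset(bus: u8, device: u8, function: u8, register: u16) -> u64 {
    assert!(device < 32, "device number is 5 bits wide");
    assert!(function < 8, "function number is 3 bits wide");
    assert!(u64::from(register) < ECAM_FUNCTION_SIZE, "register offset is 12 bits wide");
    (u64::from(bus) << 20)
        | (u64::from(device) << 15)
        | (u64::from(function) << 12)
        | u64::from(register)
}

/// Address of a config register, given the base of the mapped ECAM window.
/// In the kernel the base would be phys_mem_offset + 0xB000_0000.
fn ecam_virt_addr(ecam_virt_base: u64, bus: u8, device: u8, function: u8, register: u16) -> u64 {
    ecam_virt_base + ecam_offset(bus, device, function, register)
}
```

The last register of bus 0 (device 31, function 7) sits just below the 1 MiB mark, and bus 1 would start at offset 0x10_0000, which is outside the region mapped here.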

create_pci_transport (formerly create_blk_transport)

pub fn create_pci_transport(bus: u8, device: u8, function: u8) -> Option<PciTransport>

Wraps PciTransport::new::<KernelHal, _>, isolating virtio-drivers from the kernel binary — the kernel crate does not depend on virtio-drivers directly. Works for any virtio-pci device (blk, 9p, etc.), not just block devices. create_blk_transport is kept as a legacy alias.

register_blk_irq

pub fn register_blk_irq(handler: fn()) -> Option<u8>

Registers a dynamic IDT handler for the virtio-blk interrupt (delegating to libkernel::interrupts::register_handler). Returns the allocated IDT vector, which must be programmed into the device’s MSI or IO APIC routing table. IRQ-driven completion is not yet wired up (see Limitations).


devices/src/virtio/blk.rs — the actor

Messages

pub enum VirtioBlkMsg {
    Read(u64, Reply<Result<Vec<u8>, ()>>),      // sector, reply
    Write(u64, Vec<u8>, Reply<Result<(), ()>>), // sector, data, reply
}

Info

#[derive(Debug)]
pub struct VirtioBlkInfo {
    pub capacity_sectors: u64,
    pub reads:  u64,
    pub writes: u64,
}

Returned by driver info virtio-blk and blk info.
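
To make the message shape concrete, here is a host-side sketch of the same request/reply pattern. It is only a model: std::sync::mpsc channels stand in for the kernel's Mailbox and Reply types, an in-memory Vec stands in for the disk, and BlkMsg, BlkInfo, run_disk_actor and sector_range are hypothetical names. Counting failed requests in the statistics is also an assumption of this sketch.

```rust
use std::ops::Range;
use std::sync::mpsc::{Receiver, Sender};

const SECTOR_SIZE: usize = 512;

/// Stand-in for VirtioBlkMsg; `Sender` plays the role of the kernel's `Reply`.
enum BlkMsg {
    Read(u64, Sender<Result<Vec<u8>, ()>>),      // sector, reply
    Write(u64, Vec<u8>, Sender<Result<(), ()>>), // sector, data, reply
}

/// Counters with the same fields as VirtioBlkInfo.
#[derive(Debug, Default, PartialEq)]
struct BlkInfo {
    capacity_sectors: u64,
    reads: u64,
    writes: u64,
}

/// Byte range of a sector, or Err(()) if it lies beyond the disk.
fn sector_range(sector: u64, capacity: u64) -> Result<Range<usize>, ()> {
    if sector >= capacity {
        return Err(());
    }
    let start = sector as usize * SECTOR_SIZE;
    Ok(start..start + SECTOR_SIZE)
}

/// Drains the mailbox against an in-memory disk and returns the final counters.
/// The loop ends once every sender for the mailbox has been dropped.
fn run_disk_actor(inbox: Receiver<BlkMsg>, disk: &mut Vec<u8>) -> BlkInfo {
    let mut info = BlkInfo {
        capacity_sectors: (disk.len() / SECTOR_SIZE) as u64,
        ..Default::default()
    };
    for msg in inbox {
        match msg {
            BlkMsg::Read(sector, reply) => {
                info.reads += 1;
                let result = sector_range(sector, info.capacity_sectors)
                    .map(|range| disk[range].to_vec());
                // The requester may have gone away; ignore a closed reply channel.
                let _ = reply.send(result);
            }
            BlkMsg::Write(sector, data, reply) => {
                info.writes += 1;
                let result = if data.len() == SECTOR_SIZE {
                    sector_range(sector, info.capacity_sectors)
                        .map(|range| disk[range].copy_from_slice(&data))
                } else {
                    Err(())
                };
                let _ = reply.send(result);
            }
        }
    }
    info
}
```

A caller creates one reply channel per request, sends the message, and waits on the reply receiver; the actor never exposes the disk itself.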

VirtioBlkActor

Owns a spin::Mutex<VirtIOBlk<KernelHal, PciTransport>>. The mutex is needed because both the actor task and (future) interrupt handler may access the device.

unsafe impl Send and unsafe impl Sync are required because VirtIOBlk contains raw DMA buffer pointers, which prevents the compiler from deriving Send and Sync automatically. Access is always serialised through the spin::Mutex.

Read/write flow

on_read(sector, reply):
  1. lock device → read_blocks_nb(sector, &mut req, buf, &mut resp) → token
  2. unlock device
  3. CompletionFuture.await  (busy-polls peek_used until the device signals done)
  4. lock device → complete_read_blocks(token, &req, buf, &resp)
  5. unlock device
  6. reply.send(Ok(buf))

Write is symmetric with write_blocks_nb / complete_write_blocks.

read_blocks_nb, write_blocks_nb, complete_read_blocks, and complete_write_blocks are all unsafe fn in virtio-drivers. The safety contract is that the buffers remain valid, and are neither moved nor otherwise accessed, from submission until completion. Because buf, req, and resp all live in the async state machine on the heap, they are not moved or dropped between submit and complete.

CompletionFuture

use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll};

struct CompletionFuture<'a> {
    device: &'a spin::Mutex<VirtIOBlk<KernelHal, PciTransport>>,
}

impl Future for CompletionFuture<'_> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // A used-ring entry means the device has finished a request.
        if self.device.lock().peek_used().is_some() {
            Poll::Ready(())
        } else {
            // Busy-poll: ask the executor to poll again on its next turn.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

This is a busy-poll future for MVP. It re-schedules itself every executor turn until the virtqueue returns a used buffer. See Limitations for the planned IRQ-driven replacement.
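
The cost of busy-polling can be observed on a host machine with a small model. In the sketch below, MockDevice is a hypothetical stand-in for the virtqueue (it reports completion after a fixed number of peeks), std::sync::Mutex replaces spin::Mutex, and drive is a toy single-future executor; none of these exist in the kernel. The point is only that every Pending result is paired with one immediate self-wake.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical device: reports a used buffer after `polls_left` empty peeks.
struct MockDevice {
    polls_left: u32,
}

impl MockDevice {
    /// Same shape as peek_used(): Some(token) once a completion is available.
    fn peek_used(&mut self) -> Option<u16> {
        if self.polls_left == 0 {
            Some(0)
        } else {
            self.polls_left -= 1;
            None
        }
    }
}

struct CompletionFuture<'a> {
    device: &'a Mutex<MockDevice>,
}

impl Future for CompletionFuture<'_> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.device.lock().unwrap().peek_used().is_some() {
            Poll::Ready(())
        } else {
            cx.waker().wake_by_ref(); // reschedule immediately (busy-poll)
            Poll::Pending
        }
    }
}

/// Waker that only counts how often it is woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls the future to completion; returns (number of polls, number of wakes).
fn drive(device: &Mutex<MockDevice>) -> (usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = CompletionFuture { device };
    let mut polls = 0;
    loop {
        polls += 1;
        if Pin::new(&mut fut).poll(&mut cx).is_ready() {
            return (polls, counter.0.load(Ordering::SeqCst));
        }
    }
}
```

With a device that needs three empty peeks, drive performs four polls and records three self-wakes, so the executor never gets to idle while I/O is in flight.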


libkernel/src/memory/mod.rs — supporting APIs

Three methods were added to MemoryServices for virtio support:

map_mmio_region(phys_start, size) -> VirtAddr

Maps a physical MMIO range into the linear physical-memory window (phys_mem_offset + phys_start) using 4 KiB pages with PRESENT | WRITABLE | NO_CACHE flags.

Pages already mapped as 4 KiB pages are skipped silently (Ok(_)). Pages inside a 2 MiB or 1 GiB huge-page entry are also skipped (Err(TranslateError::ParentEntryHugePage)) — they are already accessible because the bootloader maps all physical RAM using 2 MiB huge pages.

This huge-page check was the fix for the map_to failed: ParentEntryHugePage panic that occurred when mapping the ECAM region.
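
The page-by-page decision described above can be sketched without any page-table code. The example below is a simplified model: pages_covering enumerates the 4 KiB pages that overlap a physical range, and Lookup with needs_mapping mirrors the three lookup outcomes from the text. All three names are hypothetical and do not come from the kernel or the x86_64 crate.

```rust
const PAGE_SIZE: u64 = 4096;

/// Start addresses of every 4 KiB page overlapping [phys_start, phys_start + size).
fn pages_covering(phys_start: u64, size: u64) -> Vec<u64> {
    if size == 0 {
        return Vec::new();
    }
    let first = phys_start & !(PAGE_SIZE - 1);
    let last = (phys_start + size - 1) & !(PAGE_SIZE - 1);
    (first..=last).step_by(PAGE_SIZE as usize).collect()
}

/// Result of looking up one page before mapping it (hypothetical model type).
enum Lookup {
    /// Already mapped as a 4 KiB page: skip.
    Mapped4K,
    /// Lies inside a 2 MiB or 1 GiB huge page: already reachable, skip.
    InsideHugePage,
    /// No mapping yet: create one.
    NotMapped,
}

/// Only unmapped pages need a new page-table entry.
fn needs_mapping(lookup: Lookup) -> bool {
    matches!(lookup, Lookup::NotMapped)
}
```

For the 1 MiB ECAM region this yields 256 pages starting at 0xB000_0000; an unaligned range is widened to whole pages on both ends.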

alloc_dma_pages(pages) -> Option<PhysAddr>

Allocates the requested number (pages) of physically contiguous 4 KiB frames from the BootInfoFrameAllocator and returns the physical address of the first frame. Panics if the frames turn out not to be contiguous (very unlikely with the sequential allocator).
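
A minimal sketch of the contiguity check, assuming a bump-style allocator that hands out frames in list order. BumpFrameAllocator is a hypothetical stand-in for BootInfoFrameAllocator and works on plain u64 addresses rather than PhysAddr; the real method may differ in how it handles exhaustion.

```rust
const FRAME_SIZE: u64 = 4096;

/// Hypothetical sequential allocator over a list of usable frame addresses.
struct BumpFrameAllocator {
    frames: Vec<u64>,
    next: usize,
}

impl BumpFrameAllocator {
    /// Hands out the next usable frame, or None when the list is exhausted.
    fn allocate_frame(&mut self) -> Option<u64> {
        let frame = self.frames.get(self.next).copied()?;
        self.next += 1;
        Some(frame)
    }

    /// Allocates `pages` frames and returns the address of the first one.
    /// Returns None for zero pages or when memory runs out.
    /// Panics if a frame does not directly follow its predecessor.
    fn alloc_dma_pages(&mut self, pages: usize) -> Option<u64> {
        if pages == 0 {
            return None;
        }
        let base = self.allocate_frame()?;
        for i in 1..pages {
            let frame = self.allocate_frame()?;
            assert_eq!(
                frame,
                base + i as u64 * FRAME_SIZE,
                "DMA frames are not physically contiguous"
            );
        }
        Some(base)
    }
}
```

Because frames are handed out in ascending order, a multi-page request only fails the check when it straddles a gap in the usable-memory map.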

translate_virt(virt) -> Option<PhysAddr>

Walks the active RecursivePageTable to find the physical address for any virtual address, regardless of page size (4 KiB, 2 MiB, or 1 GiB).

This is used by KernelHal::share to convert heap buffer addresses to physical addresses. A simple vaddr - phys_mem_offset subtraction would be wrong for heap buffers (which live at HEAP_START, not in the linear physical window), producing garbage physical addresses and causing QEMU to report virtio: zero sized buffers are not allowed.
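
The difference between the two approaches shows up in a toy address space. In the sketch below, AddressSpace models a linear physical-memory window plus a handful of individually mapped pages such as the heap. The struct, its fields, and naive_subtract are hypothetical, and the concrete addresses used with it are illustrative values, not the kernel's actual layout.

```rust
use std::collections::HashMap;

const PAGE_SIZE: u64 = 4096;

/// Toy model: a linear window over physical memory plus separately mapped pages.
struct AddressSpace {
    phys_mem_offset: u64,
    phys_mem_size: u64,
    /// Virtual page base -> physical frame base (for example, heap pages).
    pages: HashMap<u64, u64>,
}

impl AddressSpace {
    /// Lookup in the style of a page-table walk: correct for both kinds of mapping.
    fn translate_virt(&self, virt: u64) -> Option<u64> {
        let page = virt & !(PAGE_SIZE - 1);
        if let Some(frame) = self.pages.get(&page) {
            return Some(frame + (virt - page));
        }
        let in_window = virt >= self.phys_mem_offset
            && virt - self.phys_mem_offset < self.phys_mem_size;
        in_window.then(|| virt - self.phys_mem_offset)
    }

    /// The shortcut the text warns against: only valid inside the linear window.
    fn naive_subtract(&self, virt: u64) -> u64 {
        virt.wrapping_sub(self.phys_mem_offset)
    }
}
```

For an address inside the linear window both methods agree. For a heap address, only the lookup returns the frame that actually backs the buffer; the subtraction yields a number unrelated to any frame the device could use.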


Boot Sequence

libkernel_main()
  1. memory::init_services(mapper, frame_allocator, phys_mem_offset, map)
  2. map_mmio_region(0xB000_0000, 1 MiB)   ← ECAM
     virtio::set_ecam_base(ecam_virt)
  3. devices::pci::init()                   ← scan CF8/CFC config space
  4. find_devices(0x1AF4, 0x1042)           ← probe the modern (virtio 1.x) ID first
     find_devices(0x1AF4, 0x1001)           ← then the transitional (legacy) ID
  5. virtio::create_pci_transport(bus, dev, func)
       └─ PciRoot::new(MmioCam::new(ECAM_VIRT_BASE, Cam::Ecam))
          PciTransport::new::<KernelHal, _>(&mut root, df)
  6. VirtioBlkActor::new(transport)
  7. VirtioBlkActorDriver::new(actor)
  8. driver::register + registry::register("virtio-blk", inbox)
  9. driver::start_driver("virtio-blk")
     → "[kernel] virtio-blk registered"

Shell Commands

| Command | Description |
|---|---|
| blk info | Print capacity, read count, and write count |
| blk read <sector> | Read the 512-byte sector <sector>; hex-dump the first 64 bytes |
| blk ls [path] | List exFAT directory (see exfat.md) |
| blk cat <path> | Print exFAT file as text (see exfat.md) |
| ls [path] | Alias for blk ls |
| cat <path> | Alias for blk cat |
| driver info virtio-blk | Same info via the generic driver info command |
| driver stop virtio-blk | Stop the actor (mailbox closed; no further I/O) |
| driver start virtio-blk | Restart the actor |
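
As an illustration of what blk read prints, the following sketch formats the first bytes of a sector as rows of 16. The exact output format of the shell is not documented here, so the layout (offset, hex bytes, printable ASCII) and the function name hex_dump are assumptions of this example.

```rust
/// Formats up to `limit` bytes as rows of 16: offset, hex bytes, printable ASCII.
/// The row layout is an assumption, not the shell's documented format.
fn hex_dump(data: &[u8], limit: usize) -> String {
    let shown = &data[..data.len().min(limit)];
    let mut out = String::new();
    for (row, chunk) in shown.chunks(16).enumerate() {
        let hex: Vec<String> = chunk.iter().map(|b| format!("{:02x}", b)).collect();
        let ascii: String = chunk
            .iter()
            .map(|&b| if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' })
            .collect();
        // 16 bytes need 47 columns: 32 hex digits plus 15 separating spaces.
        out.push_str(&format!("{:04x}  {:<47}  {}\n", row * 16, hex.join(" "), ascii));
    }
    out
}
```

Calling hex_dump(&sector, 64) on a 512-byte buffer produces four rows, matching the 64-byte limit in the table.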

Running with a Disk

# Create a blank 64 MiB disk image (once):
make disk

# Build and run with the disk attached:
make run

The run target adds:

-drive file=disk.img,format=raw,if=none,id=hd0
-device virtio-blk-pci,drive=hd0

The kernel uses a Q35 machine (-machine q35) which provides native PCIe and ECAM support.

To run without a disk (e.g. for quick boot tests):

make run-nodisk

PCI Device IDs

| Device ID | Variant |
|---|---|
| 0x1AF4:0x1042 | Modern (virtio 1.x) virtio-blk (QEMU default) |
| 0x1AF4:0x1001 | Transitional (legacy-compatible) virtio-blk |

Both are probed at boot; the modern ID (0x1042) is tried first.
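
The probe order can be sketched as a search over an already-enumerated device list. PciDevice, BLK_IDS_IN_PROBE_ORDER and find_virtio_blk are hypothetical names for this example; the kernel's find_devices has its own signature, which is not shown in this document.

```rust
const VIRTIO_VENDOR: u16 = 0x1AF4;

/// Device IDs in the order they are tried: 0x1042 first, then 0x1001.
const BLK_IDS_IN_PROBE_ORDER: [u16; 2] = [0x1042, 0x1001];

/// Hypothetical record for one enumerated PCI function.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PciDevice {
    bus: u8,
    device: u8,
    function: u8,
    vendor_id: u16,
    device_id: u16,
}

/// Returns the first virtio-blk function, preferring 0x1042 over 0x1001
/// regardless of where each one appears in the scan results.
fn find_virtio_blk(devices: &[PciDevice]) -> Option<PciDevice> {
    BLK_IDS_IN_PROBE_ORDER.iter().find_map(|&id| {
        devices
            .iter()
            .copied()
            .find(|d| d.vendor_id == VIRTIO_VENDOR && d.device_id == id)
    })
}
```

The returned bus, device and function numbers are what create_pci_transport expects as arguments.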


Key Files

| File | Role |
|---|---|
| devices/src/virtio/mod.rs | KernelHal, ECAM state, create_pci_transport, register_blk_irq |
| devices/src/virtio/blk.rs | VirtioBlkActor, VirtioBlkMsg, VirtioBlkInfo, CompletionFuture |
| devices/src/virtio/p9_proto.rs | 9P2000.L wire protocol encode/decode |
| devices/src/virtio/p9.rs | P9Client, a high-level 9P client wrapping VirtIO9p |
| kernel/src/main.rs | ECAM mapping, PCI probe (blk + 9p), actor registration |
| devices/src/virtio/exfat.rs | exFAT partition detection, filesystem, path walk |
| kernel/src/shell.rs | blk info, blk read, blk ls, blk cat, ls, cat, cd, pwd |
| libkernel/src/memory/mod.rs | map_mmio_region, alloc_dma_pages, translate_virt |
| Makefile | disk, run, run-nodisk targets |

Limitations

Busy-poll completion

CompletionFuture re-schedules itself every executor turn, consuming CPU until the device completes I/O. The intended replacement is an AtomicWaker-based future that sleeps until the IRQ handler calls wake():

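use core::sync::atomic::{AtomicBool, Ordering};
use futures_util::task::AtomicWaker; // assumed source of AtomicWaker; the crate is not named here

// Set by the interrupt handler when the device signals completion.
static IRQ_PENDING: AtomicBool = AtomicBool::new(false);
static IRQ_WAKER: AtomicWaker = AtomicWaker::new();

fn virtio_blk_irq_handler() {
    // Record the interrupt before waking so the woken task observes it.
    IRQ_PENDING.store(true, Ordering::Release);
    IRQ_WAKER.wake();
}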

This requires programming the device’s MSI capability or IO APIC routing with the vector returned by register_blk_irq. The infrastructure exists; wiring is the remaining work.
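
The future side of that design might look roughly like the host-side sketch below. It is not the planned implementation: IrqSignal uses a std Mutex-protected waker slot as a simple stand-in for AtomicWaker (a real interrupt handler should not take a blocking lock), and IrqSignal, IrqCompletion and WakeCounter are hypothetical names. What it does show is the register-then-recheck pattern that avoids missing an interrupt arriving between the flag check and the waker registration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for IRQ_PENDING plus an AtomicWaker.
struct IrqSignal {
    pending: AtomicBool,
    waker: Mutex<Option<Waker>>,
}

impl IrqSignal {
    fn new() -> Self {
        IrqSignal { pending: AtomicBool::new(false), waker: Mutex::new(None) }
    }

    /// Called from the (simulated) interrupt handler.
    fn irq(&self) {
        self.pending.store(true, Ordering::Release);
        let waker = self.waker.lock().unwrap().take();
        if let Some(waker) = waker {
            waker.wake();
        }
    }
}

struct IrqCompletion<'a> {
    signal: &'a IrqSignal,
}

impl Future for IrqCompletion<'_> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Fast path: the interrupt already fired.
        if self.signal.pending.swap(false, Ordering::AcqRel) {
            return Poll::Ready(());
        }
        // Register the waker, then re-check to close the race with an
        // interrupt that fired between the first check and the registration.
        *self.signal.waker.lock().unwrap() = Some(cx.waker().clone());
        if self.signal.pending.swap(false, Ordering::AcqRel) {
            Poll::Ready(())
        } else {
            Poll::Pending // no self-wake: the task sleeps until irq() runs
        }
    }
}

/// Waker that counts wake-ups, used to observe the behaviour from outside.
#[derive(Default)]
struct WakeCounter(AtomicUsize);

impl Wake for WakeCounter {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

Unlike the busy-poll version, a Pending result here produces no wake-up at all; the single wake comes from irq(), after which the next poll returns Ready.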

No DMA free

dma_dealloc is a no-op. Freed DMA pages are leaked. The BootInfoFrameAllocator has no reclamation path. Acceptable for MVP; a proper frame allocator with free would be needed for a production kernel.

Single device

The IRQ state (IRQ_PENDING) is a module-level static, so only one virtio-blk device is supported. Multi-device support would require per-device state.

Heap size

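The kernel heap is 100 KiB. DMA allocations come from the frame allocator (not the heap), but Vec<u8> read buffers and BlkReq/BlkResp structs live on the heap. A single-sector read needs only a 512-byte buffer plus those small structs, so sustained I/O should remain well within the limit as long as buffers are dropped once the caller is done with them.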