Architecture Guide

What this page covers: UBI internals — on-flash layout, in-RAM data structures, initialization, wear-leveling, dual-bank metadata redundancy, recovery, and failure handling.

Prerequisites: Read the Overview first for the mental model (PEB, LEB, EC, VID, EBA).

What you will learn: How UBI maps logical blocks to physical blocks, how it recovers from crashes, and how wear is distributed across the flash.

30-Second Summary

UBI divides a flash partition into Physical Erase Blocks (PEBs). The first N PEBs (configurable, default 2) store mirrored device and volume metadata. All remaining PEBs hold user data. Each data PEB carries an Erase Counter (EC) header and a Volume Identifier (VID) header followed by the payload. At init, UBI scans every PEB and builds an in-RAM red-black tree cache of free, dirty, and bad blocks plus per-volume LEB-to-PEB mappings. Writes always pick the free PEB with the lowest erase count (wear-leveling). Crash recovery relies on monotonically increasing sequence numbers in VID headers — the higher sqnum always wins.

Core Invariants

These rules hold at all times after a successful ubi_device_init():

Invariant                  Description
-------------------------  -----------
One LEB, one PEB           Each mapped LEB points to exactly one active PEB. No two LEBs share a PEB.
Higher sqnum wins          During init, if two PEBs claim the same (vol_id, lnum), the one with the higher sequence number is kept; the other becomes dirty.
Erase before reuse         A dirty PEB must be erased before it can return to the free pool. No in-place overwrites.
Bad PEBs are terminal      Once a PEB is classified as bad, it never returns to the free or dirty pool (unless torture recovery succeeds).
Reserved PEBs are mirrors  The first N reserved PEBs hold identical copies of device + volume metadata. They are never used for data.
Free pool is EC-ordered    free_pebs is a red-black tree keyed by erase count. rb_get_min() always returns the least-worn block.
Mutex serialization        All public API calls acquire a per-device mutex. UBI is thread-safe but not ISR-safe.

Secure extension: For authenticated encryption of all on-flash structures, see the Secure Architecture Guide.


Flash Storage Primer

Raw flash memory (NAND or NOR) differs from block devices like SD cards or eMMC in several important ways:

  • Erase before write — a flash cell must be erased before it can be written. Erasing sets all bytes to the hardware-defined erased value (typically 0xFF for NOR flash, but this may differ on other technologies).

  • Erase granularity — erasure operates on large blocks (erase blocks), typically 4 KB to 256 KB.

  • Write granularity — writes operate on smaller units (write blocks), typically 1 to 16 bytes.

  • Limited endurance — each erase block supports a finite number of erase cycles (typically 10,000 to 100,000) before it becomes unreliable.

  • Bad blocks — blocks can fail at any point during the device lifetime.

Without wear-leveling, repeatedly writing to the same logical location would exhaust a small set of physical blocks while the rest remain unused. UBI solves this by dynamically remapping logical blocks to physical blocks, always choosing the least-worn block for new writes.

Key terminology:

Term  Meaning
----  -------
PEB   Physical Erase Block: a hardware erase unit on the flash chip
LEB   Logical Erase Block: a virtual block exposed to the application
EC    Erase Counter: tracks how many times a PEB has been erased
VID   Volume Identifier: metadata linking a PEB to a volume and LEB
EBA   Erase Block Association: the mapping table from LEBs to PEBs


Architecture Overview

+-----------------------------------------------------+
|                   Application                       |
+-----------------------------------------------------+
            |                          ^
            | ubi_leb_write()          | ubi_leb_read()
            | ubi_volume_create()      | ubi_volume_get_info()
            | ubi_device_init()        | ubi_device_get_info()
            v                          |
+-----------------------------------------------------+
|                     UBI Layer                       |
|                                                     |
|  +---------------+  +---------------+  +---------+  |
|  | Volume Mgmt   |  | LEB I/O       |  | Wear-   |  |
|  | create/remove |  | read/write    |  | Level   |  |
|  | resize/info   |  | map/unmap     |  | Engine  |  |
|  +---------------+  +---------------+  +---------+  |
|                                                     |
|  +----------------------------------------------+   |
|  |           PEB Management (RBT Cache)         |   |
|  |  free_pebs | dirty_pebs | bad_pebs | vols    |   |
|  +----------------------------------------------+   |
+-----------------------------------------------------+
            |                          ^
            | flash_area_write()       | flash_area_read()
            | flash_area_erase()       |
            v                          |
+-----------------------------------------------------+
|          Zephyr Flash Area API (Flash Map)          |
+-----------------------------------------------------+
            |                          ^
            v                          |
+-----------------------------------------------------+
|              Flash Hardware (NOR / NAND)            |
+-----------------------------------------------------+

Source files:

File                             Role
-------------------------------  ----
lib/include/ubi.h                Public API: all structures and function declarations
lib/src/ubi_core_init.c          Device initialization: format, scan, mount
lib/src/ubi_core_runtime.c       Device runtime: get_info, erase_peb, deinit, test API
lib/src/ubi_volume.c             Volume management: create, resize, remove, get_info
lib/src/ubi_leb.c                LEB operations: read, write (copy-on-write), map, unmap (idempotent), is_mapped, get_size
lib/src/ubi_cache.c              Red-black tree comparator and search helpers
lib/src/ubi_internal.h           Shared internal types (ubi_device, ubi_volume) and helpers
lib/src/ubi_cache.h              RBT and linked-list item types
lib/src/ubi_io.h                 On-flash header structures and constants
lib/src/ubi_io_metadata.c        Metadata I/O: device and volume header read/write
lib/src/ubi_io_data.c            Data I/O: EC/VID header and LEB data read/write, flash write/erase fault injection
lib/src/ubi_flash_res_peb.h      Reserved PEB state types and API declarations
lib/src/ubi_flash_res_peb.c      Reserved PEB scanning, recovery, overwrite, and commit
lib/src/ubi_partition_guard.h    Single-handle-per-partition registry API
lib/src/ubi_partition_guard.c    Static bitfield registry preventing double-init of the same partition
lib/src/ubi_mem.h                Memory abstraction layer API: device, volume, leaf, scratch allocators
lib/src/ubi_mem.c                Static (k_mem_slab) and heap (k_malloc) backend implementations


On-Flash Layout

UBI reserves the first N PEBs for device and volume metadata, stored in a dual-bank configuration for crash resilience. N is configurable via CONFIG_UBI_DEV_HDR_NR_OF_RES_PEBS (default 2, range 2–4). The remaining PEBs (N through total-1) are data blocks available for volume use.

Flash Partition (default: N=2 reserved PEBs)
+====================+====================+=====+====================+
| PEB 0 (Reserved)   | PEB 1 (Reserved)   | ... | PEB total-1        |
| Device Header Bank | Device Header Bank |     | Data Block         |
+====================+====================+=====+====================+

Reserved PEB Layout (reserved PEBs are mirrors):

Offset 0x000  +----------------------+
              | Device Header (32 B) |  magic, version, revision, vol_count, CRC
              +----------------------+
Offset 0x020  | Volume 0 Hdr  (48 B) |  magic, vol_id, name, type, leb_count, CRC
              +----------------------+
Offset 0x050  | Volume 1 Hdr  (48 B) |
              +----------------------+
              |         ...          |  (up to CONFIG_UBI_MAX_NR_OF_VOLUMES)
              +----------------------+
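
As a rough illustration of the reserved PEB layout above, the byte offset of the i-th volume header can be computed as follows. The names DEV_HDR_SIZE, VOL_HDR_SIZE, and vol_hdr_offset() are hypothetical, and the sketch assumes volume headers are packed back to back directly after the 32-byte device header; the real code may instead take the starting point from the device header's offset field.

```c
#include <stdint.h>

/* Sizes taken from the layout diagram; the names are illustrative only. */
#define DEV_HDR_SIZE 32u
#define VOL_HDR_SIZE 48u

/* Byte offset of volume header `index` inside a reserved PEB,
 * assuming headers follow the device header without gaps. */
uint32_t vol_hdr_offset(uint32_t index)
{
    return DEV_HDR_SIZE + index * VOL_HDR_SIZE;
}
```

With these sizes, volume 0 lands at 0x020 and volume 1 at 0x050, matching the diagram.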


Data PEB Layout (PEB N through PEB total-1):

Offset 0x000  +----------------------+
              | EC Header    (16 B)  |  magic, version, erase_counter, CRC
              +----------------------+
Offset 0x010  | VID Header   (32 B)  |  magic, vol_id, leb_num, sqnum, data_size, CRC
              +----------------------+
Offset 0x030  |                      |
              |     User Data        |  up to (erase_block_size - 48) bytes
              |                      |
              +----------------------+

When a data PEB is free (not assigned to any volume), its VID header area is erased (filled with the hardware-reported erased byte value). The EC header is always present on valid PEBs.
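
The data PEB layout implies a fixed 48-byte overhead per erase block. A minimal sketch of the resulting payload capacity is shown below; leb_payload_size() and the two size macros are hypothetical names, and the zero return for undersized blocks is a choice made for this example only.

```c
#include <stdint.h>

/* Header sizes from the data PEB layout; names are illustrative. */
#define EC_HDR_SIZE  16u
#define VID_HDR_SIZE 32u

/* Usable payload bytes in one data PEB, or 0 if the erase block
 * cannot even hold the two headers. */
uint32_t leb_payload_size(uint32_t erase_block_size)
{
    const uint32_t overhead = EC_HDR_SIZE + VID_HDR_SIZE; /* 48 bytes */

    return (erase_block_size > overhead) ? erase_block_size - overhead : 0u;
}
```

For a 4 KB erase block this yields 4048 bytes of user data per LEB.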


Header Structures

All headers are aligned to 16 bytes and protected by CRC-32/IEEE (crc32_ieee() from Zephyr’s <zephyr/sys/crc.h>). The CRC covers all fields except the hdr_crc field itself.

Erase Counter (EC) Header — 16 bytes

Present on every data PEB. Tracks how many times this block has been erased.

Offset  Size  Field
------  ----  -----
0x00    4     magic       (0x55424923)
0x04    1     version     (1)
0x05    3     padding
0x08    4     ec          erase counter value
0x0C    4     hdr_crc     CRC-32 of bytes 0x00..0x0B
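
The following host-side sketch shows how an EC header with this layout could be filled and validated. It is not the library's code: struct ec_hdr, ec_hdr_fill(), ec_hdr_is_valid(), and crc32_ieee_sketch() are hypothetical names, the CRC routine is a plain bitwise CRC-32/IEEE standing in for Zephyr's crc32_ieee(), and the sketch assumes zeroed padding and native byte order for the stored fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EC_HDR_MAGIC 0x55424923u

/* Mirrors the 16-byte table above; natural alignment adds no padding. */
struct ec_hdr {
    uint32_t magic;
    uint8_t  version;
    uint8_t  padding[3];
    uint32_t ec;
    uint32_t hdr_crc;
};

_Static_assert(sizeof(struct ec_hdr) == 16, "EC header must be 16 bytes");
_Static_assert(offsetof(struct ec_hdr, hdr_crc) == 0x0C, "CRC at offset 0x0C");

/* Bitwise CRC-32/IEEE (reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee_sketch(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Build a header whose CRC covers bytes 0x00..0x0B. */
void ec_hdr_fill(struct ec_hdr *hdr, uint32_t erase_count)
{
    memset(hdr, 0, sizeof(*hdr)); /* assumption: padding is stored as zero */
    hdr->magic = EC_HDR_MAGIC;
    hdr->version = 1;
    hdr->ec = erase_count;
    hdr->hdr_crc = crc32_ieee_sketch((const uint8_t *)hdr,
                                     offsetof(struct ec_hdr, hdr_crc));
}

/* Returns 1 if magic and CRC both match, 0 otherwise. */
int ec_hdr_is_valid(const struct ec_hdr *hdr)
{
    return hdr->magic == EC_HDR_MAGIC &&
           hdr->hdr_crc == crc32_ieee_sketch((const uint8_t *)hdr,
                                             offsetof(struct ec_hdr, hdr_crc));
}
```

Because the CRC excludes only hdr_crc itself, any change to ec after the header is filled makes validation fail.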

Volume Identifier (VID) Header — 32 bytes

Present on data PEBs that are mapped to a volume. Links a PEB to a specific volume and LEB.

Offset  Size  Field
------  ----  -----
0x00    4     magic       (0x55424921)
0x04    1     version     (1)
0x05    3     padding
0x08    4     lnum        logical erase block number within the volume
0x0C    4     vol_id      volume identifier
0x10    8     sqnum       global sequence number (monotonically increasing)
0x18    4     data_size   size of user data in bytes
0x1C    4     hdr_crc     CRC-32 of bytes 0x00..0x1B

The sqnum field is critical for crash recovery. During the PEB scan at init, if two PEBs claim the same (vol_id, lnum) pair, the one with the higher sqnum wins.

Device Header — 32 bytes

Stored on all reserved PEBs (default: PEB 0 and PEB 1). Describes the overall UBI device.

Offset  Size  Field
------  ----  -----
0x00    4     magic       (0x55424925)
0x04    1     version     (1)
0x05    3     padding
0x08    4     offset      offset of the first volume header
0x0C    4     size        device size
0x10    4     revision    header revision counter (incremented on each metadata update)
0x14    4     vol_count   number of volumes
0x18    4     vol_id_watermark  monotonic volume ID counter (never reused)
0x1C    4     hdr_crc     CRC-32 of bytes 0x00..0x1B

Volume Header — 48 bytes

One per volume, stored sequentially after the device header on the reserved PEBs.

Offset  Size  Field
------  ----  -----
0x00    4     magic       (0x55424926)
0x04    1     version     (1)
0x05    1     vol_type    0 = static, 1 = dynamic
0x06    2     padding
0x08    4     vol_id      unique volume identifier
0x0C    4     leb_count   number of LEBs allocated to this volume
0x10    12    padding
0x1C    16    name        null-terminated volume name (max 16 bytes including '\0')
0x2C    4     hdr_crc     CRC-32 of bytes 0x00..0x2B
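
To make the volume header table concrete, here is a sketch of a matching C struct together with a helper that enforces the 16-byte name limit. struct vol_hdr and vol_hdr_set_name() are hypothetical names, and returning -EINVAL for an oversized name is an assumption of this example, not documented behavior.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order and sizes follow the table above. */
struct vol_hdr {
    uint32_t magic;        /* 0x00 */
    uint8_t  version;      /* 0x04 */
    uint8_t  vol_type;     /* 0x05: 0 = static, 1 = dynamic */
    uint8_t  padding0[2];  /* 0x06 */
    uint32_t vol_id;       /* 0x08 */
    uint32_t leb_count;    /* 0x0C */
    uint8_t  padding1[12]; /* 0x10 */
    char     name[16];     /* 0x1C, includes the terminating '\0' */
    uint32_t hdr_crc;      /* 0x2C */
};

_Static_assert(sizeof(struct vol_hdr) == 48, "volume header must be 48 bytes");
_Static_assert(offsetof(struct vol_hdr, name) == 0x1C, "name at offset 0x1C");
_Static_assert(offsetof(struct vol_hdr, hdr_crc) == 0x2C, "CRC at offset 0x2C");

/* Copy a name into the header, rejecting names that would not leave
 * room for the terminator (at most 15 visible characters). */
int vol_hdr_set_name(struct vol_hdr *hdr, const char *name)
{
    size_t len = strlen(name);

    if (len >= sizeof(hdr->name)) {
        return -EINVAL; /* assumption: error code chosen for this sketch */
    }
    memset(hdr->name, 0, sizeof(hdr->name));
    memcpy(hdr->name, name, len);
    return 0;
}
```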

In-RAM Data Structures

When ubi_device_init() runs, it scans the flash and builds an in-RAM cache of PEB states. This cache is the heart of UBI — all runtime decisions (which PEB to write to, which blocks are dirty, etc.) are made from these structures without re-reading flash.

Overview

struct ubi_device
|
|-- mutex                       Zephyr mutex for thread safety
|-- mtd                         Flash partition config (partition_id, block sizes)
|
|-- free_pebs (Red-Black Tree, keyed by erase counter)
|   |
|   |   Holds PEBs that are erased and available for new writes.
|   |   The minimum node (lowest EC) is selected for writes (wear-leveling).
|   |
|   |       ec:3         Nodes are struct ubi_rbt_item {
|   |      /    \            .key   = erase_counter,
|   |   ec:1   ec:7         .value.pnum = PEB index
|   |          /    \    }
|   |       ec:5  ec:12
|   |
|   `-- Each node points to a physical PEB on flash:
|           ec:1 --> PEB 5  [EC hdr: ec=1 | VID: 0xFF (empty) | ...]
|           ec:3 --> PEB 8  [EC hdr: ec=3 | VID: 0xFF (empty) | ...]
|           ec:5 --> PEB 14 [EC hdr: ec=5 | VID: 0xFF (empty) | ...]
|
|-- dirty_pebs (Red-Black Tree, keyed by erase counter)
|   |
|   |   Holds PEBs that contain stale data and need erasure before reuse.
|   |   Populated when a LEB is overwritten or unmapped.
|   |
|   |       ec:4
|   |      /    \
|   |   ec:2   ec:9
|   |
|   `-- Each node points to a PEB with outdated data:
|           ec:2 --> PEB 3  [EC hdr: ec=2 | VID: old data | ...]
|           ec:4 --> PEB 11 [EC hdr: ec=4 | VID: old data | ...]
|
|-- bad_pebs (Singly-Linked List)
|   |
|   |   Holds PEBs with I/O errors (invalid EC headers, failed erases/writes).
|   |   Entries are struct ubi_list_item { .pnum, .erase_count }
|   |
|   `-- [PEB 22, ec:~7] --> [PEB 45, ec:~3] --> NULL
|
|       NOTE: Bad block list is NOT persisted to flash.
|             It is lost on reboot and rebuilt during the next init scan.
|
|-- vols (Red-Black Tree, keyed by volume ID)
|   |
|   |   Maps volume IDs to struct ubi_volume pointers.
|   |
|   |     vol_id:0             Nodes are struct ubi_rbt_item {
|   |      /     \                 .key   = volume_id,
|   |  vol_id:1  vol_id:5         .value.vol = &ubi_volume
|   |                          }
|   |
|   `-- Each ubi_volume (44 B) contains:
|
|       struct ubi_volume
|       |-- vol_id          Unique volume identifier
|       |-- cfg             { name[16], type (static|dynamic), leb_count }
|       |-- eba_tbl_count   Number of mapped LEBs
|       `-- eba_tbl (Red-Black Tree, keyed by LEB number)
|           |
|           |   Per-volume mapping from logical to physical blocks.
|           |
|           |     leb:2             Nodes are struct ubi_rbt_item {
|           |    /     \                .key   = LEB_number,
|           | leb:0   leb:5            .value.pnum = PEB_index
|           |                      }
|           |
|           `-- Each node points to the PEB holding that LEB's data:
|                   leb:0 --> PEB 7  [EC hdr | VID: vol=0,leb=0,sq=42 | payload]
|                   leb:2 --> PEB 19 [EC hdr | VID: vol=0,leb=2,sq=50 | payload]
|                   leb:5 --> PEB 31 [EC hdr | VID: vol=0,leb=5,sq=55 | payload]
|
|-- global_sqnum            Monotonically increasing sequence number for writes
`-- vol_id_watermark        Monotonic volume ID counter (mirrors dev_hdr.vol_id_watermark)

How the Structures Relate to Flash

Every PEB on flash is tracked by exactly one of these structures at any time:

                          +------------------+
                          |   Physical Flash |
                          +------------------+
                          | PEB 0  (reserved)|----> Device + Volume headers (Bank 1)  \
                          | PEB 1  (reserved)|----> Device + Volume headers (Bank 2)   > N reserved
                          |  ...  (if N > 2) |----> Cold spares                       /
                          |------------------|
  free_pebs RBT --------->| PEB N  (free)    |  EC hdr present, VID = 0xFF
  free_pebs RBT --------->| PEB N+1 (free)   |  EC hdr present, VID = 0xFF
                          |------------------|
  vol[0].eba_tbl -------->| PEB 4  (vol0/L0) |  EC hdr + VID(vol=0,leb=0) + data
  vol[0].eba_tbl -------->| PEB 5  (vol0/L1) |  EC hdr + VID(vol=0,leb=1) + data
                          |------------------|
  vol[1].eba_tbl -------->| PEB 6  (vol1/L0) |  EC hdr + VID(vol=1,leb=0) + data
                          |------------------|
  dirty_pebs RBT -------->| PEB 7  (dirty)   |  EC hdr + VID (stale data)
                          |------------------|
  bad_pebs list --------->| PEB 8  (bad)     |  Unreadable or failed I/O
                          +------------------+

  Rule: PEB 0..N-1 are always reserved (N = CONFIG_UBI_DEV_HDR_NR_OF_RES_PEBS).
        Every other PEB is in exactly ONE of:
        - free_pebs      (erased, ready for use)
        - Some volume's eba_tbl  (in use, holds live data)
        - dirty_pebs     (contains stale data, awaiting erasure)
        - bad_pebs       (defective, excluded from use)

Memory Usage

Structure      Size per entry  Allocated via
-------------  --------------  -------------
ubi_device     136 B           ubi_mem_device_alloc → device slab (static) / k_malloc (heap)
ubi_volume     44 B            ubi_mem_volume_alloc → volume slab (static) / k_malloc (heap)
ubi_rbt_item   16 B            ubi_mem_leaf_alloc → leaf slab (static) / k_malloc (heap)
ubi_list_item  12 B            ubi_mem_leaf_alloc → leaf slab (static) / k_malloc (heap)

Under the static backend (CONFIG_UBI_MEM_BACKEND_STATIC, default), all pools are pre-allocated at compile time. Under the heap backend, allocations are dynamic. See Configuration — Memory Sizing Guide for pool sizing details.

Memory Backends

All UBI runtime allocations route through the ubi_mem abstraction layer (lib/src/ubi_mem.h), which supports two backends selected via Kconfig:

ubi_mem (CONFIG_UBI_MEM_BACKEND_STATIC)
|
|-- device_slab    [K_MEM_SLAB: D blocks of sizeof(ubi_device)]
|-- volume_slab    [K_MEM_SLAB: D×V blocks of sizeof(ubi_volume)]
|-- leaf_slab      [K_MEM_SLAB: D×(P+V) blocks of sizeof(ubi_leaf_item)]
`-- scratch_slab   [K_MEM_SLAB: 1 block of DEV_HDR_SIZE + V×VOL_HDR_SIZE]

    D = CONFIG_UBI_MAX_NR_OF_DEVICES
    V = CONFIG_UBI_MAX_NR_OF_VOLUMES
    P = CONFIG_UBI_MAX_NR_OF_DATA_PEBS

ubi_rbt_item (16 B) and ubi_list_item (12 B) share 16-byte blocks via union ubi_leaf_item. PEB state transitions (dirty→bad, bad→free, mapped→bad) retype items in-place rather than freeing and re-allocating, eliminating allocation failures on critical error paths.

When the static backend is used, ubi_device_init() validates that the flash geometry fits within the configured pool limits before scanning PEBs.
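
The slab dimensions listed above can be turned into a quick sizing estimate. The sketch below only restates the D, V, P formulas from the diagram; struct pool_counts and static_pool_counts() are hypothetical names, and the 16-byte leaf block size is the figure quoted for union ubi_leaf_item.

```c
#include <stddef.h>

#define DEV_HDR_SIZE   32u
#define VOL_HDR_SIZE   48u
#define LEAF_ITEM_SIZE 16u /* shared block size for RBT and list items */

struct pool_counts {
    size_t device_blocks;
    size_t volume_blocks;
    size_t leaf_blocks;
    size_t leaf_bytes;
    size_t scratch_bytes;
};

/* d = max devices, v = max volumes, p = max data PEBs. */
struct pool_counts static_pool_counts(size_t d, size_t v, size_t p)
{
    struct pool_counts c;

    c.device_blocks = d;
    c.volume_blocks = d * v;
    c.leaf_blocks   = d * (p + v);
    c.leaf_bytes    = c.leaf_blocks * LEAF_ITEM_SIZE;
    c.scratch_bytes = DEV_HDR_SIZE + v * VOL_HDR_SIZE;
    return c;
}
```

For one device with 4 volumes and 64 data PEBs this gives 68 leaf blocks (1088 bytes) and a 224-byte scratch block.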


PEB Lifecycle

A Physical Erase Block moves through the following states during normal operation:

        stateDiagram-v2
    [*] --> Free : ubi_device_init() (fresh flash)
    Free --> Allocated : leb_write() / leb_map()
    Allocated --> Dirty : leb_write() (overwrite) / leb_unmap()
    Dirty --> Free : ubi_device_erase_peb() (ec += 1)
    Free --> Bad : I/O error
    Allocated --> Bad : I/O error
    Dirty --> Bad : I/O error
    Bad --> Free : Torture recovery (rare)
    

Detailed ASCII reference:

                          +-------+
           ubi_device_    |       |   ubi_device_init()
           erase_peb() -->| FREE  |<-- (fresh flash: all PEBs start here)
           (ec += 1)      |       |    
                          +---+---+
                              |
                              | leb_write() or leb_map()
                              | (rb_get_min selects lowest EC)
                              v
                        +-----------+
                        |           |
                        | ALLOCATED |   In a volume's eba_tbl
                        | (in use)  |   VID header links to vol_id + leb_num
                        |           |
                        +-----+-----+
                              |
                              | leb_write() (overwrite) or leb_unmap()
                              | Old PEB moved to dirty_pebs
                              v
                          +-------+
                          |       |
                          | DIRTY |   Stale data, awaiting erasure
                          |       |
                          +---+---+
                              |
                              | ubi_device_erase_peb()
                              | (erase flash, increment EC, write new EC hdr)
                              v
                          +-------+
                          | FREE  |   Back in free_pebs, ready for reuse
                          +-------+

  At ANY point, if a flash I/O operation fails:

                          +-------+
              I/O error   |       |
           ------------>  |  BAD  |   Moved to bad_pebs linked list
                          |       |   Excluded from all future operations
                          +-------+
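
The lifecycle above can be encoded as a small transition table, which is handy for assertions in tests. This is a sketch only: enum peb_state and peb_transition_allowed() are hypothetical names, and the table contains exactly the edges shown in the state diagram.

```c
#include <stdbool.h>

enum peb_state {
    PEB_FREE,
    PEB_ALLOCATED,
    PEB_DIRTY,
    PEB_BAD,
    PEB_STATE_COUNT
};

/* Returns true if the lifecycle diagram contains an edge from -> to. */
bool peb_transition_allowed(enum peb_state from, enum peb_state to)
{
    static const bool allowed[PEB_STATE_COUNT][PEB_STATE_COUNT] = {
        [PEB_FREE]      = { [PEB_ALLOCATED] = true, [PEB_BAD] = true },
        [PEB_ALLOCATED] = { [PEB_DIRTY] = true, [PEB_BAD] = true },
        [PEB_DIRTY]     = { [PEB_FREE] = true, [PEB_BAD] = true },
        [PEB_BAD]       = { [PEB_FREE] = true }, /* torture recovery (rare) */
    };

    if ((unsigned)from >= PEB_STATE_COUNT || (unsigned)to >= PEB_STATE_COUNT) {
        return false;
    }
    return allowed[from][to];
}
```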

Device Initialization

ubi_device_init() is the most complex function in UBI. It handles two fundamentally different scenarios: initializing a brand-new (never-used) flash device, and re-mounting an existing device after a reboot.

Flow Overview

ubi_device_init(mtd, NULL, &ubi)
        |
        v
  Allocate ubi_device, init mutex, init RBTs
        |
        v
  Check: is device mounted?
  (read reserved PEBs 0..N-1, look for valid device headers)
        |
        +--- NO (fresh flash) -------> Phase 0: First-Time Mount
        |                                  |
        +--- YES (reboot) --+              |
        |                   |              v
        |                   |     Write device header to reserved PEBs
        |                   |     Erase data PEBs N..total-1
        |                   |     Write EC headers (ec=0) to each
        |                   |              |
        v                   v              |
  +--------------------------------------------+
  | Phase 1: Read Device Header                |
  |   Read device header from reserved PEBs    |
  |   For each volume in vol_count:            |
  |     Read volume header                     |
  |     Allocate ubi_volume + ubi_rbt_item     |
  |     Insert into vols RBT                   |
  +--------------------------------------------+
                    |
                    v
  +--------------------------------------------+
  | Phase 2: Compute Average Erase Count       |
  |   Scan PEBs N..total-1                     |
  |   Read EC headers, sum valid erase counts  |
  |   ec_avg = ec_sum / ec_count               |
  |   (Used as fallback EC for bad blocks)     |
  +--------------------------------------------+
                    |
                    v
  +--------------------------------------------+
  | Phase 3: PEB Scan & Classification         |
  |   For each PEB from N to total-1:          |
  |                                            |
  |   3.1  EC header invalid?                  |
  |         --> bad_pebs (ec = ec_avg)         |
  |                                            |
  |   3.2  EC valid, VID erased (empty)?       |
  |         Probe data area prefix:            |
  |         - prefix erased → free_pebs        |
  |         - prefix non-erased → dirty_pebs   |
  |           (uncommitted write)              |
  |                                            |
  |   3.3  EC valid, VID invalid CRC?          |
  |         --> bad_pebs (ec from EC hdr)      |
  |                                            |
  |   3.4  EC valid, VID valid:                |
  |     3.4.1  Track max sqnum for global_sqnum|
  |     3.4.2  Volume not found in vols RBT?   |
  |             --> dirty_pebs (orphaned)      |
  |     3.4.3  LEB >= vol.leb_count?           |
  |             --> dirty_pebs (out of range)  |
  |     3.4.4  LEB not in vol.eba_tbl?         |
  |             --> insert into vol.eba_tbl    |
  |     3.4.5  LEB already in vol.eba_tbl?     |
  |             Compare sqnum:                 |
  |             - new < existing: new-->dirty  |
  |             - new > existing: old-->dirty, |
  |               new replaces in eba_tbl      |
  +--------------------------------------------+
                    |
                    v
            Return ubi_device*
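
The Phase 3 decision tree can be summarized as a pure function over the facts gathered for one PEB. The sketch below is illustrative: struct peb_scan, enum scan_result, and classify_peb() are hypothetical names, and the sqnum comparison of step 3.4.5 is left to the caller, so SCAN_MAPPED only means "candidate for the EBA table".

```c
#include <stdbool.h>
#include <stdint.h>

enum scan_result { SCAN_BAD, SCAN_FREE, SCAN_DIRTY, SCAN_MAPPED };

/* Facts collected while reading one data PEB during the scan. */
struct peb_scan {
    bool     ec_valid;           /* EC header magic + CRC ok */
    bool     vid_erased;         /* VID area reads as erased */
    bool     vid_crc_ok;         /* VID header magic + CRC ok */
    bool     data_prefix_erased; /* probed data prefix is erased */
    bool     volume_found;       /* vol_id exists in the vols tree */
    uint32_t lnum;               /* LEB number from the VID header */
    uint32_t vol_leb_count;      /* leb_count of the owning volume */
};

enum scan_result classify_peb(const struct peb_scan *s)
{
    if (!s->ec_valid) {
        return SCAN_BAD;                                       /* 3.1 */
    }
    if (s->vid_erased) {                                       /* 3.2 */
        return s->data_prefix_erased ? SCAN_FREE : SCAN_DIRTY;
    }
    if (!s->vid_crc_ok) {
        return SCAN_BAD;                                       /* 3.3 */
    }
    if (!s->volume_found) {
        return SCAN_DIRTY;                                     /* 3.4.2 */
    }
    if (s->lnum >= s->vol_leb_count) {
        return SCAN_DIRTY;                                     /* 3.4.3 */
    }
    return SCAN_MAPPED;                                        /* 3.4.4 / 3.4.5 */
}
```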

First-Time Mount vs. Reboot

Aspect                          First-Time Mount                                 Reboot (Re-mount)
------------------------------  -----------------------------------------------  -----------------
Device header on reserved PEBs  Not present                                      Already written
Phase 0                         Erase all data PEBs, write EC headers with ec=0  Skipped entirely
Phase 1–3                       Runs (all PEBs will be free)                     Runs (reconstructs volumes from existing data)
Volume data                     None (empty EBA tables)                          Reconstructed from VID headers on flash
Dirty PEBs                      None                                             May exist from incomplete writes before reboot
Bad PEBs                        Detected from Phase 3 scan                       Detected fresh (previous list was in RAM only)

Sequence Number Conflict Resolution

When two PEBs claim the same (vol_id, leb_num) pair (e.g., a write was interrupted and both the old and new PEB survive), UBI resolves the conflict using the sqnum field in the VID header:

  • The PEB with the higher sqnum is the newer write and is kept in the EBA table.

  • The PEB with the lower sqnum is moved to dirty_pebs for later erasure.

This ensures that even after an unexpected power loss, the most recent successful write survives.
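
The rule above can be sketched in a few lines. struct leb_claim and resolve_conflict() are hypothetical names; the source does not say what happens when both sqnum values are equal, so keeping the existing mapping on a tie is an assumption of this example.

```c
#include <stdint.h>

/* One PEB's claim on a (vol_id, lnum) pair, as read from its VID header. */
struct leb_claim {
    uint32_t pnum;  /* physical erase block index */
    uint64_t sqnum; /* sequence number from the VID header */
};

/* Compare an incoming claim against the current mapping.
 * Updates *mapped to the winner and returns the PEB index that
 * should be moved to dirty_pebs. */
uint32_t resolve_conflict(struct leb_claim *mapped, struct leb_claim incoming)
{
    if (incoming.sqnum > mapped->sqnum) {
        uint32_t loser = mapped->pnum;

        *mapped = incoming; /* newer write replaces the mapping */
        return loser;
    }
    /* Incoming is older (or equal: assumption, keep existing). */
    return incoming.pnum;
}
```

For example, if PEB 7 (sqnum 42) is already mapped and the scan then finds PEB 19 (sqnum 50) for the same LEB, PEB 19 takes over the mapping and PEB 7 becomes dirty.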


Erased-State Detection

UBI does not assume that erased flash reads as 0xFF. The erased byte value is queried at runtime via Zephyr’s flash_area_erased_val() API. Two internal helpers abstract all erased-state checks:

  • ubi_get_erased_val(mtd, &val) — queries the hardware-reported erased byte value for the partition, once.

  • ubi_buf_is_erased(buf, len, val) — returns true if every byte in buf equals val.

During PEB scan, the erased value is obtained once and passed to all classification helpers. Reserved PEB scan likewise derives the erased magic pattern from the actual erased byte value.
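
A check in the spirit of ubi_buf_is_erased() is straightforward; the version below is a stand-alone sketch under the hypothetical name buf_is_erased(), not the library's implementation. On target, the erased value would come from flash_area_erased_val() rather than being hard-coded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if every byte in buf equals the erased value.
 * An empty buffer is treated as erased. */
bool buf_is_erased(const uint8_t *buf, size_t len, uint8_t erased_val)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != erased_val) {
            return false;
        }
    }
    return true;
}
```

Passing the erased value as a parameter is what keeps the check correct on flash technologies where erased cells do not read as 0xFF.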


Thread Safety

Since v0.5.0, all public API functions acquire a per-device Zephyr mutex (struct k_mutex) before accessing any shared state. This means:

  • Multiple threads can safely call UBI functions on the same device concurrently.

  • The mutex provides mutual exclusion (one thread at a time), not read-write differentiation.

  • The mutex is initialized in ubi_device_init() and held for the duration of each API call.

  • Callers do not need to provide their own locking.

Single Handle Per Partition

Only one struct ubi_device * handle may be active per flash partition at any time. ubi_device_init() returns -EBUSY if a handle for the given partition_id already exists. The guard is released when ubi_device_deinit() completes.
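
A bitfield registry of this kind could look roughly like the sketch below. The function names, the 32-partition limit, and the -EINVAL return for out-of-range IDs are assumptions for illustration; the real guard lives in ubi_partition_guard.c and may differ. The sketch is also not thread-safe on its own.

```c
#include <errno.h>
#include <stdint.h>

#define GUARD_MAX_PARTITIONS 32u /* assumption: one bit per partition ID */

static uint32_t guard_bits; /* bit n set = partition n has a live handle */

/* Claim a partition; fails with -EBUSY if it is already claimed. */
int partition_guard_claim(uint8_t partition_id)
{
    if (partition_id >= GUARD_MAX_PARTITIONS) {
        return -EINVAL;
    }

    uint32_t bit = UINT32_C(1) << partition_id;

    if (guard_bits & bit) {
        return -EBUSY;
    }
    guard_bits |= bit;
    return 0;
}

/* Release a partition so that a later init can claim it again. */
void partition_guard_release(uint8_t partition_id)
{
    if (partition_id < GUARD_MAX_PARTITIONS) {
        guard_bits &= ~(UINT32_C(1) << partition_id);
    }
}
```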

Deinit Contract

ubi_device_deinit() acquires the device mutex before freeing resources. Any in-flight operations that already hold the mutex will complete before teardown proceeds. The caller must ensure that no other thread will start new operations after calling deinit.


Wear-Leveling

UBI implements a greedy minimum-erase-count wear-leveling strategy.

Write Path

When writing to a LEB, UBI always selects the free PEB with the lowest erase counter:

struct rbnode *min = rb_get_min(&ubi->free_pebs);

Since free_pebs is a red-black tree keyed by erase count, rb_get_min() returns the least-worn block in O(log n) time.

Erase Path

When erasing dirty PEBs, UBI also processes the one with the lowest erase counter first:

struct rbnode *min = rb_get_min(&ubi->dirty_pebs);

After erasing, the PEB’s erase counter is incremented and it is moved back to free_pebs.

Effect

This two-sided greedy approach naturally distributes wear across all PEBs:

  • Least-worn free blocks are consumed first for writes, so the next erase cycles land on the blocks that have seen the fewest.

  • Least-worn dirty blocks are recycled first, keeping the counter distribution tight.

  • Over time, all PEBs converge toward a similar erase count.
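
The convergence claim can be checked with a toy simulation. The sketch below is not UBI code: it replaces the red-black tree with a linear scan over a four-entry array, models a single LEB that is overwritten repeatedly, and assumes each dirty PEB is erased right after the write that displaced it. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_PEBS 4

struct sim_peb {
    bool     in_use; /* false = free, true = holds the live LEB */
    uint32_t ec;     /* erase counter */
};

/* Index of the free PEB with the lowest erase count, or -1 if none. */
int sim_pick_free(const struct sim_peb *pebs, int count)
{
    int best = -1;

    for (int i = 0; i < count; i++) {
        if (!pebs[i].in_use && (best < 0 || pebs[i].ec < pebs[best].ec)) {
            best = i;
        }
    }
    return best;
}

/* Overwrite one LEB `writes` times; return max(ec) - min(ec). */
uint32_t sim_ec_spread(uint32_t writes)
{
    struct sim_peb pebs[SIM_PEBS] = { 0 }; /* all free, ec = 0 */
    int mapped = -1;

    for (uint32_t w = 0; w < writes; w++) {
        int next = sim_pick_free(pebs, SIM_PEBS);

        if (next < 0) {
            break; /* no free PEB; cannot happen with one live LEB */
        }
        pebs[next].in_use = true;
        if (mapped >= 0) {
            /* Old copy: dirty -> erase (ec += 1) -> free again. */
            pebs[mapped].ec += 1;
            pebs[mapped].in_use = false;
        }
        mapped = next;
    }

    uint32_t lo = pebs[0].ec;
    uint32_t hi = pebs[0].ec;

    for (int i = 1; i < SIM_PEBS; i++) {
        if (pebs[i].ec < lo) lo = pebs[i].ec;
        if (pebs[i].ec > hi) hi = pebs[i].ec;
    }
    return hi - lo;
}
```

In this model the writes rotate through the pool, so the erase counters never drift more than one apart, however many overwrites are issued.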

Write Flow (Mermaid)

Copy-on-write: the new PEB is fully written before the old mapping is swapped. On write failure, the previous mapping and data remain intact. The write order is EC → DATA → VID; the VID header acts as the commit point that makes the new mapping visible.

        flowchart TD
    Start["ubi_leb_write(vol_id, lnum, buf, len)"]
    Lookup["Look up LEB in volume EBA table"]
    SelectFree["Select free PEB with lowest EC\n(rb_get_min on free_pebs)"]
    NoFree{"Free PEB available?"}
    ErrNospc["Return -ENOSPC"]
    WriteEC["Write EC header on new PEB"]
    WriteData["Write user data payload"]
    WriteVID["Write VID header\n(vol_id, lnum, sqnum++, data_size)\n— commit point —"]
    WriteFail{"Write succeeded?"}
    MarkBad["Mark new PEB as bad\nRetry with next free PEB"]
    SwapEBA["Swap EBA: LEB → new PEB"]
    WasOverwrite{"Was overwrite?"}
    OldDirty["Move old PEB to dirty_pebs"]
    Done["Return 0"]

    Start --> Lookup --> SelectFree
    SelectFree --> NoFree
    NoFree -- No --> ErrNospc
    NoFree -- Yes --> WriteEC --> WriteData --> WriteVID --> WriteFail
    WriteFail -- No --> MarkBad --> SelectFree
    WriteFail -- Yes --> SwapEBA --> WasOverwrite
    WasOverwrite -- Yes --> OldDirty --> Done
    WasOverwrite -- No --> Done
    

Read Flow (Mermaid)

        flowchart TD
    Start["ubi_leb_read(vol_id, lnum, offset, buf, len)"]
    FindVol["Find volume in vols RBT"]
    FindLEB["Look up LEB in volume EBA table"]
    IsMapped{"LEB mapped?"}
    ErrInval["Return -EINVAL"]
    ReadFlash["Read from PEB at data offset + user offset"]
    Done["Return 0"]

    Start --> FindVol --> FindLEB --> IsMapped
    IsMapped -- No --> ErrInval
    IsMapped -- Yes --> ReadFlash --> Done
    

Erase / Reclaim Flow (Mermaid)

        flowchart TD
    Start["ubi_device_erase_peb()"]
    HasDirty{"dirty_pebs non-empty?"}
    NoDirty["Return 0 (nothing to reclaim)"]
    SelectMin["Select dirty PEB with lowest EC\n(rb_get_min on dirty_pebs)"]
    Erase["Erase PEB on flash"]
    EraseFail{"Erase succeeded?"}
    MarkBad["Mark PEB as bad"]
    IncEC["Increment erase counter"]
    WriteEC["Write new EC header"]
    MoveToFree["Move PEB to free_pebs"]
    Done["Return 0"]

    Start --> HasDirty
    HasDirty -- No --> NoDirty
    HasDirty -- Yes --> SelectMin --> Erase --> EraseFail
    EraseFail -- No --> MarkBad --> Done
    EraseFail -- Yes --> IncEC --> WriteEC --> MoveToFree --> Done
    

Dual-Bank Mechanism

UBI stores device and volume metadata on reserved PEBs as mirrors. The number of reserved PEBs is configurable via CONFIG_UBI_DEV_HDR_NR_OF_RES_PEBS (default 2, range 2–4). Two PEBs are always kept active (containing identical copies); additional PEBs serve as cold spares that are promoted when an active PEB fails.

PEB Classification

State    Description
-------  -----------
Active   Contains a valid device header (correct magic + CRC). Participates in dual-bank writes.
Spare    Erased/empty (hardware erased value). Never written until an active PEB fails.
Corrupt  Contains invalid data (bad magic or CRC). Candidate for in-place recovery or abandonment.

Write Sequence

When metadata changes (volume created, removed, or resized), UBI writes to both active PEBs sequentially:

1. Erase active reserved PEB (bank 1)
2. Write updated headers to active reserved PEB (bank 1)
3. Erase active reserved PEB (bank 2)
4. Write updated headers to active reserved PEB (bank 2)

If a write fails (dead PEB), UBI promotes a cold spare to replace it.

Init-Time Recovery

At ubi_device_init(), UBI scans all reserved PEBs (indices 0..N-1):

scan_reserved_pebs()
  |
  +-- All N PEBs valid?  --> Normal init (no recovery needed)
  |
  +-- >= 1 active + corrupt or spare PEBs?
  |     |
  |     +-- Read full content from active PEB (highest revision)
  |     +-- For each corrupt PEB: erase + write canonical data
  |     |     +-- Erase/write succeeds --> PEB recovered in-place
  |     |     +-- Erase/write fails ----> PEB is dead, promote spare
  |     +-- At least 2 active PEBs after recovery? --> Init succeeds
  |
  +-- 0 active PEBs?  --> Init fails (unrecoverable)

Runtime Recovery

Volume operations (ubi_vol_hdr_append, ubi_vol_hdr_remove, ubi_vol_hdr_update) call validate_reserved_pebs() before committing. If a degraded state is detected, recovery is attempted transparently.

Read-Only Degraded Mode

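When only 1 active PEB remains and 0 spares are available, the system enters read-only degraded mode. In this mode the reserved metadata can no longer be updated, while data-path and maintenance operations continue to work.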

All public mutators pass through a central mutation gate (ubi_mutation_allowed() in ubi_internal.h) before performing any flash I/O. The gate classifies each operation into one of three mutation classes and applies the degraded-mode policy:

Mutation class             Operations                                               Degraded-mode policy
-------------------------  -------------------------------------------------------  --------------------
UBI_MUT_RESERVED_METADATA  ubi_volume_create, ubi_volume_resize, ubi_volume_remove  Blocked (-EROFS)
UBI_MUT_DATA_PATH          ubi_leb_write, ubi_leb_map, ubi_leb_unmap                Allowed
UBI_MUT_MAINTENANCE        ubi_device_erase_peb                                     Allowed
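
The policy table can be expressed as a tiny gate function. The sketch below uses hypothetical names (enum mut_class, mutation_allowed()) and a plain boolean for the degraded flag; the real ubi_mutation_allowed() signature is not shown in this guide and may differ.

```c
#include <errno.h>
#include <stdbool.h>

enum mut_class {
    MUT_RESERVED_METADATA, /* volume create / resize / remove */
    MUT_DATA_PATH,         /* LEB write / map / unmap */
    MUT_MAINTENANCE        /* dirty-PEB erase cycle */
};

/* Returns 0 if the operation may proceed, -EROFS if it is blocked. */
int mutation_allowed(bool read_only_degraded, enum mut_class cls)
{
    if (read_only_degraded && cls == MUT_RESERVED_METADATA) {
        return -EROFS;
    }
    return 0;
}
```

Routing every mutator through one gate keeps the degraded-mode policy in a single place instead of spreading flag checks across the API.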

ubi_device_erase_peb() is intentionally allowed in degraded mode. After its normal dirty-PEB maintenance cycle, it attempts to recover the reserved PEB bank by calling ubi_dev_hdr_read(), which internally scans all reserved PEBs and attempts erase+rewrite of any corrupt copies. If recovery succeeds, the read_only_degraded flag is cleared and the device returns to normal operation. This allows self-healing without requiring a reboot — the application’s regular garbage-collection loop serves as the recovery trigger.

Read-only operations are not gated and behave the same as in normal operation:

Operation            Degraded mode behavior
-------------------  ----------------------
ubi_leb_read         Works normally
ubi_leb_is_mapped    Works normally
ubi_leb_get_size     Works normally
ubi_device_get_info  Works normally (read_only_degraded = true)
ubi_volume_get_info  Works normally

State Summary

Active PEBs  Spares  State     Can update metadata?
-----------  ------  --------  --------------------
2            N-2     Healthy   Yes
1            ≥1      Degraded  Yes (spare promoted during recovery)
1            0       Critical  No (read-only mode)
0            any     Dead      No (cannot init)
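
The state summary maps directly onto a classification helper. This is a sketch with hypothetical names (enum bank_state, bank_state_classify()); it treats two or more active PEBs as healthy, which matches the table because only two PEBs are kept active at a time.

```c
enum bank_state {
    BANK_HEALTHY,  /* two active copies */
    BANK_DEGRADED, /* one active copy, spare available for promotion */
    BANK_CRITICAL, /* one active copy, no spare: read-only mode */
    BANK_DEAD      /* no active copy: init fails */
};

enum bank_state bank_state_classify(unsigned int active, unsigned int spares)
{
    if (active == 0) {
        return BANK_DEAD;
    }
    if (active >= 2) {
        return BANK_HEALTHY;
    }
    return (spares >= 1) ? BANK_DEGRADED : BANK_CRITICAL;
}
```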

PEB State Transitions

+-------------------+
|   SPARE (empty)   |
|   erased          |
+--------+----------+
         |
         | (promoted during recovery
         |  or overwrite when active fails)
         v
+-------------------+    power loss / bit rot     +-------------------+
|      ACTIVE       | --------------------------> |     CORRUPT       |
|  valid dev hdr +  |                             | bad magic or CRC  |
|  valid vol hdrs   |                             |                   |
+--------+----------+                             +--------+----------+
         ^                                                 |
         |        erase + write canonical content          |
         +<------------------------------------------------+
         |        (in-place recovery from other active)
         |
         +-- erase/write fails --> PEB is DEAD (stays corrupt)

Volume Management

Volume Types

Type     Enum                         Description
-------  ---------------------------  -----------
Static   UBI_VOLUME_TYPE_STATIC (0)   Fixed content. Cannot be resized after creation.
Dynamic  UBI_VOLUME_TYPE_DYNAMIC (1)  Content can change. Supports runtime resizing.

Create

ubi_volume_create() reads the persisted vol_id_watermark from the device header, assigns it as the new volume’s ID, bumps the watermark, and writes the updated device header plus new volume header to both active reserved PEBs atomically. The watermark is monotonic — IDs are never reused, even after volume removal. If vol_id_watermark reaches UINT32_MAX, create returns -ENOSPC. The volume is then added to the in-RAM vols RBT. The PEBs for the volume are not pre-allocated — they are claimed from free_pebs on-demand when LEBs are written or mapped.

If a volume with the same name and identical configuration (type, leb_count) already exists, the function returns successfully with the existing volume’s ID (idempotent behavior). If a volume with the same name but different configuration exists, the function returns -EEXIST. Volume creation is transactional: RAM structures are allocated before the flash commit, so a failed create leaves no persistent metadata.
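
The watermark-based ID assignment described above can be sketched as follows. vol_id_alloc() is a hypothetical helper operating on a plain counter; in UBI the watermark is persisted in the device header, which this example does not model.

```c
#include <errno.h>
#include <stdint.h>

/* Hand out the next volume ID and advance the watermark.
 * Returns -ENOSPC once the ID space is exhausted; IDs are never reused. */
int vol_id_alloc(uint32_t *watermark, uint32_t *vol_id_out)
{
    if (*watermark == UINT32_MAX) {
        return -ENOSPC;
    }
    *vol_id_out = *watermark;
    *watermark += 1;
    return 0;
}
```

Because the counter only moves forward, removing volume 0 and creating a new volume yields ID 1 or higher, never 0 again.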

Resize

ubi_volume_resize() is only supported for dynamic volumes and rejects leb_count == 0. It updates the leb_count in the volume header on both active reserved PEBs and adjusts the in-RAM configuration. Shrink is transactional: the flash metadata update commits first, and only then are the surplus EBA entries trimmed and their PEBs moved to dirty_pebs. Grow is checked against available capacity, with bad_peb_count subtracted from the usable PEB count.

Remove

ubi_volume_remove() removes the volume header from the active reserved PEBs, then reclaims mapped PEBs to dirty_pebs and frees in-RAM structures. Reclaim and index cleanup after a successful metadata remove are best-effort — errors are logged but the operation returns success once the flash metadata is gone.