vllm.v1.kv_offload.cpu.policies.abstract
BlockStatus
Bases: Structure
Offloading status for a single block of KV data. Holds the following information:
ref_cnt - the current number of transfers using this block as a source. A value of -1 indicates the block is not yet ready to be read.
block_id - index of the physical CPU buffer slot.
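The ref_cnt convention described above can be illustrated with a small stand-alone sketch. It assumes that the Structure base is ctypes.Structure; the class name SketchBlockStatus, the field widths, and the is_ready helper are hypothetical and chosen only for illustration, not taken from the vLLM source.

```python
import ctypes


class SketchBlockStatus(ctypes.Structure):
    """Illustrative stand-in for BlockStatus (not the vLLM class)."""

    # Field widths are assumptions made for this sketch.
    _fields_ = [
        ("ref_cnt", ctypes.c_int32),
        ("block_id", ctypes.c_int64),
    ]

    def __init__(self, block_id: int):
        super().__init__()
        # -1 marks the block as not yet ready to be read.
        self.ref_cnt = -1
        self.block_id = block_id

    @property
    def is_ready(self) -> bool:
        # Hypothetical helper: any non-negative count means readable.
        return self.ref_cnt >= 0


block = SketchBlockStatus(block_id=7)
ready_before = block.is_ready  # False: data has not landed in the slot yet

block.ref_cnt = 0   # data is in place, no transfer is reading it
block.ref_cnt += 1  # one transfer now uses the block as a source
ready_after = block.is_ready
```

Under this reading, a block with ref_cnt equal to 0 is readable and idle, while a positive value means at least one transfer is still reading from it.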
Source code in vllm/v1/kv_offload/cpu/policies/abstract.py
CachePolicy
Bases: ABC
Encapsulates both block organization (data structures) and replacement decisions (which block to evict). LRU and ARC differ in both dimensions: ARC's ghost lists and target_t1_size affect both storage and eviction, so the two concerns cannot be separated cleanly.
evict abstractmethod
evict(
n: int, protected: set[BlockHash]
) -> list[tuple[BlockHash, BlockStatus]] | None
Evict exactly n blocks, skipping any in protected.
Returns a list of (block_hash, block) tuples for the evicted blocks, or None if n evictions cannot be satisfied. The operation is atomic: if None is returned, no state changes are made.
For ARC: ghost list cleanup (trimming to cache_capacity) is performed at the end of a successful eviction.
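The atomicity contract of evict can be sketched for a simple LRU ordering with a two-phase approach: first select candidates without mutating anything, then remove them only if all n were found. This is a minimal sketch, not the vLLM implementation. The names Block and evict_lru are hypothetical, plain strings stand in for BlockHash, and the rule that blocks with a non-zero ref_cnt are skipped is an assumption.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for BlockStatus."""
    block_id: int
    ref_cnt: int = 0


def evict_lru(blocks, n, protected):
    """Evict exactly n blocks in LRU order, or change nothing and return None.

    `blocks` is an OrderedDict mapping hash -> Block, oldest entry first.
    """
    if n == 0:
        return []

    # Phase 1: pick candidates without touching the cache.
    candidates = []
    for block_hash, block in blocks.items():
        # Assumption: blocks that are in use or not ready are not evictable.
        if block_hash in protected or block.ref_cnt != 0:
            continue
        candidates.append((block_hash, block))
        if len(candidates) == n:
            break

    # Not enough evictable blocks: report failure, state is unchanged.
    if len(candidates) < n:
        return None

    # Phase 2: commit the removals.
    for block_hash, _ in candidates:
        del blocks[block_hash]
    return candidates


cache = OrderedDict(a=Block(0), b=Block(1, ref_cnt=1), c=Block(2))

# "a" is protected and "b" is in use, so two evictions cannot be satisfied.
failed = evict_lru(cache, 2, protected={"a"})
size_after_failure = len(cache)

# A single eviction succeeds and removes "c".
evicted = evict_lru(cache, 1, protected={"a"})
```

Separating selection from removal is what keeps the failed call free of side effects. An ARC policy would additionally trim its ghost lists after the commit phase, as noted above.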
get abstractmethod
get(block_hash: BlockHash) -> BlockStatus | None
insert abstractmethod
insert(block_hash: BlockHash, block: BlockStatus) -> None