vllm.v1.worker.gpu.mm.encoder_cudagraph_defs

Data transfer objects for encoder CUDA graph management.

EncoderCudaGraphCaptureInputs `dataclass`

Everything needed for one CUDA graph capture.

Returned by prepare_encoder_cudagraph_capture_inputs().

Source code in vllm/v1/worker/gpu/mm/encoder_cudagraph_defs.py

from dataclasses import dataclass
from typing import Any

import torch


@dataclass
class EncoderCudaGraphCaptureInputs:
    """Everything needed for one CUDA graph capture.

    Returned by ``prepare_encoder_cudagraph_capture_inputs()``.
    """

    mm_kwargs: dict[str, Any]
    """Dummy forward inputs (model-specific keys).
    For Qwen3-VL this contains pixel_values and grid_thw."""

    buffers: dict[str, torch.Tensor]
    """Precomputed tensor buffers that will be recorded into the
    CUDA graph.  The manager stores references to these exact
    tensor objects and copies new data into them before each
    ``graph.replay()`` call (buffer identity invariant)."""

buffers `instance-attribute`

buffers: dict[str, Tensor]

Precomputed tensor buffers that will be recorded into the CUDA graph. The manager stores references to these exact tensor objects and copies new data into them before each graph.replay() call, so the objects themselves must never be replaced (the buffer identity invariant).

mm_kwargs `instance-attribute`

mm_kwargs: dict[str, Any]

Dummy forward inputs (model-specific keys). For Qwen3-VL this contains pixel_values and grid_thw.
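The buffer identity invariant described for `buffers` can be illustrated with a minimal, dependency-free sketch. Plain Python lists stand in for `torch.Tensor` here, and slice assignment plays the role of an in-place copy; the key name `pos_embeds` and the helper `update_in_place` are hypothetical and are not part of vLLM.

```python
# Assumption: lists stand in for torch.Tensor; "pos_embeds" is a made-up key.
captured = {"pos_embeds": [0.0, 0.0, 0.0, 0.0]}

# The manager keeps a reference to the exact object recorded at capture time.
recorded = captured["pos_embeds"]


def update_in_place(dst, src):
    """Overwrite the leading elements of dst without rebinding dst."""
    dst[: len(src)] = src


# Correct: new data is written into the same object the graph recorded.
update_in_place(captured["pos_embeds"], [1.0, 2.0])
same_object = captured["pos_embeds"] is recorded

# Incorrect: rebinding the entry creates a new object that the recorded
# graph would never read from.
rebound = dict(captured)
rebound["pos_embeds"] = [9.0, 9.0, 9.0, 9.0]
broken = rebound["pos_embeds"] is recorded
```

In this sketch `same_object` is true and `broken` is false, which is the distinction the invariant is meant to protect.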

EncoderCudaGraphConfig `dataclass`

Configuration for encoder CUDA graph management.

Provided by the model at init time via get_encoder_cudagraph_config(). Values are fixed for the lifetime of the manager.

Source code in vllm/v1/worker/gpu/mm/encoder_cudagraph_defs.py

@dataclass
class EncoderCudaGraphConfig:
    """Configuration for encoder CUDA graph management.

    Provided by the model at init time via
    ``get_encoder_cudagraph_config()``. Values are fixed for the
    lifetime of the manager.
    """

    modalities: list[str]
    """Supported modalities (e.g. ["image"])."""

    input_key: str
    """Key in mm_kwargs for the input tensor (e.g. "pixel_values")."""

    buffer_keys: list[str]
    """Keys for the tensor buffers recorded into the CUDA graph.
    Before replay the manager zeros then slice-copies new data
    into these buffers."""

    out_hidden_size: int
    """Output hidden dim of the vision encoder.
    Used for DP gather buffer allocation."""

buffer_keys `instance-attribute`

buffer_keys: list[str]

Keys for the tensor buffers recorded into the CUDA graph. Before each replay, the manager zeros these buffers and then slice-copies new data into them.

input_key `instance-attribute`

input_key: str

Key in mm_kwargs for the input tensor (e.g. "pixel_values").

modalities `instance-attribute`

modalities: list[str]

Supported modalities (e.g. ["image"]).

out_hidden_size `instance-attribute`

out_hidden_size: int

Output hidden dimension of the vision encoder. Used to allocate the data-parallel (DP) gather buffer.
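As a rough illustration of how these four fields fit together, the sketch below builds a config using a local mirror of the dataclass so that it runs without vLLM installed. The buffer key names, the hidden size, and the helper `missing_buffer_keys` are assumptions made for the example, not values or functions taken from any real model.

```python
from dataclasses import dataclass


@dataclass
class EncoderCudaGraphConfig:
    """Local mirror of the documented fields, for illustration only."""

    modalities: list[str]
    input_key: str
    buffer_keys: list[str]
    out_hidden_size: int


config = EncoderCudaGraphConfig(
    modalities=["image"],
    input_key="pixel_values",
    buffer_keys=["pos_embeds", "cu_seqlens"],  # hypothetical key names
    out_hidden_size=4096,  # hypothetical size
)


def missing_buffer_keys(cfg, buffers):
    """Return the configured buffer keys that are absent from buffers."""
    return [key for key in cfg.buffer_keys if key not in buffers]


# A capture-time buffer dict that forgot one of the configured keys.
missing = missing_buffer_keys(config, {"pos_embeds": [0.0]})
```

A check of this kind is one way a caller could confirm that capture-time buffers cover every entry in `buffer_keys`; whether the manager performs such a check is not stated here.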

EncoderCudaGraphReplayBuffers `dataclass`

New buffer values for graph replay, computed by the model from actual batch inputs.

Returned by prepare_encoder_cudagraph_replay_buffers(). Keys match EncoderCudaGraphConfig.buffer_keys.

Source code in vllm/v1/worker/gpu/mm/encoder_cudagraph_defs.py

@dataclass
class EncoderCudaGraphReplayBuffers:
    """New buffer values for graph replay, computed by the model from
    actual batch inputs.

    Returned by ``prepare_encoder_cudagraph_replay_buffers()``.
    Keys match ``EncoderCudaGraphConfig.buffer_keys``.
    """

    buffers: dict[str, torch.Tensor | None]
    """Data to copy into the captured buffers before replay.
    ``None`` values leave the corresponding captured buffer
    unchanged."""

buffers `instance-attribute`

buffers: dict[str, Tensor | None]

Data to copy into the captured buffers before replay. None values leave the corresponding captured buffer unchanged.
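The update semantics documented for replay (zero the captured buffer, slice-copy the new data, skip `None` entries) could be sketched as follows. Plain Python lists again stand in for `torch.Tensor`, and `apply_replay_buffers` and the keys `a` and `b` are hypothetical; this is not the manager's actual implementation.

```python
def apply_replay_buffers(captured, new_values):
    """Write new_values into captured in place, skipping None entries."""
    for key, src in new_values.items():
        if src is None:
            continue  # None leaves the captured buffer unchanged
        dst = captured[key]
        if len(src) > len(dst):
            raise ValueError(f"new data for {key!r} exceeds captured buffer")
        dst[:] = [0] * len(dst)  # zero the whole buffer, keeping the object
        dst[: len(src)] = src  # slice-copy the new data into the front


# Assumption: lists stand in for tensors; keys "a" and "b" are made up.
captured = {"a": [7, 7, 7, 7], "b": [5, 5]}
recorded_a = captured["a"]

apply_replay_buffers(captured, {"a": [1, 2], "b": None})
```

After the call, `captured["a"]` holds the new data followed by zeros, `captured["b"]` is untouched, and `captured["a"]` is still the same object as `recorded_a`.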
