vllm.v1.worker.gpu.mm.encoder_cudagraph_defs ¶
Data transfer objects for encoder CUDA graph management.
EncoderCudaGraphCaptureInputs dataclass ¶
Everything needed for one CUDA graph capture.
Returned by prepare_encoder_cudagraph_capture_inputs().
Source code in vllm/v1/worker/gpu/mm/encoder_cudagraph_defs.py
buffers instance-attribute ¶
Precomputed tensor buffers that are recorded into the CUDA graph during capture. The manager keeps references to these exact tensor objects and copies new data into them in place before each graph.replay() call. Replacing a tensor with a new object would leave the graph reading the old one (the buffer identity invariant).
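The buffer identity invariant can be illustrated without a GPU. The sketch below is a toy simulation, not vLLM code: `RecordedGraph` and `copy_into` are hypothetical names, plain Python lists stand in for tensors, and the `"pos"` key is made up for illustration. It only shows why new data has to be copied into the captured objects instead of rebinding the dictionary entry to a fresh object.

```python
class RecordedGraph:
    """Toy stand-in for a captured graph (hypothetical, not a vLLM class).

    It remembers the exact buffer objects it saw at capture time and
    rereads those same objects on every replay.
    """

    def __init__(self, buffers):
        # Shallow copy of the mapping: we keep references to the
        # original buffer objects, not copies of their contents.
        self._captured = dict(buffers)

    def replay(self):
        # The "kernel" here just sums each captured buffer.
        return {key: sum(buf) for key, buf in self._captured.items()}


def copy_into(dst, src):
    """Overwrite dst in place, keeping its identity and length."""
    if len(dst) != len(src):
        raise ValueError("replay data must match the captured buffer size")
    dst[:] = src  # slice assignment mutates the existing list object


# Capture: the graph records references to these exact objects.
capture_buffers = {"pos": [0.0, 0.0, 0.0]}
graph = RecordedGraph(capture_buffers)
captured_id = id(capture_buffers["pos"])

# Correct: copy new values into the captured object, then replay.
copy_into(capture_buffers["pos"], [1.0, 2.0, 3.0])
in_place_result = graph.replay()

# Incorrect: rebinding the key creates a new object the graph never saw,
# so the replay still reads the old buffer contents.
capture_buffers["pos"] = [10.0, 20.0, 30.0]
stale_result = graph.replay()
```

With real tensors the in-place step would typically be an operation such as `Tensor.copy_`, which writes into existing storage instead of allocating a new tensor.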
EncoderCudaGraphConfig dataclass ¶
Configuration for encoder CUDA graph management.
Provided by the model at init time via get_encoder_cudagraph_config(). Values are fixed for the lifetime of the manager.
Source code in vllm/v1/worker/gpu/mm/encoder_cudagraph_defs.py
EncoderCudaGraphReplayBuffers dataclass ¶
New buffer values for graph replay, computed by the model from actual batch inputs.
Returned by prepare_encoder_cudagraph_replay_buffers(). Keys match EncoderCudaGraphConfig.buffer_keys.
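As a rough sketch of the key-matching contract, the stand-in dataclasses below mirror only what this page states: a config carrying `buffer_keys`, and replay buffers keyed by those names. Everything else is an assumption made for illustration: the `Sketch*` class names, the `buffers` field on the replay object, the `check_replay_keys` helper, and the example key names are hypothetical, and lists stand in for tensors. The real dataclasses may have different or additional fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for EncoderCudaGraphConfig.

    frozen=True models "values are fixed for the lifetime of the manager".
    """

    buffer_keys: tuple[str, ...]


@dataclass
class SketchReplayBuffers:
    """Hypothetical stand-in for EncoderCudaGraphReplayBuffers."""

    buffers: dict[str, list[float]]  # assumed field name; lists stand in for tensors


def check_replay_keys(config: SketchConfig, replay: SketchReplayBuffers) -> None:
    """Raise KeyError if the replay keys differ from config.buffer_keys."""
    expected = set(config.buffer_keys)
    actual = set(replay.buffers)
    if expected != actual:
        missing = sorted(expected - actual)
        extra = sorted(actual - expected)
        raise KeyError(f"replay buffers mismatch: missing={missing}, extra={extra}")


# Example key names are made up for this sketch.
config = SketchConfig(buffer_keys=("pos_embeds", "cu_seqlens"))
good = SketchReplayBuffers(
    buffers={"pos_embeds": [0.1, 0.2], "cu_seqlens": [0.0, 2.0]}
)
check_replay_keys(config, good)  # passes silently
```

A check like this, run before copying values into the captured buffers, would surface a model that returns an incomplete or misnamed set of replay buffers before the graph is replayed on stale data.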