CVE-2025-46570

Published: May 29, 2025

Modified: Jun 17, 2026

CVSS 3.1 Base Score: 2.6 (Low)

Vector

CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N

Exploitability: 1.2 / Impact: 1.4
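
The scores above can be reproduced from the vector string. The following is a minimal sketch of the CVSS v3.1 base-score arithmetic, assuming the metric weights and rounding rule from the public CVSS v3.1 specification. The names `WEIGHTS`, `roundup`, and `base_score` are illustrative, not part of any library, and only the Scope: Unchanged case that this vector uses is handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Return (base, exploitability, impact) for a Scope: Unchanged vector."""
    # Skip the leading "CVSS:3.1" element, then split each "metric:value" pair.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}

    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    base = 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))
    return base, exploitability, impact


base, exploitability, impact = base_score(
    "CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N"
)
print(base, round(exploitability, 1), round(impact, 1))  # prints: 2.6 1.2 1.4
```

The low score follows mostly from the single Low confidentiality impact (C:L) combined with High attack complexity (AC:H).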

Source: security-advisories@github.com (Secondary)

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.9.0, when a new prompt is processed and the PagedAttention mechanism finds a matching cached prefix chunk, the prefill phase completes faster, and the speedup shows up in the TTFT (Time to First Token). The timing difference between a matching and a non-matching chunk is large enough for an attacker to detect and exploit, for example to infer whether a given prefix has already been processed by the server. This issue is patched in version 0.9.0.
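
The mechanism described above can be illustrated with a toy model. This is a minimal sketch, not vLLM's implementation: `ToyPrefixCache`, `BLOCK_SIZE`, and the hashing scheme are hypothetical, and the number of blocks that must be computed stands in for prefill time (and therefore TTFT).

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cached chunk (hypothetical value for illustration)


class ToyPrefixCache:
    """Toy chunk-based prefix cache; each key covers a chunk and its whole prefix."""

    def __init__(self):
        self._blocks = set()

    def prefill(self, tokens):
        """Process a prompt and return how many chunks had to be computed."""
        computed = 0
        parent = b""
        full_len = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore a partial tail
        for start in range(0, full_len, BLOCK_SIZE):
            chunk = tokens[start:start + BLOCK_SIZE]
            # Chaining on the parent key means a hit implies the entire prefix matched.
            key = hashlib.sha256(parent + repr(chunk).encode()).digest()
            if key not in self._blocks:
                computed += 1  # cache miss: this chunk costs prefill work
                self._blocks.add(key)
            parent = key
        return computed


cache = ToyPrefixCache()
victim_prompt = list(range(12))                       # three chunks, all cold
guess_sharing_prefix = list(range(8)) + [99, 98, 97, 96]
guess_unrelated = list(range(50, 62))

cold_cost = cache.prefill(victim_prompt)              # 3 chunks computed
hit_cost = cache.prefill(guess_sharing_prefix)        # 1 chunk computed
miss_cost = cache.prefill(guess_unrelated)            # 3 chunks computed
print(cold_cost, hit_cost, miss_cost)
```

In this model, a later request that shares a prefix with an earlier one does less work than an unrelated request of the same length. That difference in work is the kind of observable discrepancy the description refers to.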

Affected (1)

Products: Vllm: Vllm

1 product

Configuration A

1 vulnerable

Vulnerable Software	Affected Versions
Vllm Vllm	Before 0.9.0

Related CWEs

CWE-203: Observable Discrepancy

The product behaves differently or sends different responses under different circumstances in a way that is observable to an unauthorized actor, which exposes security-relevant information about the state of the product, such as whether a particular operation was successful or not.

CWE-208: Observable Timing Discrepancy

Two separate operations in a product require different amounts of time to complete, in a way that is observable to an actor and reveals security-relevant information about the state of the product, such as whether a particular operation was successful or not.

References (3)

https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f

Source: security-advisories@github.com

Patch

https://github.com/vllm-project/vllm/pull/17045

Source: security-advisories@github.com

Issue Tracking, Vendor Advisory

https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r

Source: security-advisories@github.com

Vendor Advisory
