Nick Hill
2ac85a4544
[BugFix] Fix logprobs with spec decode and modified logits ( #30846 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-18 19:58:28 -08:00
realliujiaxu
d2c919dcc2
[bugfix] fix bug when top_logprobs=0 with spec decoding ( #30059 )
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2025-12-12 09:03:35 -08:00
jthomson04
1528e079e2
[Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor ( #29826 )
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-12-02 21:25:52 +00:00
Cyrus Leung
9e6bcda3ac
[mypy] Enable type checking for more directories ( #29674 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 08:39:27 -08:00
Harry Mellor
9eec282cb5
Guard FlashInfer sampler using the same check as FlashInfer attention backend ( #29415 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 08:34:48 -08:00
Didier Durand
66d3d5422c
[Doc]: fixing typos in diverse files ( #29492 )
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-27 07:15:50 -08:00
Nick Hill
4e57c6587f
[Core] Support logprobs with spec decode + async scheduling ( #29223 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-25 12:55:24 -08:00
Nick Hill
d44a63c6d6
[BugFix] Fix returned logprobs with spec decode + prefill chunking ( #29216 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-22 22:41:25 +08:00
Jialin Ouyang
30b9c67743
Revert "[Redo] #26368 ( #28771 )" ( #29121 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-20 21:27:45 -08:00
Xiao Li
ed6ae1e36a
[AITER] [ROCm] Fix crash when loading llama4 model with old aiter version installed, fallback to forward_native implementation ( #29124 )
Signed-off-by: Xiao Li <ilx@meta.com>
2025-11-20 17:54:35 -08:00
vllmellm
0af3d4f0df
[FEAT] [AITER] [ROCm] integrate aiter sampling ops ( #26084 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-18 17:28:34 +00:00
Ronald
d8874c61a5
[Core] Async Scheduling X Spec Decoding Compatibility ( #24799 )
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-11-17 12:16:20 -08:00
Cyrus Leung
98b4d389ed
[Redo] #26368 ( #28771 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-14 22:47:41 -08:00
Nick Hill
ac86bff8cb
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… ( #28773 )
2025-11-14 20:24:00 -08:00
Jialin Ouyang
186352b270
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization ( #26368 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-14 16:04:04 -08:00
Isotr0py
3f770f4427
[Performance] Cache loaded custom logitsprocs to avoid overheads ( #28462 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-11 16:49:29 -08:00
Zhang Xiangze
7bdb42b2f2
[CPU]Avoid repeated random sample compile ( #28260 )
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-11-07 11:03:57 +00:00
Isotr0py
3f5a4b6473
[Bugfix] Validate custom logits processor xargs for online serving ( #27560 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-05 16:53:33 +00:00
Nick Hill
c2ed069b32
[BugFix] Fix mixed penalties batch with async scheduling ( #27910 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-01 10:51:24 -07:00
Wentao Ye
52efc34ebf
[Log] Optimize Startup Log ( #26740 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-24 19:27:04 -04:00
Jonathan Chen
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py ( #27374 )
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
2025-10-23 19:08:06 +00:00
Giancarlo Delfin
6644796bf4
[V1][spec decode] return logprobs for spec decoding ( #26060 )
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-10-22 22:59:59 -07:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Pradyun92
acedc74b1a
[V1][Spec Decode] Fix greedy temperature detection after sampler refactor ( #27077 )
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>
2025-10-17 13:27:47 -07:00
Jee Jee Li
fec2b341ad
[Kernel] Lazy import FlashInfer ( #26977 )
2025-10-17 04:48:18 +00:00
Akash kaothalkar
f7d318de2b
[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling ( #26987 )
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
2025-10-15 22:36:59 -07:00
Michael Goin
e66d787bce
Disable FlashInfer sampler by default ( #26859 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-15 02:35:18 +00:00
ihb2032
4a61950f4d
[Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError ( #26693 )
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
2025-10-13 07:56:01 -07:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Nick Hill
ddcbc2f334
[Misc] Misc code simplifications ( #26450 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-09 02:10:06 -07:00
Harry Mellor
e09d1753ec
Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 ( #26416 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-08 10:40:42 -07:00
Harry Mellor
2f99f2f506
Tidy vllm/config/__init__.py to only add classes and functions ( #26405 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-08 07:10:00 -07:00
Sergei Skvortsov
6ebaf43ee4
[V1] Logit processors for rejection sampler ( #19482 )
Signed-off-by: southfreebird <yvorott@gmail.com>
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Signed-off-by: Sergei Skvortsov <yvorott@gmail.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-10-07 13:02:49 -07:00
Harry Mellor
b893d661b1
Fix per file ruff ignores related to simplification ( #26259 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 20:31:53 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Corey Lowman
0879736aab
[Perf] Remove hardcoded num_warps=1 ( #26183 )
Signed-off-by: Corey Lowman <clowman1993@gmail.com>
2025-10-03 20:38:50 +00:00
Ekagra Ranjan
e71b8e210d
[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. ( #24986 )
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-09-25 15:22:03 -07:00
Russell Bryant
532a6cfccb
[ux] Switch a warning to debug about a pytorch fallback ( #23750 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-25 14:38:16 +00:00
Li, Jiang
eb32335e35
[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata ( #25652 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-25 13:29:11 +00:00
courage17340
a676e668ee
[Bugfix] fix apply_temperature to avoid nan in probs ( #24734 )
Signed-off-by: courage17340 <courage17340@163.com>
2025-09-25 05:32:21 +00:00
Wenlong Wang
032d661d27
[Docs] Fix warnings in mkdocs build (continued) ( #25042 )
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-20 11:45:18 +00:00
Harry Mellor
aed16879a9
Move ModelConfig from config/__init__.py to config/model.py ( #25252 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 16:22:33 +00:00
Andrew Sansom
9a4600e4dc
[CORE] Prompt Embeddings Support for v1 Engine ( #24278 )
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-19 08:03:09 +08:00
afeldman-nm
7ae9887542
[V1] Logits processor docs ( #22919 )
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Joseph Marinier <Joseph.Marinier@gmail.com>
2025-09-17 11:53:12 -07:00
co63oc
3144d90217
fix some typos ( #24167 )
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-09-10 06:21:23 -07:00
Woosuk Kwon
105d3d62ef
[TPU] Remove TopKTopPSampler dependency for TPU sampler ( #24391 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-07 01:12:36 -07:00
afeldman-nm
136d853e65
[V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing ( #23656 )
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-09-03 02:52:51 +00:00
Jingkai He
57d4ede520
[bugfix] [spec-decoding] fix data race in sample_recovered_tokens_kernel (vLLM v1) ( #23829 )
Signed-off-by: He-Jingkai <he-jingkai@outlook.com>
2025-08-28 19:05:20 +00:00
Woosuk Kwon
a3432f18fd
[BugFix][Spec Decode] Use float64 for uniform_probs ( #23803 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-28 12:26:45 +00:00
Hyogeun Oh (오효근)
4e4d017b6f
[Docs] Fix warnings in mkdocs build (continued) ( #23743 )
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
2025-08-27 17:17:29 +00:00