xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-18 03:37:09 +08:00

Author	SHA1	Message	Date
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
wang.yuqi	76852017ea	[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank (#25867 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-12 09:29:08 +00:00
dsinghvi	727144bed1	[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py (#24172 ) Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com> Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: wwl2755 <wangwenlong2755@gmail.com>	2025-10-11 07:21:04 +00:00
Roger Wang	ddaff2938e	[MM] Move Qwen3Omni MRoPE impl to model file (#26608 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-10 22:17:24 -07:00
Nick Hill	5bc26c438d	[BugFix] Make penalties and bad_words work with async scheduling (#26467 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 23:27:04 +00:00
Nick Hill	949cb0170d	[BugFix] Fix async scheduling + request preemption (#26385 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 20:29:57 +00:00
Vadim Gimpelson	e94cfd51da	[BUG] Qwen3-next MTP. Fix attn metadata build bug (#26564 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-10-10 14:59:03 -04:00
Mark McLoughlin	e519281920	[Metrics] Add test for multi-modal cache stats logging (#26588 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-10-10 16:00:50 +00:00
Sage Moore	ae9d0e7da5	[Bugfix] Make DP padding optional in coordinate_batch_across_dp (#26375 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-10 10:53:33 -04:00
Mark McLoughlin	784c231151	[NIXL] Ignore abort on already-finished request (#25067 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-10-10 12:21:56 +02:00
Chen Zhang	606b00e80f	[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 03:02:49 -07:00
Cyrus Leung	ad430a67ca	[Metrics] Log multi-modal cache stats and fix reset (#26285 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-10 01:45:55 -07:00
Chen Zhang	6f0f570c43	[deepseek] kernel block size for UniformTypeKVCacheSpecs (#26559 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 16:40:41 +08:00
Lucas Wilkinson	29255cfc3b	[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-10-10 01:20:31 -04:00
Nick Hill	aafb99a4d4	[Core] Small simplification in `GPUModelRunner._update_states()` (#26508 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 10:53:58 +08:00
Rui Qiao	757fa4a4da	[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY (#23849 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-09 19:53:43 -07:00
Julien Denize	c6187f55f7	Refactor MistralTokenizer (#26358 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-10-09 22:48:58 +00:00
Benjamin Chislett	6e783bc54b	[Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency (#26499 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-09 17:12:34 -04:00
Nick Hill	2e54db4d2b	[Core] Remove unused `prev_sampled_token_ids_invalid_indices` input batch field (#26514 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-09 20:22:14 +00:00
Ming Yang	3b736e1c38	[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-10-09 08:06:29 -07:00
Cyrus Leung	4bdf7ac593	[Bugfix] Fix SHM cache initialization (#26427 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-09 02:48:04 -07:00
Cyrus Leung	dc7976dd9f	[Misc] Upgrade more code to Python 3.10 (#26463 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-09 10:43:53 +01:00
Nick Hill	ddcbc2f334	[Misc] Misc code simplifications (#26450 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-09 02:10:06 -07:00
elvischenv	5e49c3e777	Bump Flashinfer to v0.4.0 (#26326 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-08 23:58:44 -07:00
Jee Jee Li	1b2c440cd6	[Core] Relax the LoRA max rank (#26461 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-08 23:47:14 -07:00
Zhiyuan Li	d24cf322e1	[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486 ) Signed-off-by: lizhiyuan <uniartisan2017@gmail.com> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-08 23:43:39 -07:00
Qier Li	d17f0fbf30	[Core][KVConnector] Propagate all tokens on resumed preemptions (#24926 ) Signed-off-by: Qier Li <kevin44036@gmail.com> Co-authored-by: Qier Li <qier@fb.com>	2025-10-09 14:43:31 +08:00
Nick Hill	bb6d8c21f9	[Bugfix] Catch and log invalid token ids in detokenizer #2 (#26445 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-08 21:20:25 -07:00
Naveenraj Kamalakannan	e614ab7806	Separate MLAAttention class from Attention (#25103 ) Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-08 17:11:11 -07:00
Matthew Bonanni	2a03f93de9	[Attention] Register FLASHMLA_SPARSE (#26441 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 22:28:52 +00:00
Elaine Zhao	f08919b7d1	[Bugfix] Respect min_tokens in scheduler stop check (#26317 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-10-08 14:08:24 -07:00
elvischenv	b82f4307c9	[Bugfix][Flashinfer] fix VLLM_USE_TRTLLM_ATTENTION issue for models with diff hyperparameters (#25924 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-08 19:54:48 +00:00
Harry Mellor	e09d1753ec	Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-08 10:40:42 -07:00
Harry Mellor	2f99f2f506	Tidy `vllm/config/__init__.py` to only add classes and functions (#26405 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-08 07:10:00 -07:00
Utkarsh Sharma	335b28f7d1	[TPU] Rename tpu_commons to tpu_inference (#26279 ) Signed-off-by: Utkarsh Sharma <utksharma@google.com> Co-authored-by: Utkarsh Sharma <utksharma@google.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-10-07 23:30:52 -07:00
Ayush Satyam	5e65d6b2ad	fix[DP][v1]: Prevent hangs from mismatched worker configurations (#26218 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>	2025-10-07 22:55:08 -07:00
Ayush Satyam	cd9890544b	fix(v1/kv_cache): resolve async KV transfer bug in cascade attention (#23485 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>	2025-10-08 04:46:33 +00:00
Nick Hill	067da2d1df	[Core] Simplify setting new_token_ids in CachedRequestData (#26388 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-08 03:32:37 +00:00
Lucas Wilkinson	f80e7866c0	[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-08 10:09:34 +08:00
Thomas Parnell	31a4b3e6c4	Revert #24446 and #26168 (#26332 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-07 16:38:19 -06:00
Benjamin Chislett	3d1f67616d	[Spec Decode] Enable efficient speculative decoding with FlashInfer-MLA (#25984 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-07 16:05:59 -04:00
Sergei Skvortsov	6ebaf43ee4	[V1] Logit processors for rejection sampler (#19482 ) Signed-off-by: southfreebird <yvorott@gmail.com> Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com> Signed-off-by: Sergei Skvortsov <yvorott@gmail.com> Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-10-07 13:02:49 -07:00
Pei-Lun Liao	eb577e4655	[Bugfix] Add missing sink tensor into flash attn cascade attn implementation (#26325 )	2025-10-07 18:56:39 +00:00
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Grant Holmes (Ren)	d100d78eb3	Optimize KV cache distribution for asymmetric pipeline parallelism (#25164 ) Signed-off-by: gholmes829 <g.holmes429@gmail.com>	2025-10-07 09:20:30 +00:00
Sage Moore	2111b4643c	[Core] Simplify the Dp padding/should ubatch coordination logic (#25768 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-07 01:57:49 +00:00
Benjamin Chislett	f77df94647	[Perf] Add decode full-graph support to FlashInfer-MLA backend (#26313 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 23:03:49 +00:00
Gregory Shtrasberg	f231e5bc21	[ROCm] Split AITER unified attention into its own backend (#25507 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-06 22:49:23 +00:00
Varun Sundar Rabindranath	f23b4c04fd	[BugFix] Pad input buffers in _dummy_run (#26209 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-06 16:07:51 -04:00
7mile	b2ea5ba677	[Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs (#26231 ) Signed-off-by: seven-mile <i@7li.moe> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 18:24:22 +00:00

1418 Commits