xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-30 10:07:13 +08:00

Author	SHA1	Message	Date
Lucas Kabela	213b64452a	[Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-10-10 13:32:29 +00:00
Mark McLoughlin	784c231151	[NIXL] Ignore abort on already-finished request (#25067) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-10-10 12:21:56 +02:00
Chen Zhang	606b00e80f	[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 03:02:49 -07:00
Chauncey	720d3cd0f0	[CI] fix ruff format (#26579) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-10 03:02:12 -07:00
Ashwin Phadke	ab196edefb	Remove LoRA bias support (#25807) Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com> Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-10 09:50:33 +00:00
Luis Tomas Bolivar	3ee202ea1e	[GPT-OSS] Add support for arrays at tool message content (#25593) Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com>	2025-10-10 09:00:45 +00:00
Cyrus Leung	ad430a67ca	[Metrics] Log multi-modal cache stats and fix reset (#26285) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-10 01:45:55 -07:00
Chen Zhang	6f0f570c43	[deepseek] kernel block size for UniformTypeKVCacheSpecs (#26559) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 16:40:41 +08:00
Boyuan Feng	b545a0b207	fix test_simple_inductor_graph_partition (#26522) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-10-10 06:39:19 +00:00
Lucas Wilkinson	29255cfc3b	[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-10-10 01:20:31 -04:00
Ben Browning	da4455609d	[Chore]: One pythonic tool parser test uses the wrong parser (#26515) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-10-10 04:03:55 +00:00
Nick Hill	aafb99a4d4	[Core] Small simplification in `GPUModelRunner._update_states()` (#26508) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 10:53:58 +08:00
Rui Qiao	757fa4a4da	[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY (#23849) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-09 19:53:43 -07:00
Julien Denize	c6187f55f7	Refactor MistralTokenizer (#26358) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-10-09 22:48:58 +00:00
Wentao Ye	8983e0216f	[CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" (#26448) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-09 15:16:48 -07:00
Wentao Ye	1ee35382cb	[Bug] Fix modular_kernel: ZeroDivisionError: integer division or modulo by zero (#26528) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-09 15:13:27 -07:00
Benjamin Chislett	6e783bc54b	[Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency (#26499) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-09 17:12:34 -04:00
Michael Goin	c9d33c60dc	[UX] Add FlashInfer as default CUDA dependency (#26443) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-09 14:10:02 -07:00
Nick Hill	2e54db4d2b	[Core] Remove unused `prev_sampled_token_ids_invalid_indices` input batch field (#26514) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-09 20:22:14 +00:00
elvischenv	44f633dba1	[Flashinfer][gpt-oss] Support FP8-qkv Flashinfer TRTLLM Sinks Attention (#25674) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-09 16:13:39 -04:00
bnellnm	a462331e36	[Bugfix] Disable moe inplace for torch >= 2.9 (#26497) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-09 18:07:38 +00:00
roikoren755	4069db3f2e	[Bugfix] Enable padded FP4 quantization (#25947) Signed-off-by: Roi Koren <roik@nvidia.com>	2025-10-09 10:59:41 -07:00
Sage Moore	0d37450eb7	[BUGFIX] Add cu_tokens_across_sp to DPMetadata (#26457) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-09 17:13:56 +00:00
bnellnm	47e66c24e2	[Model] Apply shared experts overlap optimization to all models with shared experts (#26145) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-09 11:31:04 -04:00
Ming Yang	3b736e1c38	[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-10-09 08:06:29 -07:00
Lukas Geiger	2c1c7dfb35	[Models][Qwen] Replace `pad` with `cat` for better performance (#26486) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-09 14:51:26 +00:00
Harry Mellor	e246ad6f0c	Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 (#26481) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-09 06:02:40 -07:00
Jiangyun Zhu	5728da11ea	Revert #26113 "[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-09 05:43:55 -07:00
Simon Danielsson	92be3f3517	[Feature] Use pydantic validation in parallel.py config (#26417) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-09 12:41:31 +00:00
Isotr0py	d1ddf340c8	[V0 deprecation] Remove `QKVCrossParallelLinear` implementation (#26475) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-09 10:52:27 +00:00
Wenzheng Bi	ec10fd0abc	[Bugfix] Move current_platform import to avoid python import cache. (#16601) Signed-off-by: iwzbi <wzbi@zju.edu.cn>	2025-10-09 10:46:19 +00:00
Lukas Geiger	0426e3c5e1	[Models][Qwen3VL] Optimise `_validate_and_reshape_mm_tensor` (#26426) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-09 10:25:48 +00:00
Cyrus Leung	4bdf7ac593	[Bugfix] Fix SHM cache initialization (#26427) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-09 02:48:04 -07:00
Cyrus Leung	dc7976dd9f	[Misc] Upgrade more code to Python 3.10 (#26463) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-09 10:43:53 +01:00
Simon Danielsson	e4791438ed	[Feature] Use pydantic validation in lora.py and load.py configs (#26413) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2025-10-09 02:38:33 -07:00
youkaichao	e6e898f95d	[doc] add Volcengine as a compute sponsor (#26477) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-10-09 17:11:47 +08:00
Nick Hill	ddcbc2f334	[Misc] Misc code simplifications (#26450) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-09 02:10:06 -07:00
Jerry Zhang	a83ff278d6	[torchao] Add support for ModuleFqnToConfig using regex (#26001) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-10-09 08:32:32 +00:00
Rahul Tuli	cf4cd6c24f	Add: Support for multiple hidden layers in Eagle3 (#26164) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-10-09 07:30:50 +00:00
Harry Mellor	b960441812	Enable `RMSNorm` substitution for Transformers backend (#26353) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-09 07:28:51 +00:00
Luciano Martins	1317028aa8	[Model] Gemma3: Fix GGUF loading and quantization (#26189) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-09 07:00:53 +00:00
elvischenv	5e49c3e777	Bump Flashinfer to v0.4.0 (#26326) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-08 23:58:44 -07:00
pwschuurman	0d7c3cb51d	Update Dockerfile and install runai-model-streamer[gcs] package (#26464) Signed-off-by: Peter Schuurman <psch@google.com>	2025-10-08 23:48:51 -07:00
Jee Jee Li	1b2c440cd6	[Core] Relax the LoRA max rank (#26461) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-08 23:47:14 -07:00
Cyrus Leung	0f29dca988	[CI/Build] Fix model nightly tests (#26466) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-08 23:44:16 -07:00
Zhiyuan Li	d24cf322e1	[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486) Signed-off-by: lizhiyuan <uniartisan2017@gmail.com> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-08 23:43:39 -07:00
Qier Li	d17f0fbf30	[Core][KVConnector] Propagate all tokens on resumed preemptions (#24926) Signed-off-by: Qier Li <kevin44036@gmail.com> Co-authored-by: Qier Li <qier@fb.com>	2025-10-09 14:43:31 +08:00
Wenlong Wang	43ab8cfaa5	[MM][Doc] Add documentation for configurable mm profiling (#26200) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-10-08 23:21:20 -07:00
Matt	de253d63b7	[Hardware][AMD] Enable FlexAttention backend on ROCm (#26439) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2025-10-09 06:20:18 +00:00
Huy Do	8bd696fa53	[Bugfix] Incorrect another MM data format in vllm bench throughput (#26462) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-10-09 05:58:46 +00:00

10313 Commits