xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, last synced 2025-12-22 20:15:46 +08:00

Author	SHA1	Message	Date
Julien Denize	1b1e35aaf9	[BUGFIX] Fix regex pattern for Mistral Tool Call (#29918 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2025-12-02 14:51:58 -08:00
Julien Denize	5e5646e206	[BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention (#29908 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2025-12-02 14:51:20 -08:00
Chauncey	0a9caca9f5	[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 22:42:28 +00:00
Sage Moore	e6f114ac25	[Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults (#29911 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-12-02 13:20:22 -09:00
Harry Mellor	6fc5841db1	Fix some more Transformers nightly tests (#29872 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 21:49:44 +00:00
dependabot[bot]	3ff5b53bc2	Bump actions/setup-python from 6.0.0 to 6.1.0 (#29768 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-12-02 21:29:32 +00:00
jthomson04	1528e079e2	[Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor (#29826 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-12-02 21:25:52 +00:00
Divakar Verma	afb1e5b380	[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 20:46:10 +00:00
Copilot	1c593e117d	Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep (#29025 ) Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-02 20:40:56 +00:00
Navanit Dubey	a2b053dc85	feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE (#29896 ) Signed-off-by: navanit-git <navanitdubey@gmail.com>	2025-12-02 19:28:35 +00:00
Matthew Bonanni	1d93f11675	[Attention][CUDAGraph] Remove CG padding from attention backends (#29352 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-02 13:48:08 -05:00
Benjamin Bartels	2d613de9ae	[CI/Build] Fixes missing runtime dependencies (#29822 ) Signed-off-by: bbartels <benjamin@bartels.dev>	2025-12-02 10:21:49 -08:00
Alexei-V-Ivanov-AMD	c77b9929a0	Update AMD-CI testing mirror (as of 2025-12-02) (#29898 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-12-02 08:52:54 -09:00
Isotr0py	63b1da76ba	[Chore]: Reorganize gguf utils functions under `transformers_utils` (#29891 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-02 17:33:23 +00:00
Andrew Xia	52cb349fc0	[responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-02 11:24:45 -05:00
Isotr0py	0ec8422171	[Bugfix] Fix incorrect channel order for idefics3 in edge case (#29881 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-02 16:03:52 +00:00
wang.yuqi	2eb4fe9129	[examples] Resettle pooling examples. (#29365 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 15:54:28 +00:00
Matthew Bonanni	51c57b51dd	[Bugfix] Fix DeepSeek R1 MTP weight loading (#29545 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-12-02 15:52:18 +00:00
ImaGoodFella	60c3d413af	[Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values (#29621 ) Signed-off-by: Rahul Steiger <rasteiger@ethz.ch> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-02 21:49:02 +08:00
Cyrus Leung	68ffbca7e4	[Chore] Use `tokenizer.encode` and `tokenizer.decode` directly (#29851 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 12:30:40 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Julien Denize	d8c6210eea	Add Mistral Large 3 and Ministral 3 (#29757 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Mickael Seznec <mickael@mistral.ai>	2025-12-02 10:29:00 +00:00
Louie Tsai	8bbcf8b6e7	[vLLM Benchmark Suite] Add default parameters section and update CPU benchmark cases (#29381 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-12-02 09:00:23 +00:00
Boyuan Feng	70fb77b4dc	[BugFix] add max-num-batched-token to scheduler hash (#29829 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-02 08:55:02 +00:00
杰兮	48d15a32aa	[CI] Fix Bad_words test for tokenizer encode/decode asymmetry (#28193 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-12-02 00:02:12 -08:00
Boyuan Feng	3b221cb661	[BugFix] respect VLLM_LOGGING_LEVEL in logger (#29761 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-02 07:49:16 +00:00
Wushi Dong	0037b5746a	[Core] Eliminate redundant is_encoder_decoder lookups (20-40us/step) (#29800 ) Signed-off-by: Wushi Dong <dongws@meta.com>	2025-12-02 07:08:07 +00:00
Harry Mellor	f5b0846ba0	Fix some Transformers nightly tests (#29802 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 07:05:27 +00:00
Zhang Xiangze	13ea39bc09	[CPU]Parallelize over tokens in int4 moe (#29600 ) Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>	2025-12-02 06:21:39 +00:00
Shengqi Chen	4b612664fd	[CI] Renovation of nightly wheel build & generation (take 2) (#29838 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 22:17:10 -08:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
Divakar Verma	e2fbfc955e	[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 05:27:46 +00:00
Divakar Verma	a690fb5bd6	[CI][ROCm] Fix test_correctness_sliding_window (#29243 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-02 04:53:27 +00:00
usberkeley	81fe3f82af	[BugFix] Fix index error in ngram_proposer (#29779 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-12-02 04:48:11 +00:00
Zuyi Zhao	53bf71b0f0	[Misc] Update conftest for entrypoints/sagemaker test folder (#29799 ) Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>	2025-12-01 18:56:39 -09:00
Johnny Yang	f441d36cee	Add missing return in _check_vllm_model_embed_input_ids (#29834 ) Signed-off-by: Johnny Yang <johnnyyang@google.com>	2025-12-01 19:22:50 -08:00
Seiji Eicher	22274b2184	[Misc] Add ReplicaId to Ray metrics (#24267 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Co-authored-by: rongfu.leng <1275177125@qq.com>	2025-12-02 03:21:44 +00:00
Wei Wei	fc95521ba5	[Misc] Throw error on unintended access to scheduler_config.max_model_len (#29771 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-12-02 10:58:44 +08:00
Zhuohan Li	d0cd728907	[Core] Support resetting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
Andrew Xia	fa8804ad9c	[responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug (#29555 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-02 02:11:35 +00:00
Divakar Verma	4b40924998	[ROCm] Fallback pytorch GELU with tanh approximation to GELU() (#29244 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Signed-off-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-02 02:02:22 +00:00
Hendrik Holtmann	c0dfc89485	SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm (#29711 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-12-01 17:24:18 -08:00
Nick Hill	44822d7ff2	[BugFix] Preserve spec decoding uniform decode when scheduling (#29759 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-01 17:15:52 -08:00
Alexei-V-Ivanov-AMD	342c4f1472	Updated CI mirror 2025-11-25 (#29434 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com> Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com>	2025-12-01 23:44:33 +00:00
Kevin H. Luu	1336a1ea24	Revert #29787 and #29690 (#29815 )	2025-12-01 13:42:03 -08:00
Nengjun Ma	eaf81485ed	[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode (#28935 ) Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-12-01 15:02:18 -05:00
Finbarr Timbers	38caf7fa1a	Update FAQ on interleaving sliding windows support (#29796 ) Signed-off-by: Finbarr Timbers <finbarrtimbers@gmail.com>	2025-12-01 19:15:19 +00:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds — total lifetime from allocation to free vllm:kv_block_idle_before_evict_seconds — idle duration before eviction vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics – enable KV cache residency metrics --kv-cache-metrics-sample – control sampling ratio (default: 0.01) Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
Kevin H. Luu	ec7035c9d4	[ci] Make distributed 8 gpus test optional (#29801 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-12-01 10:22:05 -08:00
knlnguyen1802	fc6acc88ca	[Bugfix] Missing cached item in the MultiModalReceiverCache (#28525 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Chenguang Zheng <645327136@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-01 10:18:07 -08:00

12170 Commits