xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-28 07:07:12 +08:00

Author	SHA1	Message	Date
Li, Jiang	20852c8f4c	[CPU] Refactor CPU WNA16 (#28826) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-19 10:32:00 +08:00
Jialin Ouyang	40b6b38f2c	[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-19 02:10:02 +00:00
Kunshang Ji	2a2d5d2780	Replace `torch.cuda.Event` with `torch.Event` for better hardware compatibility (#26985) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-18 11:34:36 -08:00
Chendi.Xue	c3e2978620	[NIXL] fix cpu PD after physical <> logical block_size PR (#28904) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-11-18 14:03:23 -05:00
Kevin H. Luu	c64c0b78de	[chore] Move the rest of wikimedia url to S3 (#28921) Signed-off-by: Kevin H. Luu <khluu000@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 09:44:18 -08:00
Nicolò Lucchesi	f226a3f0c1	[CI][NIXL] Change default `block_size` for tests (#28927) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-18 09:22:30 -08:00
Luciano Martins	c2612371ad	[Model] Add Gemma3 GGUF multimodal support (#27772) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 08:56:29 -08:00
Alex	f6aa122698	[CI Sprint] Quantization CI Cleanup (#24130) Signed-off-by: Alex Yun <alexyun04@gmail.com>	2025-11-18 09:21:48 -05:00
Isotr0py	896e41ae04	[CI/Build] Replace wikipedia url with local server ones (#28908) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 08:10:55 +00:00
Nick Hill	5bdd155277	[CI] Fix async scheduling + spec decoding test flake (#28902) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-18 05:26:32 +00:00
Cyrus Leung	bf9e1e8767	[Bugfix] Fix wrong CLI defaults for dynamic `SchedulerConfig` fields (#28872) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-17 20:30:29 -08:00
Benjamin Bartels	b6e04390d3	[Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing (#28831) Signed-off-by: Thomas Mao <yiyeguhu@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev> Co-authored-by: Thomas Mao <yiyeguhu@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-17 19:13:25 -08:00
Pranav	f77bce001a	[Model] Add Afmoe architecture implementation (#28332) Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr> Signed-off-by: Pranav <veldurthipranav@gmail.com> Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>	2025-11-17 15:11:20 -08:00
Wentao Ye	a289cc1dde	[Test] Batch Invariant: Rename and organize tests (#27421) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-17 18:09:47 -05:00
Shreyas Kulkarni	95ae50b7d1	[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435) Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>	2025-11-17 15:01:34 -08:00
Ronald	d8874c61a5	[Core] Async Scheduling X Spec Decoding Compatibility (#24799) Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-17 12:16:20 -08:00
Roger Wang	7f064491f8	[Bugfix][Perf] Revert applying HF processor on text-only inputs for multimodal models (#28858) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-17 14:49:25 +00:00
Jay Caldwell	6f37419244	[Bugfix][Model] Prevent special token leakage in KimiK2ToolParser streaming mode (#28543) Signed-off-by: Jscaldwell55 <jay.s.caldwell@gmail.com>	2025-11-17 13:54:46 +08:00
Nick Hill	80b6080ddc	[BugFix] Fix async scheduling + chunked prefill + preemption (#28787) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 06:46:46 +08:00
amirkl94	03ee48111d	Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)	2025-11-16 13:39:44 -05:00
Lucia Fang	b316ac6589	[V1] Support MP Executor for multi node distributed inference (#23691) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-16 09:01:21 +00:00
wang.yuqi	a55b64635c	[Model] Allow users to control skip reading cache per request. (#28194) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2025-11-16 00:04:50 -08:00
Eldar Kurtić	e439c784fa	Add support for Eagle with separate lm-head and embed_tokens layers (#28549) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-15 06:12:02 -08:00
Angela Yi	f36292dbee	[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-11-15 11:46:12 +00:00
Cyrus Leung	638e4196d1	[Misc] Make `SchedulerConfig.max_model_len` init-only (#28733) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-15 01:59:31 -08:00
Cyrus Leung	98b4d389ed	[Redo] #26368 (#28771) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 22:47:41 -08:00
Varun Sundar Rabindranath	6965ef436f	[Performance][DeepGEMM] Estimate expected_m (#28694) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-15 13:52:14 +08:00
Chendi.Xue	c9e665852a	[NIXL] heterogeneous block_size support (#26759) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-11-14 21:51:32 -08:00
Nick Hill	ac86bff8cb	Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)	2025-11-14 20:24:00 -08:00
Jialin Ouyang	186352b270	[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 16:04:04 -08:00
Nick Hill	58e61e56b7	[Test] Rework e2e async scheduling tests (#28744) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-14 16:01:09 -08:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Marcin Ostrowski	0de4f217ab	[Bugfix] TypeError: 'NoneType' object is not callable (#27410) Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>	2025-11-14 21:13:53 +00:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
TJian	a425dc256e	[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-14 10:30:50 -08:00
Nicolò Lucchesi	6f1e7f7226	[DisaggEverything] Tokens in<>out `/generate` endpoint (#24261) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 09:58:01 -07:00
Harry Mellor	5f3cd7f7f2	[Docs] Update the name of `Transformers backend` -> `Transformers modeling backend` (#28725) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 16:34:14 +00:00
dongbo910220	c934caee88	[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL (#28711) Signed-off-by: dongbo910220 <1275604947@qq.com>	2025-11-14 16:07:20 +00:00
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
Nicolò Lucchesi	96b23b8e3b	[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-14 22:40:05 +08:00
Lucas Wilkinson	db56a59970	[BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) (#28702)	2025-11-14 12:19:22 +00:00
Yong Hoon Shin	9324e10275	Fix KV sharing fast prefill with cudagraph enabled (#28537) Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-14 11:53:42 +00:00
Jingchun Gao	4516d44b7f	[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (#25438) Signed-off-by: gaojc <1055866782@qq.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-14 11:24:10 +00:00
Srreyansh Sethi	360bd8762f	[Frontend] Added chat-style multimodal support to /classify. (#27516) Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com> Signed-off-by: vnadathur <glvikramn@gmail.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: vnadathur <glvikramn@gmail.com> Co-authored-by: wang.yuqi <noooop@126.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-11-14 11:03:55 +00:00
Xing Liu	8cfbe89b93	[Misc] fix comment in test_envs (#28529) Signed-off-by: Xing Liu <xingliu14@gmail.com>	2025-11-14 09:32:46 +00:00
Boyuan Feng	fd75d3e8c0	[Minor] avoid register new custom and just import silly_attn (#28578) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-14 09:32:31 +00:00
Michael Goin	c9a3a02149	Add output token counting to gsm8k eval (#28594) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-14 09:32:03 +00:00
rasmith	93103575ce	[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate (#28311) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-13 22:41:29 -08:00
Hank_	4d5943bda6	[quantization][config] enable override existing quant_config (#28510) Signed-off-by: Hank <hcc.mayday@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-14 01:24:10 +00:00
Mark McLoughlin	6e25b1cddf	[KV Connector] Test async mode in scheduler tests (#28550) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-13 18:30:59 -05:00

3760 Commits