xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-29 08:07:51 +08:00

Author	SHA1	Message	Date
Nicolò Lucchesi	26a465584a	[NIXL] Use config to enable telemetry + NIXL version bump (#29305 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-24 17:18:04 +00:00
Varun Sundar Rabindranath	e924bbb4f4	[Build/CI][DP/EP] Add QWen/Qwen3-30B-A3B-FP8 + EPLB tests to Nightly H100 and B200 (#29195 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-24 16:06:17 +00:00
Aydin Abiar	656516c315	[Bugfix] properly handle nested json with llama3 tool parser (#27701 ) Signed-off-by: Aydin Abiar <aydin@anyscale.com> Signed-off-by: Aydin Abiar <62435714+Aydin-ab@users.noreply.github.com> Co-authored-by: Aydin Abiar <aydin@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-24 15:28:51 +00:00
vllmellm	e48b2e6848	[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic (#26980 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-24 15:24:49 +00:00
Laith Sakka	7a228b5305	Add option to use unbacked, and backed size obl dynamic shapes for more sounds compilation. (#26199 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-24 10:12:41 -05:00
Yuan Tang	f716a15372	Update KServe guide link in documentation (#29258 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-11-24 14:40:05 +00:00
WeiQing Chen	2601f18a82	[EPLB] Optimize EPLB for Async Rearrange Experts (#22179 ) Signed-off-by: David Chen <530634352@qq.com> Co-authored-by: SunChenxiang123 <1291824390@qq.com>	2025-11-24 09:08:29 -05:00
R3hankhan	4de87866a8	[CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x (#28926 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2025-11-24 12:08:09 +00:00
Didier Durand	eca7a8fb59	[Doc]: fix typos in various files (#29230 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-24 11:10:48 +00:00
杰兮	8005e606bf	[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP (#27563 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-11-24 10:16:52 +00:00
rongfu.leng	68dfe28eae	[Feature][Benchmark] add --link-vars can filter when serve_param equal bench_param (#28909 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-11-24 02:02:28 -08:00
Fanli Lin	ed40d85929	[BugFix] Fix R-VL model loading error (#29299 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-23 22:48:45 -08:00
Roger Wang	0ff70821c9	[Core] Deprecate `xformers` (#29262 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-24 04:18:55 +00:00
tongqiu	5253f4276f	[ROCm] Support for Whisper v1 with Aiter Unified Attention and Aiter Flash Attention (#28376 ) Signed-off-by: apinge <Tong.Qiu2@amd.com>	2025-11-24 03:26:00 +00:00
Zero	30854783ad	[Model] Add OpenCUA-7B support (#29068 ) Signed-off-by: lim4349 <rockmanzero@naver.com> Signed-off-by: Zero <rockmanzero@naver.com> Co-authored-by: Cloud User <ubuntu@a100-80g-4.novalocal> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-24 10:27:55 +08:00
Jee Jee Li	1073ba68b0	[LoRA] Optimize 3D MoE logic (#29222 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-24 10:27:23 +08:00
Josh Moore	c309bb5245	[Bugfix] Update Gradio OpenAI Chatbot Webserver example to new Gradio message history format (#29249 ) Signed-off-by: joshiemoore <joshiemoore98@gmail.com>	2025-11-24 00:47:54 +00:00
Woosuk Kwon	3e1ad40655	[Model Runner V2] Add apply_temperature option to gumbel_sample (#29276 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-23 14:13:00 -08:00
Woosuk Kwon	62d54ba46d	[Model Runner V2] Optimize CUDA graph capture time (#29275 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-23 11:15:32 -08:00
Woosuk Kwon	b004c00418	[Model Runner V2] Support spec decoding [1/N] (#29274 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-23 10:09:06 -08:00
Woosuk Kwon	7f12c82fa6	[Model Runner V2] Change bookkeeping logic in preparation for spec decoding (#29194 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-23 09:42:52 -08:00
Luke	6fb0215eee	[Bugfix] Use lazy string reference for DeepseekV3Config in config registry (#28958 ) Signed-off-by: Luke <yq0536@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-23 11:43:21 +00:00
Micah Williamson	55c21c8836	[ROCm][CI] Fix "Cannot re-initialize CUDA in forked subprocess" in test_pynccl.py (#29119 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-11-23 13:05:00 +08:00
rasmith	3999442f1c	[CI/Build][AMD] Add check for flash_att_varlen_func to test_tree_attention.py (#29252 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-23 04:45:08 +00:00
rasmith	71362ffab4	[CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29253 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-23 04:42:49 +00:00
Woosuk Kwon	20ee418adc	[Model Runner V2] Minor fix for cudagraph_utils (#29256 )	2025-11-22 20:12:50 -08:00
Cyrus Leung	389aa1b2eb	[Doc] Update more docs with respect to V1 (#29188 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-23 10:58:48 +08:00
Michael Act	3ed767ec06	docs: fixes distributed executor backend config for multi-node vllm (#29173 ) Signed-off-by: Michael Act <michael.a.c.tulenan@gdplabs.id> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-23 10:58:28 +08:00
jiahanc	5f96c00c55	[Fix] Add SM check to flashinfer MOE backend (#29144 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-23 00:39:30 +00:00
Qidong Su	4587063267	Patch DeepEP when building docker image with CUDA 13 (#29154 ) Signed-off-by: Qidong Su <soodoshll@gmail.com>	2025-11-22 23:25:13 +00:00
Wentao Ye	472fdee974	[Chore] Update batch invariant code owner (#29246 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-22 13:50:02 -08:00
Yizhou	df78aeef08	Refactor: Move CUDA graph dispatch logic earlier (#27382 ) Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-11-22 16:10:31 -05:00
Nick Hill	7df331c66b	[BugFix] Fix chunked prompt logprobs + preemption (#29071 )	2025-11-22 16:07:18 -05:00
Benjamin Bartels	eb5352a770	[CI/build] Removes source compilation from runtime image (#26966 ) Signed-off-by: bbartels <benjamin@bartels.dev>	2025-11-22 10:23:09 -08:00
Cyrus Leung	d1cf8214e5	[Bugfix] Use HF config fields as fallback when loading Mistral config (#29239 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 11:22:48 -07:00
Fadi Arafeh	730bd35378	[perf][cpu] Accelerate paged attention GEMMs (QK, PV) on Arm CPUs with NEON (#29193 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-11-22 09:04:36 -08:00
Federico	f55c76c2b3	chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning (#29240 )	2025-11-22 08:42:48 -08:00
ZiTian Zhao	d84d8f4429	Fix EVS crash when using `video_embeds` inputs in Qwen2.5-VL (#29232 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-22 06:48:59 -08:00
Cyrus Leung	ae66818379	[Misc] Fix pre-commit (#29238 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 06:48:01 -08:00
Nick Hill	d44a63c6d6	[BugFix] Fix returned logprobs with spec decode + prefill chunking (#29216 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-22 22:41:25 +08:00
Nicolò Lucchesi	066209a045	[Attention] Refactor FA `block_size` limitations to hybrid models only (#29084 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-22 06:38:44 -08:00
Bram Wasti	5f7209a793	[tiny] Remove unsupported TRITON_MLA backend from batch invariance (#28832 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-22 21:00:50 +08:00
yihong	2d4978a57e	fix: clean up function never use in setup.py (#29061 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-11-22 05:00:04 -08:00
Nandan Vallamdasu	6965a392a4	Fix: Resolve circular import in model_loader/utils.py (#29189 ) Signed-off-by: nandan2003 <nandan.vallamdasu@outlook.com> Signed-off-by: Nandan Vallamdasu <nandan.vallamdasu@outlook.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-22 04:58:22 -08:00
Cyrus Leung	5a4802588e	[Misc] Further clean up chunked prefill and prefix caching init (#29186 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 19:34:15 +08:00
rasmith	8e22da1d7f	[CI/Build Don't add FLASHINFER backend in test_cpu_offloading.py (#29229 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-22 11:00:54 +00:00
rasmith	a4fdf2405c	[CI/Build] Skip tests that require libcudart in test_lmcache_integration.py (#29228 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-22 10:59:39 +00:00
Jane (Yuan) Xu	e6309acdba	Simplify `from_blob` usage in `get_cuda_view_from_cpu_tensor` (#29027 ) Signed-off-by: Jane Xu <janeyx@meta.com>	2025-11-22 10:35:32 +00:00
jinghanhu	988ee66b0d	Handle triton kernel import exception (#29062 )	2025-11-22 10:07:50 +00:00
Mads Kildegård	ea38474ac5	[Frontend][Responses API] Multi-turn (with type: "output_text") support for non-harmony requests (#29175 ) Signed-off-by: Mads Kildegård <mkildegaard99@gmail.com>	2025-11-22 09:58:22 +00:00

11610 Commits