xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-30 12:37:08 +08:00

Author	SHA1	Message	Date
Michael Goin	93d0652433	[CI] Increase timeout for test_completion_with_image_embeds (#22670 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-11 20:31:36 -07:00
Michael Goin	ea1292ad3e	[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py (#22686 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-11 20:20:42 -07:00
Harry Mellor	839ab00349	Re-enable Xet on TPU tests now that `hf_xet` has been updated (#22666 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 19:54:40 -07:00
Chen Zhang	1891a265d3	[gpt-oss] Add test for response API + harmony (but skipped) (#22554 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-11 17:47:24 -07:00
TJian	65abe111a3	[CI] Skip Tree Attn Test in `test_max_len.py` to unblock CI (#22664 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-11 10:36:05 -07:00
22quinn	807d21b80d	[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-11 10:31:36 -07:00
Isotr0py	c90fb03df5	[CI/Build] Skip Mllama HF runner tests with Transformers v4.55.0 (#22659 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-11 10:00:58 -07:00
wang.yuqi	84cf78acee	[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-11 09:41:37 -07:00
GuanLuo	16fb668b61	fix: NIXL connector transfers partial block to pass full multi-modal context (#21074 ) Signed-off-by: GuanLuo <gluo@nvidia.com>	2025-08-11 09:40:55 -07:00
Wentao Ye	f7dcce7a4a	[Feature] Add `VLLM_USE_DEEP_GEMM_E8M0` Env to Control E8M0 Scale (#21968 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 09:39:08 -07:00
Cyrus Leung	ebf7605b0d	[Misc] Move tensor schema tests (#22612 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-11 00:15:27 -07:00
Maximilien de Bayser	39052dbca8	Support token_type_ids in V1 with less code changes (#21985 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-10 22:54:59 -07:00
Nick Hill	5898b135ab	[BugFix] Fix KVConnectorOutput TPU breakage (#22598 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-10 19:33:48 -07:00
22quinn	b799f4b9ea	[CI/Build] Fix tensorizer test for load_format change (#22583 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-10 19:30:00 -07:00
Benji Beck	68b254d673	Fix TensorSchema validation test for symbolic dims (#22366 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-08-10 17:16:44 +00:00
Isotr0py	b76753f0b5	[Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel (#22593 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-10 09:00:36 -07:00
Isotr0py	049c245143	[Misc] Replace flaky image urls in pixtral test (#22574 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-10 06:18:21 -07:00
Ning Xie	326976291b	[Misc] code clean duplicate set_current_vllm_config in _set_vllm_config (#22566 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-10 00:08:48 -07:00
Harry Mellor	c49848396d	Refactor sliding window configuration to Transformers best practice (#21927 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-09 20:50:48 -07:00
Chengji Yao	2a84fb422f	[TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block (#22394 ) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com>	2025-08-09 20:49:04 -07:00
Le Chen	3d7363e61c	[Config] add "qwen" as a native eagle3 target supported model (#22333 ) Signed-off-by: lechen <lecself@163.com> Signed-off-by: LeChen <lecself@163.com>	2025-08-09 20:21:05 -07:00
Thomas Parnell	61f67d8acd	[V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers (#21401 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-09 20:16:11 -07:00
TJian	42172ad18f	[FEAT] [Performance] Add triton mrope to replace the torch code path (#22375 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-09 11:50:03 -07:00
Nicolò Lucchesi	5a16fa614c	[Model] Gemma3n MM (#20495 ) Signed-off-by: ShriKode <shrikode@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: ShriKode <shrikode@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-08-09 09:56:25 -07:00
Thomas Parnell	1bf5e1f25b	[CI] [Hybrid] Speed up hybrid models test by removing large models (#22563 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-09 02:04:42 -07:00
Yuxuan Zhang	a6022e6fbc	GLM-4.5V with new class name at transformers (#22520 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-09 00:50:21 -07:00
Jee Jee Li	0edc0cd52b	[Bugfix] Fix CI moe kernel failure (#22556 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-09 00:03:29 -07:00
Isotr0py	7920e9b1c5	[Bugfix] Fix failing GPT-OSS initialization test (#22557 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-09 00:03:26 -07:00
Kyuyeun Kim	9a0c5ded5a	[TPU] Add support for online w8a8 quantization (#22425 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-08-08 23:12:54 -07:00
Thomas Parnell	8a0ffd6285	Remove mamba_ssm from vLLM requirements; install inside test container using `--no-build-isolation` (#22541 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-08 23:05:32 -07:00
Roger Wang	08b751ba74	Implicit language-model-only mode via limit-mm-per-prompt (#22299 ) Signed-off-by: Roger Wang <hey@rogerw.me> Signed-off-by: Andy Xie <andy.xning@gmail.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com> Signed-off-by: Shu Wang <shuw@nvidia.com> Signed-off-by: Po-Han Huang <pohanh@nvidia.com> Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: XIn Li <xinli@nvidia.com> Signed-off-by: Junhao Li <junhao@ubicloud.com> Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com> Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Signed-off-by: Linkun <github@lkchen.net> Co-authored-by: Ning Xie <andy.xning@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Andrew Sansom <andrew@protopia.ai> Co-authored-by: Zhiyu <zhiyuc@nvidia.com> Co-authored-by: Shu Wang <shuw@nvidia.com> Co-authored-by: XIn Li <xinli@nvidia.com> Co-authored-by: Junhao Li <streaver91@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Yuxuan Zhang <2448370773@qq.com> Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Po-Han Huang (NVIDIA) <53919306+nvpohanh@users.noreply.github.com> Co-authored-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Hong Hanh <hanh.usth@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: lkchen <github@lkchen.net>	2025-08-08 22:21:40 -07:00
Isotr0py	429e4e2d42	[Bugfix] Fix ModernBert cuda graph capturing in v1 (#21901 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-08 22:17:22 -07:00
Russell Bryant	311d875614	Drop flaky test_healthcheck_response_time (#22539 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-08-08 16:56:47 -07:00
Harry Mellor	e3edc0a7a8	Extract `CompilationConfig` from `config.py` (#22524 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-08 16:34:25 -07:00
yyweiss	baece8c3d2	[Frontend] Add unix domain socket support (#18097 ) Signed-off-by: <yyweiss@gmail.com> Signed-off-by: yyw <yyweiss@gmail.com>	2025-08-08 16:23:44 -07:00
Harry Mellor	41b9655751	Skip Qwen 1 in CI because remote code is no longer compatible with Transformers (#22536 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-08 16:20:58 -07:00
Yongye Zhu	e789cad6b8	[gpt-oss] triton kernel mxfp4 (#22421 ) Signed-off-by: <zyy1102000@gmail.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-08 08:24:07 -07:00
Cyrus Leung	43c4f3d77c	[Misc] Begin deprecation of `get_tensor_model_*_group` (#22494 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-08 01:11:54 -07:00
Chauncey	17eaaef595	[Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match (#22065 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-08-07 19:20:21 -07:00
TJian	1ee5ead5f8	[ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine (#21496 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-07 19:13:17 -07:00
Ning Xie	acf8aeb79e	[Misc] normalize multiprocessing Queue usage (#22371 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-08 01:57:27 +00:00
Harry Mellor	7e3a8dc906	Remove `from_dict` from `SpeculativeConfig` (#22451 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-07 10:13:04 -07:00
Cyrus Leung	139d155781	[Frontend] Use engine argument to control MM cache size (#22441 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 09:47:10 -07:00
Chen Zhang	4815b00f54	[gpt-oss] Generate ResponseOutputItem from Harmony Message (#22410 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-07 08:33:25 -07:00
Cyrus Leung	766bc8162c	[Core] Store only the keys for multi-modal data in P0 (#22198 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 01:45:04 -07:00
Adrián García García	8e8e0b6af1	feat: Add --enable-log-outputs flag for logging model generations (#20707 ) Signed-off-by: Adrian Garcia <adrian.garcia@inceptionai.ai>	2025-08-06 23:10:13 -07:00
Ming Yang	82216dc21f	[Misc] Support routing logic simulation (#21990 ) Signed-off-by: Ming Yang <minos.future@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-06 23:06:20 -07:00
Moritz Sanft	370661856b	[Frontend] Update OpenAI error response to upstream format (#22099 ) Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>	2025-08-06 23:06:00 -07:00
wang.yuqi	2a4c825523	[CI] Skip the pooling models that do not support transformers v4.55 (#22411 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-06 23:05:03 -07:00
qscqesze	5e9455ae8f	[Bugfix]: Fix the streaming output for function calls in the minimax (#22015 ) Signed-off-by: QscQ <qscqesze@gmail.com> Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-08-06 20:30:27 -07:00

2569 Commits