xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-23 05:27:14 +08:00

Author	SHA1	Message	Date
Russell Bryant	ebab1ac37c	[CI] Make JSON output tests less likely to fail (#17859) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 22:31:54 +00:00
Yang Wang	2b0db9b0e2	Enable standard language model for torch nightly (#18004) Signed-off-by: Yang Wang <elainewy@meta.com>	2025-05-12 14:00:04 -07:00
Robert Shaw	195adb47c0	[Chore] Remove unused method (#18024) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-12 13:59:47 -07:00
Chen Zhang	302f3aca7e	[v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens (#18003) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-12 13:46:12 -07:00
Alexei-V-Ivanov-AMD	e9c730c9bd	Enabling "Weight Loading Multiple GPU Test - Large Models" (#18020)	2025-05-12 13:05:33 -07:00
Jade Zheng	289199feb6	[Core] Use platform-agnostic device control for DP engine core (#17245) Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>	2025-05-12 12:09:16 -07:00
Carol Zheng	b9fd0d7a69	[CI/Build] Fix TPU V1 Test mixed use of & and && across tests (#17968)	2025-05-12 12:06:59 -07:00
Harry Mellor	72a3f6b898	Construct `KVTransferConfig` properly from Python instead of using JSON blobs without CLI (#17994) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-12 11:25:33 -07:00
Jonathan Berkhahn	98ea35601c	[Lora][Frontend]Add default local directory LoRA resolver plugin. (#16855) Signed-off-by: jberkhahn <jaberkha@us.ibm.com>	2025-05-12 10:39:10 -07:00
Robert Shaw	d19110204c	[P/D] NIXL Integration (#17751) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-12 09:46:16 -07:00
Maximilien de Bayser	05a4324f8e	Initialize the delta tool call fields explicitly (#17340) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: igmainc <igmainc@icloud.com>	2025-05-12 13:28:58 +00:00
Jee Jee Li	7ea6cb28b2	[Misc] Improve modelscope import error (#17983) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-12 10:46:45 +00:00
Aaruni Aggarwal	9fbf2bfbd5	Correcting testcases in Buildkite job for IBM Power (#17675) Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>	2025-05-12 08:11:55 +00:00
Xu Wenqing	3a5ea75129	[Feature] Support DeepSeekV3 Function Call (#17784) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com> Signed-off-by: Xu Wenqing <xuwq1993@qq.com>	2025-05-12 00:45:21 -07:00
Brayden Zhong	891b9d33de	[Fix] Benchmark `"EngineClient" has no attribute "model_config"` (#17976) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 22:55:53 -07:00
Siyuan Liu	430783018c	[Bugfix][TPU] Use np array when updating cache slot_mapping (#17971) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-05-12 12:58:33 +08:00
Li Wang	19a3c78d1f	[Bugfix] Fix pydantic.errors.PydanticUserError (#17962) Signed-off-by: wangli <wangli858794774@gmail.com>	2025-05-12 12:58:23 +08:00
Reid	ada50aa295	[bugfix] fix the wrong parser (#17958) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-12 04:58:02 +00:00
Cheng Kuan Yong Jason	08bf784078	[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623) Signed-off-by: Jason Cheng <jasoncky96@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 09:06:10 +08:00
youkaichao	d45fe333fb	[misc] add instructions on how to install nvshmem/pplx/deepep (#17964) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-05-11 18:02:39 -07:00
Isotr0py	021c16c7ca	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-11 17:56:30 -07:00
TJian	7de18d541b	[BUG] [ROCm] [MLA] Fix variable name bug due to change in variable name in PR #17483 (#17961) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-11 09:14:30 -07:00
TJian	a810b5b088	[BugFix] [ROCm]: Bugfix and handle addition case of input for `rocm_aiter_rms_norm` (#17857) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-11 04:17:11 -07:00
Reid	009b3d5382	[Misc] not show --model in vllm serve --help (#16691) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-11 08:47:58 +00:00
wang.yuqi	e4b8713380	[New Model]: nomic-embed-text-v2-moe (#17785)	2025-05-11 00:59:43 -07:00
Gregory Shtrasberg	06c0922a69	[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-11 15:58:45 +08:00
Dipika Sikka	cd3edfc908	[Misc] Add compressed-tensors NVFP4A16 emulation support (#17914) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-05-11 15:58:38 +08:00
Frieda Huang	9cea90eab4	[Frontend] Add /classify endpoint (#17032) Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>	2025-05-11 07:57:07 +00:00
Reid	d1110f5b5a	[doc] update lora doc (#17936) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-11 15:56:21 +08:00
Ben Browning	8132365b74	[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-11 00:53:58 -07:00
Shiyan Deng	eea22a56ab	fix amd triton mla path (#17871)	2025-05-11 07:53:31 +00:00
Kuntai Du	9112155283	[Perf] Use small max_num_batched_tokens for A100 (#17885) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-05-11 07:53:23 +00:00
xinli-centml	90d0a74b60	[Bugfix] Add revision to `transformers.Auto*.from_pretrained` processors (#17948) Signed-off-by: Xin Li <xin@centml.ai>	2025-05-11 07:52:44 +00:00
Jinzhen Lin	d74e5f37bc	[Kernel] fp4 marlin kernel (#17687) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-10 19:58:49 -07:00
Chen Zhang	ca66a1674c	[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:14:12 -07:00
Chen Zhang	950751a987	[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:12:04 -07:00
Reid	4c31218f80	[Misc] remove --model from vllm serve usage (#17944) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-10 13:23:31 +00:00
Harry Mellor	68311891f5	Don't default construct `ModelConfig` when default constructing `VllmConfig` (#17943) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-10 13:23:00 +00:00
Ximo Guanter	fc4441a4ee	Add missing content type headers to /ping and /health (#17036) (#17786) Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-10 07:13:32 +01:00
tracelogfb	246e3e0a36	fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn (#17873) Co-authored-by: Stephen Chen <tracelog@meta.com>	2025-05-10 10:46:54 +08:00
Mark McLoughlin	7042cc96b0	[V1][Spec Decoding] Log accumulated metrics after system goes idle (#17913) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-09 18:23:07 -07:00
Pavani Majety	0c0fdae84f	[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)	2025-05-09 16:24:41 -07:00
Alexei-V-Ivanov-AMD	3b602cdea7	AMD conditional all test execution // new test groups (#17556) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com> Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>	2025-05-09 15:35:58 -07:00
Harry Mellor	4b2ed7926a	Improve configs - the rest! (#17562) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 15:18:44 -07:00
Mark McLoughlin	7e3571134f	[V1][Spec Decoding] Include bonus tokens in mean acceptance length (#17908) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-09 13:32:36 -07:00
Richard Zou	ea2236bf95	Add option to use torch._inductor.standalone_compile (#17057) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-09 12:59:04 -07:00
Harry Mellor	7d4aedae7c	Handle error when `str` passed to `/v1/audio/transcriptions` (#17909) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 19:23:59 +00:00
Michael Goin	22481fbfa3	Update CT WNA16MarlinMoE integration (#16666) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-09 13:19:45 -04:00
Isotr0py	5c4c08f6f1	[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config (#17265) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-09 17:16:12 +00:00
Rui Qiao	c44c384b1c	[Misc] Add references in ray_serve_deepseek example (#17907) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-05-09 16:59:36 +00:00

6428 Commits