xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-07-01 11:07:12 +08:00

Author	SHA1	Message	Date
Chen Zhang	f2ae883b67	[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-13 19:09:39 -07:00
vllmellm	40de1ef455	[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 19:08:20 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Harry Mellor	0b217da646	Update deprecated type hinting in `vllm/adapter_commons` (#18073) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 08:32:51 -07:00
Harry Mellor	19324d660c	Update deprecated type hinting in `vllm/compilation` (#18072) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 08:32:48 -07:00
Cyrus Leung	b922c2ebd2	[Bugfix] Fix entrypoints metrics tests (#18063) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-13 06:42:43 -07:00
Harry Mellor	8c946cecca	Update deprecated type hinting in `vllm/transformers_utils` (#18058) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:34:37 -07:00
Harry Mellor	ff334ca1cd	Update deprecated type hinting in `vllm/profiler` (#18057) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:34:34 -07:00
Harry Mellor	6223dd8114	Update deprecated type hinting in `model_executor/layers` (#18056) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:17:23 -07:00
Aaron Pham	cb528d0585	[Fix] check to make sure processor has chat templates (#18047) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-13 03:04:10 -07:00
Michael Goin	ea6ae8cb45	[Bugfix] Fix marlin moe fallback logic for llama4 (#18042) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 07:53:28 +00:00
Woosuk Kwon	2ff297dce9	[BugFix] Set default random seed to 0 for V1 (#17929) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 07:52:19 +00:00
Jin Huang	8dd0671bac	[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP (#17916) Signed-off-by: Jin Huang <jinhun@amazon.com> Co-authored-by: Jin Huang <jinhun@amazon.com>	2025-05-13 15:10:07 +08:00
Chen Zhang	f0d610a8ae	[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 06:50:38 +00:00
Calvin Chen	48545728d8	cleanup invalid prints (#18050) Signed-off-by: calvin chen <120380290@qq.com>	2025-05-12 23:01:57 -07:00
Chauncey	dc1a821768	[Feature][V1] Support `tool_choice: required` when using Xgrammar as the `StructuredOutputBackend`. (#17845) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-12 23:01:31 -07:00
Cyrus Leung	61e0a506a3	[Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-12 22:40:19 -07:00
Michael Goin	1df491c522	[Bugfix] Fixes for new marlin moe usage (#18017) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 03:50:04 +00:00
Jee Jee Li	c06af9a959	[Misc] Slight spelling modification (#18039) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-12 20:36:27 -07:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)	2025-05-12 19:52:47 -07:00
Harry Mellor	d67085c2c8	Remove noisy warnings from `SchedulerConfig` (#17995) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 00:33:45 +00:00
Michael Goin	307939f299	Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 (#18000) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Dipika <dipikasikka1@gmail.com>	2025-05-12 18:07:34 -06:00
Harry Mellor	9d7ea9dbbf	Update some more deprecated type hinting (#17998) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-12 23:49:33 +00:00
bwshen-mi	acee8f48aa	[Model] Support MiMo-7B inference with MTP (#17433) Signed-off-by: wp-alpha <wangpeng66@xiaomi.com> Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com>	2025-05-12 23:25:33 +00:00
Michael Goin	f065de4e88	Fix FBGEMM integration (#18002) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-12 23:02:07 +00:00
wwl2755	dc9905368d	[V1][Spec Decode] Eagle unit tests (#17350) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-05-12 23:01:17 +00:00
Robert Shaw	195adb47c0	[Chore] Remove unused method (#18024) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-12 13:59:47 -07:00
Chen Zhang	302f3aca7e	[v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens (#18003) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-12 13:46:12 -07:00
Jade Zheng	289199feb6	[Core] Use platform-agnostic device control for DP engine core (#17245) Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>	2025-05-12 12:09:16 -07:00
Jonathan Berkhahn	98ea35601c	[Lora][Frontend]Add default local directory LoRA resolver plugin. (#16855) Signed-off-by: jberkhahn <jaberkha@us.ibm.com>	2025-05-12 10:39:10 -07:00
Robert Shaw	d19110204c	[P/D] NIXL Integration (#17751) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-12 09:46:16 -07:00
Maximilien de Bayser	05a4324f8e	Initialize the delta tool call fields explicitly (#17340) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: igmainc <igmainc@icloud.com>	2025-05-12 13:28:58 +00:00
Jee Jee Li	7ea6cb28b2	[Misc] Improve modelscope import error (#17983) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-12 10:46:45 +00:00
Xu Wenqing	3a5ea75129	[Feature] Support DeepSeekV3 Function Call (#17784) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com> Signed-off-by: Xu Wenqing <xuwq1993@qq.com>	2025-05-12 00:45:21 -07:00
Brayden Zhong	891b9d33de	[Fix] Benchmark `"EngineClient" has no attribute "model_config"` (#17976) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 22:55:53 -07:00
Siyuan Liu	430783018c	[Bugfix][TPU] Use np array when updating cache slot_mapping (#17971) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-05-12 12:58:33 +08:00
Li Wang	19a3c78d1f	[Bugfix] Fix pydantic.errors.PydanticUserError (#17962) Signed-off-by: wangli <wangli858794774@gmail.com>	2025-05-12 12:58:23 +08:00
Reid	ada50aa295	[bugfix] fix the wrong parser (#17958) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-12 04:58:02 +00:00
Cheng Kuan Yong Jason	08bf784078	[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623) Signed-off-by: Jason Cheng <jasoncky96@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 09:06:10 +08:00
Isotr0py	021c16c7ca	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-11 17:56:30 -07:00
TJian	7de18d541b	[BUG] [ROCm] [MLA] Fix variable name bug due to change in variable name in PR #17483 (#17961) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-11 09:14:30 -07:00
TJian	a810b5b088	[BugFix] [ROCm]: Bugfix and handle addition case of input for `rocm_aiter_rms_norm` (#17857) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-11 04:17:11 -07:00
Reid	009b3d5382	[Misc] not show --model in vllm serve --help (#16691) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-11 08:47:58 +00:00
wang.yuqi	e4b8713380	[New Model]: nomic-embed-text-v2-moe (#17785)	2025-05-11 00:59:43 -07:00
Gregory Shtrasberg	06c0922a69	[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-11 15:58:45 +08:00
Dipika Sikka	cd3edfc908	[Misc] Add compressed-tensors NVFP4A16 emulation support (#17914) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-05-11 15:58:38 +08:00
Frieda Huang	9cea90eab4	[Frontend] Add /classify endpoint (#17032) Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>	2025-05-11 07:57:07 +00:00
Ben Browning	8132365b74	[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-11 00:53:58 -07:00
Shiyan Deng	eea22a56ab	fix amd triton mla path (#17871)	2025-05-11 07:53:31 +00:00
Kuntai Du	9112155283	[Perf] Use small max_num_batched_tokens for A100 (#17885) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-05-11 07:53:23 +00:00

4378 Commits