xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-08-03 10:27:03 +08:00

Author	SHA1	Message	Date
Harry Mellor	f36355abfd	Move `LoadConfig` from `config/__init__.py` to `config/load.py` (#24566 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-10 06:14:18 -07:00
Yash Pratap Singh	9e3c3a7df2	[LoRA]: Add LoRA support to Mistral's Voxtral models (#24517 ) Signed-off-by: Yash Pratap Singh <yashsingh20001@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 06:12:03 -07:00
baonudesifeizhai	6cbd41909e	Feature/vit attention unification# 23880 (#23978 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-10 06:10:14 -07:00
danielafrimi	72d30108a0	Support for NemotronH Nano VLM (#23644 ) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>	2025-09-10 06:10:06 -07:00
vllmellm	7c195d43da	[ROCm][Bugfix] Fix Aiter RMSNorm (#23412 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-09-10 21:08:03 +08:00
Lucas Wilkinson	0ae43dbf8c	[Attention] add DCP support for FLASH_ATTN_MLA backend (#24453 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-10 17:19:26 +08:00
li-jinpeng	267c80d31f	[Model] Limit CPU threads for image transformations in InternVL to reduce cpu contention. (#24519 ) Signed-off-by: li-jinpeng <3332126450@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-10 16:45:44 +08:00
Flora Feng	77f62613f9	Consolidate rendering parameters into RenderConfig dataclass (#24543 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-10 08:44:47 +00:00
Remy	feaf202e93	[Bugfix] Guard `_may_reorder_batch` for encoder-only models on CPU (#24319 ) (#24348 ) Signed-off-by: Remy <eunhwan.shin@dtonic.io> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-09-10 14:24:42 +08:00
pwschuurman	4377b1ae3b	[Bugfix] Update Run:AI Model Streamer Loading Integration (#23845 ) Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai> Signed-off-by: Peter Schuurman <psch@google.com> Co-authored-by: Omer Dayan (SW-GPU) <omer@run.ai> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-09 21:37:17 -07:00
Chenheli Hua	009d689b0c	[Core] Simplify and unify mm uuid handling & auto-generated mm hash overrides processing. (#24271 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-09-09 21:36:09 -07:00
Wei	0efdb5c3ba	[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading (#24154 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-09-10 04:27:53 +00:00
Wenlong Wang	53b42f4102	[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 (#24392 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-09-09 21:24:23 -07:00
Chauncey	309d7aa401	[P/D] MultiConnector supports shutdown (#24425 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-09-09 21:24:11 -07:00
Yihua Cheng	b4a01aaf95	[KV Connector] More async support for `get_num_new_matched_tokens` (#23620 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2025-09-09 21:23:37 -07:00
Nick Hill	f88e84016f	[BugFix] Fix async core engine client finalizer (#24540 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-09 21:07:13 -07:00
Ignacio Sica	3c2156b3af	[Hardware][Apple-CPU] Enable native bfloat16 on Apple Silicon (M2 and later) (#24129 ) Signed-off-by: ignaciosica <mignacio.sica@gmail.com>	2025-09-10 03:50:21 +00:00
Yong Hoon Shin	dc625ea6b8	[Perf] Convert np array to torch tensor to index into block table for attn chunking (#24474 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-09-09 20:01:06 -07:00
bnellnm	b23fb78623	[Bugfix] Fix for 24530. Fix naive all2all shared expert overlap. (#24538 )	2025-09-09 17:53:53 -07:00
Tyler Michael Smith	561f38dc3c	[Bugfix] Improve EPLB config validation error message (#24524 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-10 00:32:36 +00:00
Charlie Fu	73e688cb79	[ROCm][Feature] Enable Pipeline Parallelism with Ray Compiled Graph on ROCm (#24275 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-09-09 23:27:35 +00:00
Ekagra Ranjan	fb1a8f932a	[Benchmark] Add option to skip oversampling in benchmark (#24457 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2025-09-09 22:00:17 +00:00
Jiangyun Zhu	b5fb3005a8	[Log] Use a relative path in debug-level logs to distinguish files with identical names (#23846 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-09 16:46:35 -04:00
Wentao Ye	15de5ff9ea	[Feature] Disallow FlashMLA on Blackwell (#24521 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-09 14:59:34 -04:00
Chenyaaang	c3f9773b2c	[TPU] Fix tpu structured decoding in mixed batches (#24458 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-09-09 11:04:25 -07:00
Flora Feng	15cb047e25	Extend renderer with embedding support and integrate completion endpoint (#24405 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-10 01:46:46 +08:00
Jee Jee Li	9ad0688e43	[Bugfix] Fix hidden_size for multimodal classification model (#24501 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-09 10:37:25 -07:00
youkaichao	1aa427fdc1	[Kernels] Add Flash Linear Attention Kernels (#24518 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-10 00:04:41 +08:00
Micah Williamson	1c63a16b65	[Core] Run garbage collector after CUDA graph capture to fix throughput regression (#24128 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-09-09 10:38:10 -04:00
d.transposed	922d3b401b	[Bugfix] Handle the edge case in detokenizer where processed tokens contain both `stop` str and `eos` token (#23938 ) Signed-off-by: dtransposed <damian.bogunowicz@gmail.com>	2025-09-09 07:30:24 -07:00
wang.yuqi	19332c0479	[Model] Systematic support for fp32 head, pooling models part (#23810 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-09 07:29:50 -07:00
Wentao Ye	a55cf41a09	[Compilation][WideEP] Enable Piecewise CUDAGraph for DeepEPHT (#24123 )	2025-09-09 10:21:10 -04:00
Chen Zhang	1116590b16	[gpt-oss] Validate gpt-oss python tool during initialization (#23856 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-09 08:37:48 +00:00
WeiQing Chen	e283976f3a	[Performance][MM] Building the inverse permutation in O(n) time in Qwen2_5_VisionTransformer (#24443 ) Signed-off-by: Junhong <liujunhong11@huawei.com> Co-authored-by: Junhong <liujunhong11@huawei.com>	2025-09-09 00:24:11 -07:00
Didier Durand	46876dff32	[Doc]: fixing typos to improve docs (#24480 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-08 23:06:04 -07:00
Ming Yang	1823a00d67	[Misc] Support bench serve long context (#24373 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-09-08 22:53:10 -07:00
22quinn	0cdd213641	[Misc] Improve Worker process title and logging prefix (#22205 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-09-08 21:43:48 -07:00
cong-meta	b2f7745774	Add data_parallel_size to VllmConfig string representation (#24298 ) Co-authored-by: Cong Chen <congc@meta.com>	2025-09-08 21:35:18 -07:00
Zebing Lin	82dfb12e52	[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673 ) Signed-off-by: linzebing <linzebing1995@gmail.com>	2025-09-08 21:34:37 -07:00
elvischenv	bba1042c6f	[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-09-08 20:53:07 -07:00
CSWYF3634076	b6fbc15634	[BugFix][Model] Fix Ernie4.5-VL hanging on long inputs (#24074 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-09-09 11:37:16 +08:00
Harry Mellor	3e0d4a3475	Move `KVTransferConfig` from `config/__init__.py` to `config/kv_transfer.py` (#24434 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-08 20:30:32 -07:00
cjackal	13b89bd823	[doc] update `vllm serve` cli args documentation (#24329 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-09-09 03:07:58 +00:00
zhiweiz	170129eb28	[gpt-oss] Harmony changes with container tool support (#23386 ) Signed-off-by: zhiweiz <zhiweiz@fb.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: zhiweiz <zhiweiz@fb.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2025-09-08 19:03:50 -07:00
Tyler Michael Smith	955c624915	[Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE (#24134 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-09-08 19:01:51 -07:00
Chauncey	e680723eba	[Bugfix] Disable the statslogger if the api_server_count is greater than 1 (#22227 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-08 15:28:03 -07:00
Matthew Bonanni	620db1fc58	[Attention] FlashAttention MLA cudagraph support (#23958 ) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-08 22:05:26 +00:00
Ekagra Ranjan	41183c1fe0	[Spec Decode] Fix offline spec_decode.py (#24257 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-08 20:44:13 +00:00
Yang Kaiyong	43d9ad03ba	[Model loader]: support multi-thread model weight loading (#23928 ) Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com> Signed-off-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-08 18:49:39 +00:00
Jee Jee Li	8d7f39b48c	[Model] Remove quantized mixtral (#24437 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-08 11:02:14 -07:00

6330 Commits