xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-02 16:37:18 +08:00

Author	SHA1	Message	Date
cjackal	43b752c325	[Llama4] [multimodal] Fix misplaced dtype cast of `cos_sin_cache` in `Llama4VisionRotaryEmbedding` (#25889 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-09-30 20:35:15 +00:00
Or Ozeri	cfd302db9b	OffloadingConnector: Fix GPU block tracking bug (#25856 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-09-30 19:53:04 +00:00
bnellnm	fb610ae684	[Docs] Add moe kernel features doc (#25297 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 19:03:15 +00:00
Cyrus Leung	2f652e6cdf	[Doc] Improve MM Pooling model documentation (#25966 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-30 18:58:29 +00:00
Wentao Ye	e6a226efba	[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' (#25958 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-30 11:13:03 -07:00
youkaichao	a2e6fa7e03	[bugfix][deepseek] fix flashmla kernel selection (#25956 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-10-01 00:30:36 +08:00
Cyrus Leung	9f1c4ecaf2	[Bugfix] Token type and position embeddings fail to be applied to `inputs_embeds` (#25922 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-01 00:23:12 +08:00
Pavani Majety	ef283548f7	[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging (#25895 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-09-30 10:51:31 -04:00
Anion	f4db5e6de1	[Bugfix][Model] Fix inference for Hunyuan dense models (#25354 ) Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>	2025-09-30 14:38:07 +00:00
Sergio Paniego Blanco	099aaee536	Add Hugging Face Inference Endpoints guide to Deployment docs (#25886 ) Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 14:35:06 +00:00
Asaf Joseph Gardin	35fe398c7c	[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-09-30 07:30:44 -07:00
ihb2032	bb6d43047e	[Fix] Improve CPU backend compatibility for RISC-V (#25816 ) Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com>	2025-09-30 13:48:07 +00:00
Reza Barazesh	bc546f76a1	[CI] Move applicable tests to CPU (#24080 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 14:45:20 +01:00
Nicolò Lucchesi	80608ba5af	[NIXL] Add support for MLA caches with different latent dim (#25902 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-09-30 12:18:29 +00:00
Lehua Ding	e184c9c510	[perf] Use CPU tensor to reduce GPU->CPU sync (#25884 ) Signed-off-by: Lehua Ding <lehuading@tencent.com>	2025-09-30 19:51:16 +08:00
Cyrus Leung	d7e34b4210	[Model] Move `vision_feature_select_strategy` into `resolve_visual_encoder_outputs` (#25938 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-30 11:24:57 +00:00
CSWYF3634076	ef6e0e7132	[Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 (#25936 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-09-30 19:11:21 +08:00
Sergio Paniego Blanco	1ad3aca682	Updated TRL integration docs (#25684 ) Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 03:10:55 -07:00
a120092009	8d0afa9b42	[Doc] Add Cambricon MLU support (#25942 ) Signed-off-by: a120092009 <zhaoty0121@gmail.com>	2025-09-30 17:59:47 +08:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Simon Danielsson	e23cacda35	[Bugfix]: Clean up chunked prefill logging when using whisper (#25075 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2025-09-30 08:17:49 +00:00
Zhou Jiahao	2e1b8bc2b6	[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect `logical_not` (#25925 ) Signed-off-by: zhoukz <me@zhoukz.com>	2025-09-30 08:15:23 +00:00
acisseJZhong	e47433b3c1	[BugFix] Pass config_format via try_get_generation_config (#25912 )	2025-09-30 05:09:50 +00:00
Lucas Wilkinson	23194d83e8	[BugFix] Fix DP/EP hang (#25906 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-30 04:18:59 +00:00
Harry Mellor	61aedb5ffe	Move`VllmConfig` from `config/__init__.py` to `config/vllm.py` (#25271 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-29 19:49:49 -07:00
Zhuohan Li	d3bd171123	[Benchmark] Support benchmark throughput for external launcher DP (#25913 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-30 01:43:57 +00:00
Wentao Ye	89e4050af4	[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-30 09:15:19 +08:00
Andrew Sansom	78a47f87ce	Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models (#25717 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-30 08:10:58 +08:00
Aaron Pham	6a113d9aed	[V0 Deprecation] Remove `vllm.worker` and update according imports (#25901 )	2025-09-29 23:26:11 +00:00
Nicolò Lucchesi	2e4fe48c37	[NIXL] Increase default KV block eviction timeout on P (#25897 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-29 21:35:14 +00:00
Zhuohan Li	8eb0a1d906	[Doc] Polish example for torchrun dp (#25899 )	2025-09-29 21:31:34 +00:00
Thomas Parnell	fea3e476aa	[Kernel] Chunk-aligned mamba2 (#24683 )	2025-09-29 23:18:25 +02:00
Gregory Shtrasberg	61a3431613	[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so (#25605 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-09-29 17:01:50 -04:00
Naman Lalit	9bedac9623	[Doc] Add documentation for vLLM continuous benchmarking and profiling (#25819 ) Signed-off-by: Naman Lalit <nl2688@nyu.edu>	2025-09-29 20:49:49 +00:00
Adrian Abeyta	c42ff4f4fd	[BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-09-29 15:52:04 -04:00
Lee Nau	d5ab28511c	[Bugfix] Use correct key "ignore" for config.json non-quantized layers (#25706 ) Signed-off-by: Lee Nau <lnau@nvidia.com>	2025-09-29 15:07:29 -04:00
Jee Jee Li	e61eb5e09d	[Model] Remove MotifForCausalLM (#25866 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-30 00:36:30 +08:00
Isotr0py	0899ba5b42	[CI/Build] Include Transformers backend test in nightly transformers test (#25885 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-29 09:33:39 -07:00
Rahul Tuli	145ac73317	[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue (#25883 ) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-09-29 11:37:20 -04:00
Chenxi Yang	d0d138bc55	[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) (#24690 ) Signed-off-by: Chenxi Yang <cxyang@fb.com> Co-authored-by: Chenxi Yang <cxyang@fb.com>	2025-09-29 14:31:51 +00:00
Jiangyun Zhu	43227236ec	[torch.compile] serialize cudagraph_mode as its enum name instead of value (#25868 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-29 13:54:52 +00:00
Zhou Jiahao	8616300ae2	[Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models (#25854 ) Signed-off-by: zhoukz <me@zhoukz.com>	2025-09-29 10:59:04 +00:00
Yingjun Mou	edbaadd91f	[Bugfix] Fix requirements paths in install instructions (#25827 ) Signed-off-by: yingjun-mou <renzomou@gmail.com>	2025-09-29 03:49:35 -07:00
youkaichao	9360d34fa1	update to latest deepgemm for dsv3.2 (#25871 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-29 17:51:43 +08:00
Cyrus Leung	1b67b04656	[Misc] Remove more `get_input_embeddings_v0` (#25857 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-29 08:03:37 +00:00
Isotr0py	bd51f78e39	[V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge (#25331 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-09-29 14:09:18 +08:00
Roger Wang	65ecb4f134	[Bugfix] Fallback ViT attn backend to SDPA for blackwell (#25851 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-29 06:03:51 +00:00
Kunshang Ji	143844fa43	[XPU]Fix xpu spec decoding UTs, avoid using cuda graph (#25847 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-29 05:15:10 +00:00
Thomas Parnell	219cfbe7f6	Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS (#25832 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-29 05:08:17 +00:00
Robert Shaw	9b44a7d926	[P/D] NIXL Updates (#25844 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-09-29 04:46:30 +00:00

10023 Commits