xinyun/vllm - vllm - 丝路新云 Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-06-28 17:17:22 +08:00

Author	SHA1	Message	Date
Varun Sundar Rabindranath	3137991f55	[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor (#29162) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-21 14:28:17 -08:00
Julien Denize	57430fc95c	Default model load/config/tokenizer to `mistral` format if relevant files exist (#28659) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-21 13:58:59 -08:00
Wentao Ye	1f400c58b8	[CI] Add batch invariant test to ci (#27842) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-21 09:20:33 -07:00
rasmith	711241c13c	[CI/Build] Fix illegal memory access and unsupported test in kernels/attention/test_cache.py (#29118) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-21 10:58:38 -05:00
Cyrus Leung	aab0102a26	[V0 deprecation] Remove more V0 references (#29088) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:56:59 +00:00
WeiQing Chen	b34129bf8e	[Misc] remove useless v1 env (#29164) Signed-off-by: David Chen <530634352@qq.com>	2025-11-21 01:41:20 -08:00
Alex Brooks	b4734b9550	[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-11-21 13:32:30 +08:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771)" (#29121) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Cyrus Leung	56e96b37e4	[V0 Deprecation] Remove `best_of` (#29090) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:40:40 +08:00
jeremyteboul	0730414999	[Core] Add audio_embeds support to chat completions (#29059) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2025-11-21 11:39:47 +08:00
Jee Jee Li	9875be6431	[LoRA][2/2]Remove LoRA extra vocab (#28545) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-21 09:46:43 +08:00
Michael Goin	87cbbdff63	Update model references for OLMo3 (#29099) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-21 09:16:52 +08:00
rasmith	c7a29d2c8d	[CI/Build] Remove skip global cleanup in test_struct_output_generate.py (#29022) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 21:44:37 +00:00
rasmith	8237ab8a2b	[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now (#29021) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 21:35:14 +00:00
rasmith	5e5a7eb16f	[CI/Build] Make test_attention_selector.py run tests on correct platform (#29064) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-20 20:45:56 +00:00
rasmith	3d84ef9054	[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py (#29043) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 20:39:49 +00:00
Or Ozeri	647464719b	[KVConnector][Core] Support cross-layer KV blocks (#27743) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-11-20 19:09:59 +01:00
TJian	82b05b15e6	[BugFix] [FEAT] Enable fastsafetensors for ROCm platform (#28225) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-20 16:34:11 +00:00
Shinichi Hemmi	c9e093116c	[MODEL] Implement plamo3 (#28834) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>	2025-11-20 03:00:19 -08:00
Or Ozeri	c0c2dd1e0b	[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 18:55:10 +08:00
Vensen	fb8851f254	[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu (#28760) Signed-off-by: vensen <vensenmu@gmail.com> Signed-off-by: Vensenmu <vensenmu@gmail.com>	2025-11-20 02:52:02 -08:00
rasmith	322cb02872	[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 17:48:09 +08:00
Wentao Ye	2c52c7fd9a	[Bug] Fix torch dynamo warning Dynamo detected a call to a `functools.lru_cache` (#29038) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-20 16:52:23 +08:00
Bradley D	1e1c06789e	[ci][amd] fix EPLB execution test (#28742) Signed-off-by: Bradley Davis <bradleyhd@meta.com>	2025-11-20 14:53:38 +07:00
Lukas Geiger	a9705a290a	[Model][QwenVL] Replace `torch.repeat_interleave` with faster `np.repeat` (#28964) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-19 22:04:23 -08:00
Canlin Guo	fe25772aa9	[Bugfix] Handle broken frames in video loading (#29001) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: 凌葭 <lvjiang.lj@alibaba-inc.com> Co-authored-by: 凌葭 <lvjiang.lj@alibaba-inc.com>	2025-11-20 04:38:12 +00:00
Benjamin Chislett	fcbcba6c70	[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-19 19:17:48 -08:00
liangel-02	1d642872a2	[torchao] fix safetensors for sharding (#28169) Signed-off-by: Angel Li <liangel@meta.com>	2025-11-19 16:39:45 -08:00
Nick Hill	9ccef8e333	[Misc] Colorize logs (#29017) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-19 19:26:04 -05:00
Alexander Matveev	3aaa94ac99	[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687) Signed-off-by: Alexander Matveev <amatveev@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-19 15:47:13 -08:00
Micah Williamson	22e44ad589	[ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm (#28984) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-11-19 21:31:33 +00:00
Shu Wang	613abb50d5	[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990) Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-19 13:29:06 -08:00
Wentao Ye	1607e664f0	[Bug] Fix Batch Invariant MLA test (#28967) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 21:18:32 +00:00
Ryan Rock	68d7231991	[CI/Build] Fix test_prefix_prefill for AMD (#28905) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2025-11-19 16:04:36 -05:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Copilot	61728cd1df	Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests (#28966) Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-19 13:32:19 -05:00
Harry Mellor	a8b70304d6	Update `rope_scaling` to `rope_parameters` in preparation for Transformers v5 (#28542) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 09:06:36 -08:00
vnadathur	1ffe934c8a	[torch.compile] caching of config fields should be opt-out by default (#26468) Signed-off-by: vnadathur <glvikramn@gmail.com> Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com> Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com> Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com> Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-19 06:13:54 -08:00
Yanan Cao	2c8b9182b5	[CI] Reorganize compile tests so new tests are automatically included in CI (#28625) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-19 06:13:50 -08:00
Didier Durand	09540cd918	[Doc]: fix typos in various files (#29010) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-19 04:56:21 -08:00
gnovack	d69062c67a	add support for --fully-sharded-loras in fused_moe (#28761) Signed-off-by: gnovack <gnovack@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-19 16:32:00 +08:00
Roman Solomatin	71d0ae1c54	[Misc] Update embedding/cross encoder tests to use `mteb` v2 (#27329) Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: wang.yuqi <noooop@126.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-11-18 22:28:40 -08:00
Matthew Bonanni	4c23690f43	[Attention] FlashAttention ViT support, make default backend (#28763) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-18 20:06:21 -08:00
Strahinja Stamenkovic	814843e021	Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307) Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>	2025-11-19 03:12:31 +00:00
Li, Jiang	20852c8f4c	[CPU] Refactor CPU WNA16 (#28826) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-19 10:32:00 +08:00
Jialin Ouyang	40b6b38f2c	[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-19 02:10:02 +00:00
Kunshang Ji	2a2d5d2780	Replace `torch.cuda.Event` with `torch.Event` for better hardware compatibility (#26985) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-18 11:34:36 -08:00
Chendi.Xue	c3e2978620	[NIXL] fix cpu PD after physical <> logical block_size PR (#28904) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-11-18 14:03:23 -05:00
Kevin H. Luu	c64c0b78de	[chore] Move the rest of wikimedia url to S3 (#28921) Signed-off-by: Kevin H. Luu <khluu000@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 09:44:18 -08:00
Nicolò Lucchesi	f226a3f0c1	[CI][NIXL] Change default `block_size` for tests (#28927) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-18 09:22:30 -08:00

3604 Commits