xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-24 19:27:17 +08:00

Author	SHA1	Message	Date
Johnny Yang	3ecabd06ee	Fix tpu-inference platform path (#29554 ) Signed-off-by: Johnny Yang <johnnyyang@google.com>	2025-11-26 23:25:21 -08:00
Jee Jee Li	c069086b9c	[Bugfix] Fix getting device for MoE LoRA (#29475 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-26 23:16:07 -08:00
Woosuk Kwon	11ea5ec1ff	[Model Runner V2] Refactor CudaGraphManager (#29583 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-26 21:37:59 -08:00
Fadi Arafeh	ecb1952378	[cpu][fix] Fix Arm CI tests (#29552 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-11-27 13:09:41 +08:00
TJian	da8e1a1bf9	[DOC] Add vLLM Bangkok Meetup info (#29561 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-27 04:42:50 +00:00
Woosuk Kwon	ee80aee1ca	[Model Runner V2] Minor cleanup for build_attn_metadata (#29576 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-26 20:10:12 -08:00
Woosuk Kwon	0aeb698b77	[Model Runner V2] Minor code cleanup (#29570 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-26 19:47:17 -08:00
Louie Tsai	9bb33c8919	add xpu supported model and model id for cpu (#29380 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-11-27 11:30:50 +08:00
Jinzhen Lin	a67dec7cba	[Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28619 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-26 19:02:21 -08:00
Matthew Bonanni	77740191de	[Attention][Async] Eliminate `seq_lens_cpu` in FlashAttention metadata building with DCP > 1 (#29449 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 18:48:43 -08:00
HDCharles	df01eda4dc	[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878 ) Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>	2025-11-26 21:35:13 -05:00
Johnny Yang	ba1fcd84a7	[TPU] add tpu_inference (#27277 ) Signed-off-by: Johnny Yang <johnnyyang@google.com>	2025-11-26 14:46:36 -08:00
Lucas Wilkinson	56539cddac	[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579 )	2025-11-26 14:07:13 -05:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
Alec	c4c0354eec	[CI/Build] allow user modify pplx and deepep ref by ENV or command line (#29131 ) Signed-off-by: alec-flowers <aflowers@nvidia.com>	2025-11-26 17:41:16 +00:00
HDCharles	e603129505	[refactor] CTConfig methods to static/class methods (#28870 ) Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-26 17:21:58 +00:00
Wentao Ye	0b0aa874e8	[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-26 09:38:52 -07:00
Huamin Li	70d5953f82	Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841 )" (#29483 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-26 22:27:26 +08:00
yxt	3650a74ed8	Optimize the wording of the document and unify the terminology and th… (#29491 )	2025-11-26 05:16:12 -08:00
Yejing Lai	bb706d6048	Fix TeleChatForCausalLM not register issue (#29473 ) Signed-off-by: Lai, Yejing <yejing.lai@intel.com>	2025-11-26 05:15:00 -08:00
Cyrus Leung	e30859dff3	[Bugfix] Fix handling of image embeds in models (#29480 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-26 05:00:15 -08:00
Roger Wang	452a7c9f7c	[Misc] Allow LM only loading for Pixtral (#29451 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-26 05:00:00 -08:00
Pleaplusone	d9d342d214	[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-26 12:45:28 +08:00
Xin Yang	53d7f1f601	[Kernel] Use pre-allocated output buffer for triton kernel fused_experts (#29219 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-11-26 10:21:00 +08:00
dependabot[bot]	c5ee430328	Bump actions/checkout from 4 to 6 (#29293 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-11-26 01:57:08 +00:00
Michael Goin	8d6a89dffd	[UX] Suppress gloo log spam (#29250 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-25 17:19:35 -08:00
George D. Torres	56531b79cc	[Misc] Add backup hash algorithm for FIPS constrained environments (#28795 ) Signed-off-by: George D. Torres <gdavtor@gmail.com> Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-26 00:50:22 +00:00
Xieyang Xu	12866af748	dummy run corner case (#29433 )	2025-11-26 00:20:35 +00:00
Lucia Fang	d8819c88eb	fix assertion for single world use case (uni) (#29429 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>	2025-11-26 00:14:23 +00:00
Andrey Khalyavin	de75b0bb70	[BugFix] Fix initialization of draft model. (#29319 ) Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-11-25 18:45:58 -05:00
Michael Goin	7df0289782	Change warning logs to debug for unimplemented MXFP4 Linear/Attention (#29441 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-25 22:52:31 +00:00
Zhengxu Chen	0abc79482a	[caching] Add enable_prompt_embeds and cpu_offload_gb to compile hashes. (#29435 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-11-25 21:46:41 +00:00
Nick Hill	4e57c6587f	[Core] Support logprobs with spec decode + async scheduling (#29223 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-25 12:55:24 -08:00
Ilya Markov	e7d776273d	[Compile] Refactor. Move PostGradPassManager out of Compilation config (#29340 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2025-11-25 19:58:56 +00:00
Eldar Kurtić	c32a18cbe7	Attempt to fix GPU OOM in a spec-decoding test (#29419 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-25 14:23:36 -05:00
Andrew Xia	b07555d26f	[responsesAPI][2] parse ResponseFunctionToolCallOutputItem (#29383 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-11-25 10:27:26 -08:00
Harry Mellor	0353d2e162	Fix RoPE related failures in Transformers nightly tests (#29333 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 16:23:45 +00:00
Harry Mellor	a1f2676879	Scheduled removal of `override_pooler_config` and `disable_log_requests` (#29402 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-25 16:08:57 +00:00
Yifan Qiao	48ddb02b79	[Hybrid Allocator] Support KV cache groups with different block_size (#29143 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-11-25 10:30:57 -05:00
Michael Goin	e502098643	[Kernel] Add NVFP4 MoE CUTLASS support for SM120 (#29242 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-11-25 06:59:07 -08:00
Michael Goin	dbc3d9991a	[UX] Put CUDA attention backend selection log into one line (#29337 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-25 06:46:18 -08:00
Injae Ryou	794029f012	[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type (#29137 ) Signed-off-by: Injae Ryou <injaeryou@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-25 14:28:53 +00:00
Eldar Kurtić	0231ce836a	Revert back to torch.equal over torch.allclose from #28819 (#29086 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-25 14:23:38 +00:00
Thomas Parnell	516c3f7847	[Bugfix] Fix logic for choosing default prefix caching setting (#29393 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-25 14:05:10 +00:00
Harry Mellor	51fc9e017a	Scheduled removal of `CompilationConfig.use_inductor` (#29323 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 12:55:42 +00:00
Harry Mellor	bf0c75cd4f	Make Transformers Nightly tests soft-fail and enable all tests (#29401 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 12:41:15 +00:00
Roger Wang	c2c661af9b	[Bugfix] Fix overallocation in MM profiling (#29386 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-25 12:38:36 +00:00
Nicolò Lucchesi	798e87db5c	[Core] Generalize Encoder-Decoder `seq_lens` computation to avoid Whisper hardcoded logic (#29268 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2025-11-25 11:32:11 +00:00
wang.yuqi	de6889946b	[Misc] Suppress log outputs when constructing the default vllm config. (#29291 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 03:00:44 -08:00
wang.yuqi	7a80b01889	[CI] Resettle pooling entrypoints tests. (#29370 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-11-25 10:39:10 +00:00

11705 Commits