xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-01 22:27:22 +08:00

Author	SHA1	Message	Date
Nick Hill	4aed506b65	[Core] Streamline some structured output related code (#26737 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-14 23:27:44 +00:00
Boyuan Feng	a86b4c58e8	remove attn output view kernel (#26680 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-14 22:53:10 +00:00
Nick Hill	ff4810ba73	[Minor] Group async_scheduling related fields in model runner init (#26736 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-14 14:46:37 -07:00
Nan Qin	9d6964926e	fix: response_format for completion (#23212 ) Signed-off-by: Nan2018 <qinnanjoshua@gmail.com>	2025-10-14 21:23:22 +00:00
Dhruvil Bhatt	0e65818910	Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837 ) Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>	2025-10-14 14:21:03 -07:00
Jialin Ouyang	380f17527c	[Perf] Cache vllm.env.__getattr__ result to avoid recomputation (#26146 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-14 17:03:21 -04:00
HDCharles	b92ab3deda	Notice for deprecation of AutoAWQ (#26820 ) Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-14 13:39:59 -07:00
Jialin Ouyang	acaa2c0a4a	[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs (#24964 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-14 12:58:43 -07:00
Matthew Bonanni	82af928c41	[Attention][Spec Decode] FlashMLA spec decode support (#26541 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-14 19:38:20 +00:00
Huamin Li	87efc681db	llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch (#26790 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-14 11:54:12 -07:00
Michael Goin	c3a722fcb2	[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e (#26816 ) Signed-off-by: mgoin <mgoin64@gmail.com> [tag: v0.11.1rc1]	2025-10-14 18:38:59 +00:00
Ze'ev Klapow	aba48f7db1	[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 (#26818 )	2025-10-14 11:20:39 -07:00
Michael Goin	04b5f9802d	[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-14 10:52:05 -07:00
Reza Barazesh	efc8f7d814	Update coveragerc and add codecov.yml for path fixes (#26435 ) Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>	2025-10-14 09:45:06 -07:00
Wentao Ye	6d87a2838c	[Config] Remove Unused Environment Variable `VLLM_DISABLE_PAD_FOR_CUDAGRAPH` (#26743 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-14 11:47:49 -04:00
wang.yuqi	e6cdbd6792	Revert "[issues template] Encourage the author implement their own ideas" (#26814 )	2025-10-14 08:37:34 -07:00
Chauncey	df850c4912	[Feature][Responses API] Stream Function Call - harmony (#24317 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-14 08:31:43 -07:00
Qier Li	720394de43	[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats (#26046 ) Signed-off-by: Qier Li <kevin44036@gmail.com>	2025-10-14 14:38:07 +00:00
wang.yuqi	88a49745af	[issues template] Encourage the author implement their own ideas (#26671 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-14 22:32:36 +08:00
Boyuan Feng	ca683a2a72	use combo kernel to fuse qk-norm and qk-rope (#26682 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-10-14 09:40:59 -04:00
汪志鹏	e9f1b8c9e9	Adjusted the model order of the model registration file (#26798 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-10-14 13:26:11 +00:00
Jaya Yuan	ea97940d6c	[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864 ) Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com> Signed-off-by: FENP <32334296+FENP@users.noreply.github.com> Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com>	2025-10-14 13:07:50 +00:00
Jee Jee Li	fdd32750f0	[CI/Build] Cleanup LoRA test (#26752 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-14 12:06:35 +00:00
Vladislav Bronzov	c715ba3735	[Feature] Change vllm.py with pydantic validation (#26726 ) Signed-off-by: Vladislav <vladislav.bronzov@gmail.com> Signed-off-by: Vladislav Bronzov <58587565+VladOS95-cyber@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-14 12:00:54 +00:00
Cyrus Leung	9c4cb68339	[Chore] Remove `SupportsV0Only` interface and update supported models docs (#26783 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-14 04:55:10 -07:00
Chauncey	780eb03d9b	[CI] Fix test_tool_id_kimi_k2 (#26787 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-14 10:27:07 +00:00
Cyrus Leung	ef9676a1f1	[Doc] ruff format some Python examples (#26767 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-14 03:21:53 -07:00
Harry Mellor	70b1b330e1	Don't allow `typos` to fix by default (#26785 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-14 03:05:15 -07:00
Cyrus Leung	d1d063a588	[Chore] Use `max_transformers_version` for Qwen-VL test (#26792 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-14 03:03:46 -07:00
Chendi.Xue	7e6edb1469	[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode (#26556 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-10-14 09:46:05 +00:00
Cyrus Leung	74704d4553	[Model] Use merge_by_field_config for MM models (O-P) (#26776 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-14 09:42:45 +00:00
Cyrus Leung	d2f816d6ff	[Bugfix] Standardize merging multimodal embeddings (#26771 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-14 09:36:21 +00:00
wangxiyuan	577d498212	[Plugin] Make plugin group clear (#26757 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-14 07:49:59 +00:00
Max Wittig	fd85c9f426	[Bugfix][FE]: Always include usage with `--enable-force-include-usage` (#20983 ) Signed-off-by: Max Wittig <max.wittig@siemens.com> Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com> Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com>	2025-10-14 09:17:39 +02:00
Ye (Charlotte) Qi	d32c611f45	[CI/Build] Use 127.0.0.1 instead of localhost in utils (#26750 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-14 07:04:00 +00:00
CSWYF3634076	01ad27faff	[Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code (#26684 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-10-14 06:55:23 +00:00
Ryan Li	481545b397	scheduler.py: Update the name of the default scheduler. (#26758 ) Signed-off-by: Ryan Li <ryanli@ryanli.org>	2025-10-14 06:52:21 +00:00
Alexei-V-Ivanov-AMD	d3cc8427c0	[ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) (#26718 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-10-13 23:10:23 -07:00
vllmellm	4821ac1b4d	[CI] [ROCm] Automate CC list for ROCm related issue (#26753 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-10-14 13:57:26 +08:00
XiongfeiWei	4497c8f821	Fix lora tests failure in TPU CI due to the removal of LoRA bias (#26723 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-10-14 13:04:23 +08:00
Michael Yao	2e36cdbe2b	[Docs] Add a start tag to build.inc.md (#26747 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-10-13 21:51:55 -07:00
Maximilien de Bayser	fe3edb4cf0	Add support for the /rerank endpoint in vllm bench serve (#26602 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-10-14 04:25:43 +00:00
Heng Guo	29350922c6	[Feature][Quantization] auto_round format add support for regex (#24024 ) Signed-off-by: n1ck-guo <heng.guo@intel.com> Signed-off-by: Heng Guo <heng.guo@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-14 03:03:16 +00:00
Varun Sundar Rabindranath	8ae169286f	[torch.compile] Unwrap fused_marlin_moe custom op (#26739 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-14 02:22:16 +00:00
youkaichao	8a0af6a561	[build][torch.compile] upgrade depyf version (#26702 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-10-14 10:12:09 +08:00
Jialin Ouyang	cfded80793	[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-14 01:46:44 +00:00
Angela Yi	b59dd19b55	[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes (#26681 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-10-13 18:15:34 -07:00
Michael Goin	3e051bda82	[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-13 18:12:52 -07:00
Lucia Fang	8317f72354	[Misc][DP] support customized aggregated logger for dp (#24354 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-10-13 17:45:59 -07:00
Maximilien de Bayser	d8bebb008a	Add tests for chunked prefill and prefix cache with causal pooling models (#26526 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com>	2025-10-14 07:45:04 +08:00


10442 Commits