xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-29 09:53:30 +08:00

Author	SHA1	Message	Date
youkaichao	f510715882	[build] add torch to tool.uv no-build-isolation-package (#24303 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-11 13:19:44 +00:00
Tao He	f946197473	[Docs] Fixes a typo in the qwen3next model name. (#24654 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-11 19:35:14 +08:00
Fanli Lin	0cd72a7b72	[XPU] add missing dependency tblib for XPU CI (#24639 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-09-11 11:22:33 +00:00
Harry Mellor	5f5271f1ee	Move `LoRAConfig` from `config/__init__.py` to `config/lora.py` (#24644 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-11 11:01:38 +00:00
Harry Mellor	d6249d0699	Fix typing for `safetensors_load_strategy` (#24641 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-11 10:41:39 +00:00
wang.yuqi	25bb9e8c65	[CI Failure] fix models/language/pooling/test_auto_prefix_cache_support.py (#24636 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-11 03:31:23 -07:00
Nicolò Lucchesi	a1213fae5f	[Misc] Add @NickLucche to codeowners (#24647 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-11 17:18:09 +08:00
wang.yuqi	a8b0361c92	[CI] Split pooling from entrypoints Test (#24632 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-11 01:53:09 -07:00
Kyuyeun Kim	ed5ae4aace	[Bugfix] Fix _synced_weight_loader (#24565 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-09-11 16:52:33 +08:00
Xingyu Liu	0fc36463e0	[CI]Add transformers_utils to Async Engine, Inputs, Utils, Worker Test (#24615 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>	2025-09-11 01:52:10 -07:00
Michael Yao	d14c4ebf08	[Docs] Use 1-2-3 list for deploy steps in deployment/frameworks/ (#24633 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-09-11 01:50:12 -07:00
Russell Bryant	ba6011027d	[Docs] Update V1 doc to reflect whisper support (#24606 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-09-11 01:50:08 -07:00
Michael Yao	85df8afdae	[Docs] Revise frameworks/anything-llm.md (#24489 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-09-11 01:50:05 -07:00
Cyrus Leung	6aeb1dab4a	[Bugfix] Fix incorrect import of CacheConfig (#24631 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-11 01:48:25 -07:00
Tao He	e93f4cc9e3	Add the support for the qwen3 next model (a hybrid attention model). (#24526 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-11 15:32:09 +08:00
Jerry Zhang	2048c4e379	[torchao] Support quantization configs using module swap (#21982 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-09-10 23:53:24 -07:00
Chenxi Yang	d13360183a	Remove redundant all gather + split (#23441 ) Co-authored-by: Chenxi Yang <cxyang@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2025-09-10 23:45:07 -07:00
TaehyunKim	9bd831f501	[Model] New model support for Motif-1-Tiny (#23414 ) Signed-off-by: ca1207 <ca1207zzz@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com> Co-authored-by: WyldeCat <skan1543@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 23:29:40 -07:00
Didier Durand	e2b1f863aa	[Doc]: fixing doc typos (#24635 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-10 23:19:28 -07:00
shengshiqi-google	41329a0ff9	[Core] feat: Add --safetensors-load-strategy flag for faster safetensors loading from Lustre (#24469 ) Signed-off-by: Shiqi Sheng <shengshiqi@google.com> Signed-off-by: shengshiqi-google <160179165+shengshiqi-google@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-10 23:10:01 -07:00
Tomas Ruiz	ee0bc5e1b4	Enable --profile in 'vllm bench throughput' (#24575 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2025-09-10 23:06:19 -07:00
Saman A. Pour	3d1393f6fc	Kimi K2 Fused MoE kernels Optimization configs (#24597 ) Signed-off-by: Saman Keon <samanamp@outlook.com>	2025-09-10 23:06:16 -07:00
Guy Stone	8a894084d2	[Engine][Chore] use local variable and remove output var assignment (#24554 ) Signed-off-by: Guy Stone <guys@spotify.com>	2025-09-10 23:05:42 -07:00
Nick Hill	e2d8c27f68	[BugFix] Fix pipeline parallel (#24621 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-10 23:05:30 -07:00
Li, Jiang	29799ddacc	[Bugfix] Add missing VIT backend dispatch on CPU (#24623 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-10 22:28:41 -07:00
Peter Salas	f17a6aa4ec	[Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides (#24131 ) Signed-off-by: Peter Salas <peter@fixie.ai>	2025-09-10 22:25:34 -07:00
Wenlong Wang	6c8deacd72	[Bug] [Spec Decode] Fix model_initialization test and mismatch in aux_hidden_layers (#24613 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-10 21:23:18 -07:00
Chauncey	55b823ba0f	Add @chaunceyjiang to codeowner for reasoning Reasoning and Tool parser (#24406 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-09-11 04:23:04 +00:00
youkaichao	8c5a747246	[distributed] update known issues (#24624 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-11 11:09:38 +08:00
Alexandre Marques	5931b7e5d9	[Models][Quantization] Add quantization configuration update in Voxtral model (#24122 ) Signed-off-by: Alexandre Marques <almarque@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-10 19:13:56 -07:00
Jonathan Berkhahn	cc99baf14d	[Misc] Make timeout passable in init_distributed_environment (#24522 ) Signed-off-by: jberkhahn <jaberkha@us.ibm.com>	2025-09-10 15:41:12 -07:00
Hanjie Qiu	dcb28a332b	[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078 ) Signed-off-by: hjjq <hanjieq@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-10 15:31:10 -07:00
Michael Goin	fba7856581	[Perf] Warmup FlashInfer attention during startup (#23439 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-10 15:03:17 -07:00
Chen Zhang	b5e383cd8b	[gpt-oss] raise error for flashinfer backend without trtllm (#24482 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-10 14:33:13 -07:00
Gregory Shtrasberg	9a161307f5	[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-10 13:59:55 -07:00
Russell Bryant	37e8182bfe	[v1] Add Whisper model support (encoder-decoder) (#21088 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com>	2025-09-10 13:53:35 -07:00
Nick Hill	4db4426404	[CI] Fail subprocess tests with root-cause error (#23795 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-10 13:53:21 -07:00
Thien Tran	a0933c3bd6	[Bugfix] Enable FP8 KV cache for FlashInfer and Triton backend on non-sm100 GPUs (#24577 ) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-09-10 12:33:41 -07:00
rongfu.leng	09e68bce34	[Misc] update log level debug to warning when process port is used by (#24226 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-10 11:32:57 -07:00
Xingyu Liu	9fb74c27a7	[Core] Support configuration parsing plugin (#24277 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-10 11:32:43 -07:00
Ming Yang	4032949630	[Bugfix] Fix DeepEP config for DP4TP4 (#23619 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-09-10 10:37:56 -07:00
tomeras91	08abfa78ec	[Bugfix] fix modelopt exclude_modules name mapping (#24178 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-10 10:20:46 -07:00
Shiyan Deng	2bef2d1405	[Logging] allow config logging stream (#24336 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com>	2025-09-10 15:02:01 +00:00
Robin	36cacd0958	[Doc] Add documentation for GLM-4.5 series models: tool-calling and reasoning parser (#24589 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-09-10 07:50:55 -07:00
Jee Jee Li	bb3eb80d92	[Core] Split LoRA layers (#24574 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 07:47:51 -07:00
pwschuurman	fcc0a3130a	[CI] Fix tensorizer test assertion (#24545 ) Signed-off-by: Peter Schuurman <psch@google.com>	2025-09-10 06:57:36 -07:00
zzhxxx	736569da8d	[Platform] Custom ops support for LMhead and LogitsProcessor (#23564 ) Signed-off-by: zzhx1 <zzh_201018@outlook.com>	2025-09-10 06:26:31 -07:00
Kay Yan	2eb9986a2d	[BugFix] `python collect_env.py` and `vllm collect-env` compatibility with uv venv (#24066 ) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-09-10 21:25:33 +08:00
Hyogeun Oh (오효근)	ccee371e86	[Docs] Fix warnings in `mkdocs build` (continued) (#24092 ) Signed-off-by: Zerohertz <ohg3417@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-10 06:23:28 -07:00
RoadToNowhereX	c0bd6a684a	Fix Auto_Round Quatization Loading on SM75 and Lower GPUs (#24217 ) Signed-off-by: RoadToNowhereX <37441177+RoadToNowhereX@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-10 06:22:31 -07:00

9359 Commits