xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-22 01:37:14 +08:00

Author	SHA1	Message	Date
Roger Wang	21da73343a	[Misc] Clean up flags in `vllm bench serve` (#25138 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-18 12:43:33 +00:00
Asaf Joseph Gardin	66072b36db	[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support (#24883 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-09-18 12:21:17 +00:00
Harry Mellor	3ed1ec4af2	Fix `validate-config` pre-commit check (#25157 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 12:06:28 +00:00
Harry Mellor	5a33ae9a3f	Fix forward reference warning in documentation (#25150 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 11:41:41 +00:00
William Song	c9ff9e6f0c	[Docs] add the parallel sampling usage in LLMEngine and AsyncLLM (#24222 )	2025-09-18 04:37:08 -07:00
Kay Yan	eaffe4486c	[Docs] Fix pooling-params doc references in openai_compatible_server.md (#24939 )	2025-09-18 04:36:47 -07:00
Harry Mellor	8ed039d527	Move `StructuredOutputsConfig` from `config/__init__.py` to `config/structured_outputs.py` (#25153 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 11:24:27 +00:00
Jee Jee Li	37970105fe	[Model] Improve Pooling Model (#25149 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-18 11:04:21 +00:00
Chauncey	cc935fdd7e	[Frontend] Support setting logprobs to -1 (#25031 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-09-18 10:34:42 +00:00
Elvir Crnčević	abdfcd4f3d	silu-v1: Fix EPS not being used during max-reduction (#25069 ) Signed-off-by: elvircrn <elvircrn@gmail.com>	2025-09-18 10:25:12 +00:00
ihb2032	4f02b77de4	Fix: Add explicit #include <omp.h> for OpenMP compatibility on certain toolchains (#24951 ) Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com>	2025-09-18 17:43:23 +08:00
Aaron Pham	29283e8976	[Chore] Cleanup guided namespace, move to structured outputs config (#22772 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 09:20:27 +00:00
Punitvara	05b044e698	[Doc] Fix cross-reference warnings (#25058 ) Signed-off-by: Punit Vara <punitvara@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 02:05:16 -07:00
Gerard Finol	aa3f105c59	Add 'path' option to ImagePrompt data_format (#25081 ) Signed-off-by: Gerard Finol <gerard.finol@urv.cat>	2025-09-18 02:02:14 -07:00
Tao He	ef7eefe17a	[Qwen] Add fp8 checkpoint support for qwen3-next. (#25079 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-18 08:16:04 +00:00
rongfu.leng	350c94deb3	[Bugfix] when use s3 model cannot use default load_format (#24435 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-09-18 07:47:43 +00:00
Harry Mellor	f4cd80f944	Retrieve `sliding_window` from text config in Gemma3 MM (#25085 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 06:29:05 +00:00
Harry Mellor	349e0e3462	[Docs] Fix API Reference (#25140 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-17 23:23:29 -07:00
Lumina	81b16a2bc9	[Kernel] Better inf handling for grouped topk cu (#24886 ) Signed-off-by: lumina37 <starry.qvq@gmail.com>	2025-09-18 05:53:55 +00:00
Simon Mo	e111d5b0ae	[CLI] Use streaming in CLI chat and completion commands (#23769 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-09-17 22:30:26 -07:00
Simon Mo	a904ea78ea	[benchmark] add peak throughput metrics and plot (#23867 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-09-17 22:30:02 -07:00
Benjamin Chislett	b7433ca1a4	[Spec Decode] Efficient padded speculation (#24539 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-09-18 01:07:24 -04:00
Woosuk Kwon	5c65a72bb1	[V0 Deprecation] Remove more V0 tests (#25117 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-17 22:05:25 -07:00
YiwenC	9d8a2d86d2	[EPLB] Add EPLB support for hunyuan_v1 (#23078 )	2025-09-18 04:51:35 +00:00
Chaojun Zhang	3bc18127ff	[XPU] Whisper model support on XPU Platform (#25123 ) Signed-off-by: chzhang <chaojun.zhang@intel.com>	2025-09-18 04:30:10 +00:00
Andrew Sansom	bec060fd99	Mark prompt logprobs as incompatible with prompt embeds at API level (#25077 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-17 21:25:07 -07:00
YiwenC	52bc9d5b3e	[Model] enable data parallel for InternVL vision encoder (#23909 ) Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu> Signed-off-by: YiwenC <54658925+666even666@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-17 21:11:46 -07:00
bnellnm	dc2979c585	[Kernels] Overlap shared experts with combine instead of dispatch (#24254 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-18 12:10:21 +08:00
toncao	027d37df38	[Bugfix][Qwen3-Next] add prefixes to shared_expert in qwen3-next and mlp in qwen2moe to successfully load ignored params in quantized models (#24960 ) Signed-off-by: toncao <cpatonn@gmail.com> Co-authored-by: toncao <cpatonn@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-18 12:08:50 +08:00
Lukas Geiger	b98219670f	[Core][MM] Cleanup `MultiModalCache` (#25006 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-09-17 21:08:41 -07:00
Harry Mellor	32baf1d036	[Docs] Clean up the contributing README (#25099 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-17 21:05:18 -07:00
Roger Wang	3127274d02	[MM Encoder] Apply DP ViT for Qwen3-VL model series (#24955 ) Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Huang Jie <92386084+JJJYmmm@users.noreply.github.com> Co-authored-by: 松灵 <26085463+wulipc@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-17 21:04:21 -07:00
bnellnm	4ac510f484	[Kernels] Enable DeepGEMM by default (#24462 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 20:19:52 -07:00
Woosuk Kwon	7fb2a5be28	[V0 Deprecation] Skip PP test (#25128 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-17 20:18:36 -07:00
Woosuk Kwon	6c036615dc	[V0 Deprecation] Remove misc V0 tests (#25118 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-17 19:41:55 -07:00
Woosuk Kwon	2fc24e94f9	[V0 Deprecation] Remove V0 Tracing & Metrics tests (#25115 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-17 19:40:44 -07:00
Woosuk Kwon	2c3c1bd07a	[V0 Deprecation] Remove V0 Engine tests (#25114 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-17 19:38:09 -07:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
elvischenv	e6585ddb45	[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel (#24833 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-17 16:37:23 -07:00
Karan Goel	2a4d6412e6	Add a batched auto tune script (#25076 ) Signed-off-by: Karan Goel <karangoel@google.com> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-17 22:41:18 +00:00
elvischenv	e67a79db03	[Bugfix] Refactor Flashinfer TRTLLM attention kernel selection logic (#24600 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-17 15:36:29 -07:00
Michael Goin	9f882d8791	Disable failing GPT-OSS Eval (Blackwell) for now (#25107 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-17 15:36:00 -07:00
Douglas Lehr	1a456c7c90	Aiter mha fp8 fix (#24991 ) Signed-off-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Doug Lehr <douglehr@amd.com>	2025-09-17 22:29:14 +00:00
Alexander Matveev	fedb75fa27	[Bugfix][B200] Fix `cutlass_mla` hang (#24966 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-17 18:06:38 -04:00
Andrew Xia	bff2e5f1d6	[gpt-oss][2] fix types for streaming (#24556 ) Signed-off-by: Andrew Xia <axia@meta.com>	2025-09-17 22:04:28 +00:00
czhu-cohere	3c068c637b	[Kernel] Faster pre-processing time for W4A8 (#23972 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-09-17 14:35:32 -07:00
ahao-anyscale	f20c3b0951	[BUG] Exclude .pth files when pulling remote files (#25092 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-09-17 20:42:09 +00:00
Mohammad Miadh Angkad	883131544f	[Bugfix] Update import path for bc_linter_include (#24766 ) Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu>	2025-09-17 20:33:11 +00:00
Yihua Cheng	ee5fd49150	[Misc] Update owners for KV connector and V1 offloading (#25041 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2025-09-17 12:37:29 -07:00
afeldman-nm	7ae9887542	[V1] Logits processor docs (#22919 ) Signed-off-by: Andrew Feldman <afeldman@redhat.com> Signed-off-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Joseph Marinier <Joseph.Marinier@gmail.com>	2025-09-17 11:53:12 -07:00


9612 Commits