xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-28 10:27:19 +08:00

Author	SHA1	Message	Date
Roberto L. Castro	96ad65b7fe	[Transform] [Quantization] Add QuTLASS support to vLLM (#24440 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: Andrei Panferov <andrei@panferov.org> Co-authored-by: Andrei Panferov <andrei@panferov.org> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-10 09:43:40 -07:00
Elvir Crnčević	7b03584de8	Silu v2 (#25074 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: elvircrn <elvircrn@gmail.com> Signed-off-by: Elvir Crnčević <elvircrn@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>	2025-10-10 15:19:53 +00:00
Harry Mellor	e09d1753ec	Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-08 10:40:42 -07:00
Lukas Geiger	6273fe8d3d	[Benchmarks] Fix imports in FP8 tuning script (#26407 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-08 16:31:59 +00:00
Lukas Geiger	338b1bf04f	[Benchmarks] Add support for Qwen 3 VL MoE tuning (#26419 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-08 14:01:08 +00:00
Cyrus Leung	7e4cd070b0	[V0 Deprecation] Remove `VLLM_USE_V1` from docs and scripts (#26336 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 16:46:44 +08:00
Karan Goel	824a3f403f	[Misc] auto_tune: kill specific vllm process (#26304 ) Signed-off-by: Karan Goel <karangoel@google.com>	2025-10-06 18:02:51 +00:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
Harry Mellor	557b2e961d	Remove all cases of `fmt: on/off` (#26253 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:14 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Jiangyun Zhu	eb0fa43868	[Perf] Optimize `reshape_and_cache` CUDA Kernel (#25955 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Co-authored-by: Liu-congo <1502632128@qq.com>	2025-10-03 01:33:46 -07:00
ElizaWszola	502640c3f9	[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-10-02 19:35:13 +00:00
Cyrus Leung	d00d652998	[CI/Build] Replace `vllm.entrypoints.openai.api_server` entrypoint with `vllm serve` command (#25967 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-02 10:04:57 -07:00
Jee Jee Li	67f3fb0844	[Bench] Add DeepSeekV32 to MoE benchmark (#25962 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-30 14:13:48 -07:00
Ekagra Ranjan	e71b8e210d	[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-25 15:22:03 -07:00
Cyrus Leung	2f17117606	[mypy] Fix wrong type annotations related to tuple (#25660 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 13:00:45 +00:00
Tyler Michael Smith	1260180c67	Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-09-25 08:05:21 +00:00
Saman A. Pour	90b139cfff	Enable Fbgemm NVFP4 on Dense models (#25609 ) Signed-off-by: Saman Keon <samanamp@outlook.com>	2025-09-24 21:12:53 -07:00
Wentao Ye	1f29141258	[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-24 18:52:36 -04:00
Michael Goin	d83f3f7cb3	Fixes and updates to bench_per_token_quant_fp8 (#25591 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-09-24 08:30:15 -07:00
Russell Bryant	164299500b	[Benchmark] Fix regression in structured output benchmark (#25500 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-09-24 10:40:42 +00:00
Chenxi Yang	0d235b874a	Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302 ) Signed-off-by: Chenxi Yang <cxyang@fb.com> Co-authored-by: Chenxi Yang <cxyang@fb.com>	2025-09-23 18:07:42 -06:00
ElizaWszola	63400259d0	[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-09-23 12:03:10 -07:00
Amir Samani	8c1c81a3de	[core] add nccl symmetric memory for all reduce (#24532 ) Signed-off-by: Amir Samani <asamani@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 14:33:06 -04:00
Weida Hong	24e8222745	[Misc] Reduce initialization time of auto_tune (#23682 ) Signed-off-by: Weida Hong <wdhongtw@google.com>	2025-09-23 17:34:58 +00:00
Burkhard Ringlein	100b630a60	[V1][Kernel] Add triton implementation for `reshape_and_cache_flash` (#24503 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-23 12:52:40 -04:00
Cyrus Leung	6c117cff7d	[Frontend] Pass API server count to each process (#23717 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-20 01:15:19 +08:00
Aaron Pham	29283e8976	[Chore] Cleanup guided namespace, move to structured outputs config (#22772 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 09:20:27 +00:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
Karan Goel	2a4d6412e6	Add a batched auto tune script (#25076 ) Signed-off-by: Karan Goel <karangoel@google.com> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-17 22:41:18 +00:00
dolpm	1b962e2457	[fix] lora benchmarks pass no_lora_flag_cpu (#23774 ) Signed-off-by: Dylan Maloy <34420038+dolpm@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-17 21:22:25 +08:00
Daniel Serebrenik	43a62c51be	Add more documentation and improve usability of lognormal dist (benchmark_serving_multi_turn) (#23255 ) Signed-off-by: daniels <daniels@pliops.com>	2025-09-17 05:53:17 +00:00
Isotr0py	5a411ef6c4	[Benchmarks] Add MMVU video dataset support and clean up deprecated datasets (#24719 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-17 03:29:43 +00:00
Tahsin Tunan	cef32104b4	[FP8] Extend per-token-group quantization support to QuantFP8 (#24342 ) Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-09-16 18:31:06 -07:00
Ye (Charlotte) Qi	85e0df1392	[Docs] move benchmarks README to contributing guides (#24820 )	2025-09-16 05:52:57 -07:00
Jee Jee Li	04ad0dc275	[benchmark] Add triton version in the moe tuned config (#24769 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-16 14:10:54 +08:00
Elvir Crnčević	98229db244	[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054 ) Signed-off-by: elvircrn <elvircrn@gmail.com>	2025-09-13 00:17:27 -07:00
Didier Durand	bcb06d7baf	[Doc]: fix typos in various files (#24726 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-12 06:43:12 -07:00
Michael Goin	c3aea10dc8	[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-11 15:43:14 -07:00
Ilya Markov	1fdd5c42d7	[Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-11 09:45:31 -07:00
Jee Jee Li	d11ec124a0	[Bench] Add qwen-next in benchmark_moe.py (#24661 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-11 21:29:43 +08:00
TaehyunKim	9bd831f501	[Model] New model support for Motif-1-Tiny (#23414 ) Signed-off-by: ca1207 <ca1207zzz@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com> Co-authored-by: WyldeCat <skan1543@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 23:29:40 -07:00
Ekagra Ranjan	0dc9cbb527	[Benchmark] Update bench doc with mtbench, blazedit, spec bench (#24450 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2025-09-09 21:15:41 +00:00
Ye (Charlotte) Qi	6fb2788163	[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-09 10:02:35 +00:00
elvischenv	bba1042c6f	[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-09-08 20:53:07 -07:00
Jee Jee Li	62f66be1f7	[Bugfix] Fix Qwen3-coder moe tuned config (#24072 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-07 05:19:46 +00:00
Jiangyun Zhu	77aec83b8c	[Benchmark] add benchmark for custom activation op (#23908 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-06 20:12:05 -07:00
Didier Durand	83609ca91d	[Doc]: fix typos in Python comments (#24173 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-04 08:52:17 -07:00
anthonsu	04f3c35cff	Improve flexibility of auto_tune.sh execution. (#23766 ) Signed-off-by: Anthony Su <50185138+anthonsu@users.noreply.github.com> Signed-off-by: anthonsu <50185138+anthonsu@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-04 09:41:41 +00:00
Weida Hong	12e1e63cc5	[Misc] Enhance output readability of helper script (#24214 ) Signed-off-by: Weida Hong <wdhongtw@google.com>	2025-09-04 06:38:26 +00:00

482 Commits