xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-02 01:07:11 +08:00

Author	SHA1	Message	Date
bnellnm	a462331e36	[Bugfix] Disable moe inplace for torch >= 2.9 (#26497 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-09 18:07:38 +00:00
roikoren755	4069db3f2e	[Bugfix] Enable padded FP4 quantization (#25947 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2025-10-09 10:59:41 -07:00
Sage Moore	0d37450eb7	[BUGFIX] Add cu_tokens_across_sp to DPMetadata (#26457 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-09 17:13:56 +00:00
bnellnm	47e66c24e2	[Model] Apply shared experts overlap optimization to all models with shared experts (#26145 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-09 11:31:04 -04:00
Ming Yang	3b736e1c38	[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-10-09 08:06:29 -07:00
Lukas Geiger	2c1c7dfb35	[Models][Qwen] Replace `pad` with `cat` for better performance (#26486 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-09 14:51:26 +00:00
Harry Mellor	e246ad6f0c	Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 (#26481 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-09 06:02:40 -07:00
Jiangyun Zhu	5728da11ea	Revert #26113 "[Frontend] CompilationConfig overhaul (#20283 ): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-09 05:43:55 -07:00
Simon Danielsson	92be3f3517	[Feature] Use pydantic validation in parallel.py config (#26417 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-09 12:41:31 +00:00
Isotr0py	d1ddf340c8	[V0 deprecation] Remove `QKVCrossParallelLinear` implementation (#26475 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-09 10:52:27 +00:00
Wenzheng Bi	ec10fd0abc	[Bugfix] Move current_platform import to avoid python import cache. (#16601 ) Signed-off-by: iwzbi <wzbi@zju.edu.cn>	2025-10-09 10:46:19 +00:00
Lukas Geiger	0426e3c5e1	[Models][Qwen3VL] Optimise `_validate_and_reshape_mm_tensor` (#26426 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-09 10:25:48 +00:00
Cyrus Leung	4bdf7ac593	[Bugfix] Fix SHM cache initialization (#26427 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-09 02:48:04 -07:00
Cyrus Leung	dc7976dd9f	[Misc] Upgrade more code to Python 3.10 (#26463 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-09 10:43:53 +01:00
Simon Danielsson	e4791438ed	[Feature] Use pydantic validation in lora.py and load.py configs (#26413 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2025-10-09 02:38:33 -07:00
youkaichao	e6e898f95d	[doc] add Volcengine as a compute sponsor (#26477 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-10-09 17:11:47 +08:00
Nick Hill	ddcbc2f334	[Misc] Misc code simplifications (#26450 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-09 02:10:06 -07:00
Jerry Zhang	a83ff278d6	[torchao] Add support for ModuleFqnToConfig using regex (#26001 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-10-09 08:32:32 +00:00
Rahul Tuli	cf4cd6c24f	Add: Support for multiple hidden layers in Eagle3 (#26164 ) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-10-09 07:30:50 +00:00
Harry Mellor	b960441812	Enable `RMSNorm` substitution for Transformers backend (#26353 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-09 07:28:51 +00:00
Luciano Martins	1317028aa8	[Model] Gemma3: Fix GGUF loading and quantization (#26189 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-09 07:00:53 +00:00
elvischenv	5e49c3e777	Bump Flashinfer to v0.4.0 (#26326 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-08 23:58:44 -07:00
pwschuurman	0d7c3cb51d	Update Dockerfile and install runai-model-streamer[gcs] package (#26464 ) Signed-off-by: Peter Schuurman <psch@google.com>	2025-10-08 23:48:51 -07:00
Jee Jee Li	1b2c440cd6	[Core] Relax the LoRA max rank (#26461 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-08 23:47:14 -07:00
Cyrus Leung	0f29dca988	[CI/Build] Fix model nightly tests (#26466 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-08 23:44:16 -07:00
Zhiyuan Li	d24cf322e1	[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486 ) Signed-off-by: lizhiyuan <uniartisan2017@gmail.com> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-08 23:43:39 -07:00
Qier Li	d17f0fbf30	[Core][KVConnector] Propagate all tokens on resumed preemptions (#24926 ) Signed-off-by: Qier Li <kevin44036@gmail.com> Co-authored-by: Qier Li <qier@fb.com>	2025-10-09 14:43:31 +08:00
Wenlong Wang	43ab8cfaa5	[MM][Doc] Add documentation for configurable mm profiling (#26200 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-10-08 23:21:20 -07:00
Matt	de253d63b7	[Hardware][AMD] Enable FlexAttention backend on ROCm (#26439 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2025-10-09 06:20:18 +00:00
Huy Do	8bd696fa53	[Bugfix] Incorrect another MM data format in vllm bench throughput (#26462 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-10-09 05:58:46 +00:00
Nick Hill	bb6d8c21f9	[Bugfix] Catch and log invalid token ids in detokenizer #2 (#26445 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-08 21:20:25 -07:00
Zhuohan Li	ebf6ef1a9b	[Minor] Change warning->warning_once in preprocess (#26455 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-10-08 21:09:06 -07:00
Jee Jee Li	0c52d6ef81	[Bugfix] Set the minimum python version for gpt-oss (#26392 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-08 20:35:49 -07:00
Rui Qiao	467a4f98f1	[Misc] Redact ray runtime env before logging (#26302 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-08 17:43:34 -07:00
Naveenraj Kamalakannan	e614ab7806	Separate MLAAttention class from Attention (#25103 ) Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-08 17:11:11 -07:00
Matthew Bonanni	2a03f93de9	[Attention] Register FLASHMLA_SPARSE (#26441 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 22:28:52 +00:00
bnellnm	da364615fc	[Kernels] Modular kernel refactor (#24812 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-08 17:51:52 -04:00
Elaine Zhao	f08919b7d1	[Bugfix] Respect min_tokens in scheduler stop check (#26317 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-10-08 14:08:24 -07:00
Lukas Geiger	93f2c0aa08	[Models] Improve iteration over layers (#26425 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-08 20:48:33 +00:00
Nicolò Lucchesi	4ebc9108a7	[Kernel] Centralize platform kernel import in `current_platform.import_kernels` (#26286 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-08 20:25:31 +00:00
Morrison Turnansky	e1ba235668	[BugFix] Fix failing test quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled (#26436 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com>	2025-10-08 20:04:12 +00:00
elvischenv	b82f4307c9	[Bugfix][Flashinfer] fix VLLM_USE_TRTLLM_ATTENTION issue for models with diff hyperparameters (#25924 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-08 19:54:48 +00:00
Matthew Bonanni	76879cc160	[Attention] Implement universal BACKEND_MAP (#25900 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 12:00:25 -07:00
Vinay R Damodaran	b25d7b5657	[Feature] Change cache.py with pydantic validation (#26390 ) Signed-off-by: Vinay Damodaran <vrdn@hey.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-08 11:12:59 -07:00
Harry Mellor	e09d1753ec	Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-08 10:40:42 -07:00
Wentao Ye	4ba8875749	[Bug] Fix Test in Batch Invariant (#26128 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-08 10:13:47 -07:00
Lukas Geiger	6273fe8d3d	[Benchmarks] Fix imports in FP8 tuning script (#26407 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-08 16:31:59 +00:00
Wentao Ye	9fb3ae4e6f	[Bug] Fix DeepGEMM Attention Test (#26423 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-08 12:23:41 -04:00
Aydin Abiar	76afe4edf8	[Bugfix] Fix `vllm bench ...` on CPU-only head nodes (#25283 ) Signed-off-by: Aydin Abiar <aydin@anyscale.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Aydin Abiar <aydin@anyscale.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-08 16:06:42 +00:00
Michael Goin	c1b06fc182	[CI Failure] Fix pre-commit issue for install_nixl_from_source_ubuntu.py (#26424 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-08 07:55:43 -07:00

10293 Commits