xinyun/vllm - vllm - 丝路新云-代码仓

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-16 12:55:02 +08:00

Author	SHA1	Message	Date
Cyrus Leung	61dcc280fa	[Doc] Add Voxtral to Supported Models page (#22059) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-31 23:10:56 -07:00
Kyle Sayers	0f46a780d4	[Model] [Quantization] Support quantization for Gemma3n (#21974) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-07-31 22:45:15 -07:00
Mickaël Seznec	e1a7fe4af5	[BugFix] fix: aot passes kvcache dtype information (#19750) Signed-off-by: Mickael Seznec <mickael@mistral.ai>	2025-08-01 05:45:02 +00:00
Cyrus Leung	82de9b9d46	[Misc] Automatically resolve HF processor init kwargs (#22005) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-31 22:44:10 -07:00
Charent	ad57f23f6a	[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873) Signed-off-by: charent <19562666+charent@users.noreply.github.com>	2025-07-31 19:48:13 -07:00
Wentao Ye	3700642013	[Refactor] Remove Duplicate `per_block_cast_to_fp8`, Remove Dependencies of DeepGEMM (#21787) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-01 01:13:27 +00:00
Michael Goin	0bd409cf01	Move flashinfer-python to optional extra `vllm[flashinfer]` (#21959) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-31 18:02:11 -07:00
Matthew Bonanni	e360316ab9	Add DeepGEMM to Dockerfile in vllm-base image (#21533) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-31 18:01:55 -07:00
Wentao Ye	c3e0e9337e	[Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 (#21639) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-31 15:26:11 -07:00
Ilya Markov	6e672daf62	Add FlashInfer allreduce RMSNorm Quant fusion (#21069) Signed-off-by: ilmarkov <imarkov@redhat.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-31 13:58:38 -07:00
Benjamin Chislett	2dff2e21d9	[Bugfix] Fix MTP weight loading (#21941)	2025-07-31 16:33:53 -04:00
Yong Hoon Shin	71470bc4af	[Misc] Add unit tests for chunked local attention (#21692) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-31 11:39:16 -07:00
zhiweiz	9e0726e5bf	[Meta] Official Eagle mm support, first enablement on llama4 (#20788) Signed-off-by: morgendave <morgendave@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-07-31 10:35:07 -07:00
XiongfeiWei	53c21e492e	Update torch_xla pin to 20250730 (#21956) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-07-31 17:26:43 +00:00
Alexei-V-Ivanov-AMD	0780bb5783	Removing amdproduction Tests (#22027) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-07-31 09:53:27 -07:00
Doug Smith	58bb902186	fix(setup): improve precompiled wheel setup for Docker builds (#22025) Signed-off-by: dougbtv <dosmith@redhat.com>	2025-07-31 09:52:48 -07:00
Zhengxu Chen	7349d5268b	[ez] Remove a trailing space from compilation/decorators.py (#22028)	2025-07-31 09:46:07 -07:00
Song	9484641616	[Model] Add step3 vl (#21998) Signed-off-by: oliveryuan <yuansong@step.ai> Co-authored-by: oliveryuan <yuansong@step.ai>	2025-07-31 23:19:06 +08:00
amirkl94	207b750e19	[NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend (#21458) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-31 06:00:01 -07:00
Nick Hill	5daffe7cf6	[BugFix] Fix case where `collective_rpc` returns `None` (#22006) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-31 12:51:37 +00:00
wang.yuqi	2836dd73f1	[Model][CI] Let more pooling models support v1 (#21747) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-31 01:51:15 -07:00
Daniele	d2aab336ad	[CI/Build] get rid of unused VLLM_FA_CMAKE_GPU_ARCHES (#21599) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>	2025-07-31 15:00:08 +08:00
Cyrus Leung	9532a6d563	[Deprecation] Remove deprecated args and methods (#21907) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-30 23:46:38 -07:00
Ning Xie	3e36fcbee6	[Bugfix]: fix metadata file copy in test_sharded_state_loader (#21830) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-31 06:22:11 +00:00
Michael Goin	055bd3978e	[CI Bugfix] Fix CI OOM for `test_shared_storage_connector_hashes` (#21973) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-31 11:45:29 +08:00
Jee Jee Li	0f7919fca0	[Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels (#21818) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-30 20:41:12 -07:00
Michael Goin	61445453df	[UX] Rename CUTLASS_MLA_VLLM_V1 to CUTLASS_MLA (#21966) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-30 20:40:34 -07:00
Sanchit Gandhi	ec02e536df	[Bugfix] Relax lang pin for voxtral (#21833) Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-30 20:38:52 -07:00
Michael Goin	9cb497bfa3	[Example] Add `async_llm_streaming.py` example for AsyncLLM streaming in python (#21763) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-30 18:39:46 -06:00
Zebing Lin	ca9e2be3ed	[Core] Move EngineCoreRequest to Request conversion out of EngineCore (#21627) Signed-off-by: linzebing <linzebing1995@gmail.com>	2025-07-30 15:00:54 -07:00
Bram	601f856d56	[Bugfix] Fix None value handling in trace span creation for cancelled requests (#20272)	2025-07-30 14:44:02 -07:00
cascade	287f527f54	[Feature] Add async tensor parallelism for scaled mm (#20155) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-07-30 17:23:41 -04:00
Ming Yang	f12d9256b3	[Misc] Use dracut on CentOS and skip clone if repo exists for EP kernel installation (#21635) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-07-30 13:15:06 -07:00
Doug Smith	b9b753e7a7	For VLLM_USE_PRECOMPILED, only compiled .so files should be extracted (#21964)	2025-07-30 13:04:40 -07:00
Nick Hill	56bd537dde	[Misc] Support more collective_rpc return types (#21845) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-30 10:20:20 -07:00
wenxindongwork	8f0d516715	[TPU] Support Pathways in vLLM (#21417) Signed-off-by: wenxindongwork <wenxindong@google.com>	2025-07-30 10:02:12 -07:00
wxsm	f4135232b9	feat(distributed): add `get_required_kvcache_layout` class method to kv connector api (#20433) Signed-off-by: wxsm <wxsms@foxmail.com>	2025-07-30 16:41:51 +00:00
Chenguang Zheng	4904e53c32	[Bugfix] SharedStorage Connector for V1 PD multimodal (#21611) Signed-off-by: fake0fan <645327136@qq.com> Signed-off-by: herotai214 <herotai214@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com>	2025-07-30 09:18:37 -07:00
Cyrus Leung	004203e953	[CI/Build] Fix registry tests (#21934) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-30 09:10:41 -07:00
633WHU	5c765aec65	[Bugfix] Fix TypeError in scheduler when comparing mixed request_id types (#21816) Signed-off-by: chiliu <chiliu@paypal.com> Co-authored-by: chiliu <chiliu@paypal.com>	2025-07-30 08:54:44 -07:00
Yong Hoon Shin	ad510309ee	Override attention metadata for fast prefill in some KV sharing setups (#21590) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-30 08:54:15 -07:00
Cyrus Leung	366f6b3a4d	[Bugfix] Fix multi-api server not working for text models (#21933) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-30 08:42:05 -07:00
Isotr0py	6e599eebe8	[Bugfix] Fix OOM tests in initialization test (#21921) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-30 07:35:47 -07:00
Harry Mellor	88edf5994c	[Docs] Reduce the size of the built docs (#21920) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-30 07:35:08 -07:00
Po-Han Huang (NVIDIA)	ff08e51940	[NVIDIA] Fix Llama4 Scout FP4 functionality issues (#21499) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-07-30 07:33:40 -07:00
Ruixiang Tan	8f4a1c9a04	[Misc] Improve code readability of KVCacheManager (#21673) Signed-off-by: tanruixiang <tanruixiang0104@gmail.com> Signed-off-by: Ruixiang Tan <819464715@qq.com> Signed-off-by: GitHub <noreply@github.com>	2025-07-30 07:20:43 -07:00
Harry Mellor	36ede45989	Reduce time wasted in GitHub Actions using `concurrency` (#21919) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-30 07:18:02 -07:00
Cyrus Leung	0e40b26073	[CI/Build] Only run markdownlint in CI (#21892) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-30 07:17:14 -07:00
Wentao Ye	0271c2ff2f	[Test] Add Benchmark and Unit Test for `per_token_group_quant` (#21860) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-30 07:15:02 -07:00
youkaichao	e91d3c9cda	[misc] skip p2p check by default (#21904)	2025-07-30 22:05:04 +08:00

8185 Commits