xinyun/vllm - vllm - 丝路新云 (code repository)

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-24 17:06:37 +08:00

Author	SHA1	Message	Date
Wentao Ye	eefbf4a68b	[Perf] Optimize `reshape_and_cache_flash` CUDA Kernel (#22036) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-01 19:18:51 -04:00
Michael Goin	88faa466d7	[CI] Initial tests for SM100 Blackwell runner (#21877) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-01 16:18:38 -07:00
Nick Hill	881e1af43a	[BugFix] Harden distributed DP startup (#21538) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-01 21:40:45 +00:00
XiongfeiWei	d84b97a3e3	Add lora test for tp>1 case for TPU. (#21970) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-08-01 18:56:08 +00:00
Rui Qiao	d331759488	Introduce RayPPCommunicator for ray-based PP (#21660) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-08-01 11:50:58 -07:00
Animesh Jain	9659bc7f27	[compile][startup] Disable C++ compilation of symbolic shapes (#20836) Signed-off-by: Animesh Jain <anijain@umich.edu>	2025-08-01 10:38:52 -07:00
Michael Goin	3277e8f9e1	Fix pre-commit failure for SECURTIY.md (#22102) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-01 10:36:07 -07:00
Jee Jee Li	8d705996df	[Misc] Minor enhancement of benchmark_moe (#22068) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-02 01:35:30 +08:00
Harry Mellor	38c8bce8b6	Enable headless models for pooling in the Transformers backend (#21767) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 10:31:29 -07:00
Varun Sundar Rabindranath	ac45c44d98	[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch (#21837) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-01 10:14:38 -07:00
Huzaifa Sidhpurwala	d6664664b4	security policy: take 1 (#21119) Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-08-01 10:09:49 -07:00
rongfu.leng	b879ecd6e2	[Bugfix] fix when skip tokenizer init (#21922) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-01 10:09:36 -07:00
Isotr0py	3f8e952179	[Bugfix] Fix glm4.1v video inference issue (#22067) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-01 09:33:30 -07:00
Harry Mellor	326a1b001d	Improve documentation of `ModelConfig.try_get_generation_config` to prevent future confusion (#21526) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 09:32:27 -07:00
Harry Mellor	2d7b09b998	Deprecate `--disable-log-requests` and replace with `--enable-log-requests` (#21739) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 17:16:37 +01:00
David Xia	97608dc276	[Docs] use `uv` in CPU installation docs (#22089) Signed-off-by: David Xia <david@davidxia.com>	2025-08-01 07:55:55 -07:00
Nick Hill	3146519add	[BugFix] Don't change title of top-level process (#22032) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-01 07:37:55 -07:00
Richard Zou	8026a335a1	[BugFix] Update AttnFusionPass cache key (#21947) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-08-01 07:11:29 -07:00
Wentao Ye	a59cd9d9f7	[Refactor] Fix Compile Warning #1444-D (#21462) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-01 06:10:30 -07:00
Abirdcfly	5c54d9759d	[Bugfix][PD] set max_completion_tokens=1 if req has this value (#21841) Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2025-08-01 06:08:45 -07:00
Gamhang	0a6d305e0f	feat(multimodal): Add customizable background color for RGBA to RGB conversion (#22052) Signed-off-by: Jinheng Li <ahengljh@gmail.com> Co-authored-by: Jinheng Li <ahengljh@gmail.com>	2025-08-01 06:07:33 -07:00
Michael Goin	f81c1bb055	[Bugfix] Check NVIDIA artifactory is accessible before using flashinfer cubin kernels (#21893)	2025-08-01 08:28:45 -04:00
Harry Mellor	fb0e0d46fc	Fix `get_kwargs` for case where type hint is `list[Union[str, type]]` (#22016) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 05:26:42 -07:00
TJian	26b5f7bd2a	[BUG] [ROCm] Fix import bug on ROCm (#22083) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-01 05:25:20 -07:00
Dipika Sikka	dfbc1f8880	[Speculative Decoding] Add `speculators` config support (#21345)	2025-08-01 08:25:18 -04:00
Harry Mellor	87c94bc879	Revert "Update sampling_metadata.py (#21937)" (#22088) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 05:24:46 -07:00
Jee Jee Li	28b18cc741	[Quantization] Enable BNB support for InternS1 (#21953) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-01 11:09:54 +00:00
WeiQing Chen	4931486988	[Doc] Added warning of speculating with draft model (#22047) Signed-off-by: Dilute-l <dilu2333@163.com> Co-authored-by: Dilute-l <dilu2333@163.com>	2025-08-01 02:11:56 -07:00
Woosuk Kwon	0f81b310db	[Misc] Remove upper bound in openai package version (#22060) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-01 02:11:40 -07:00
wuhang	e6680f9e25	[Bugfix] Add log prefix in non-dp mode engine core (#21889) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-08-01 09:04:16 +00:00
Roger Wang	27a145e893	[Doc] Add example for Step3-VL (#22061) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-08-01 08:35:49 +00:00
Simon Mo	da31f6ad3d	Revert precompile wheel changes (#22055)	2025-08-01 08:26:24 +00:00
Sungyoon Jeong	98df153abf	[Frontend] Align tool_choice="required" behavior with OpenAI when tools is empty (#21052) Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai>	2025-08-01 07:54:17 +00:00
Zebing Lin	e0f63e4a35	[Core] Avoid repeated len(block_token_ids) check in hash_request_tokens (#21781) Signed-off-by: linzebing <linzebing1995@gmail.com>	2025-08-01 00:23:29 -07:00
Cyrus Leung	b4e081cb15	[Bugfix] Disable multi-modal preprocessor cache for DP (#21896) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-01 08:03:56 +01:00
Hongsheng Liu	79731a79f0	[Doc] Fix a syntax error of example code in structured_outputs.md (#22045) Signed-off-by: wangzi <3220100013@zju.edu.cn> Co-authored-by: wangzi <3220100013@zju.edu.cn>	2025-08-01 00:01:22 -07:00
Aviad Rossmann	53d7c39271	Update sampling_metadata.py (#21937) Signed-off-by: Aviad Rossmann <aviadr@neureality.ai>	2025-07-31 23:23:18 -07:00
Cyrus Leung	61dcc280fa	[Doc] Add Voxtral to Supported Models page (#22059) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-31 23:10:56 -07:00
Kyle Sayers	0f46a780d4	[Model] [Quantization] Support quantization for Gemma3n (#21974) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-07-31 22:45:15 -07:00
Mickaël Seznec	e1a7fe4af5	[BugFix] fix: aot passes kvcache dtype information (#19750) Signed-off-by: Mickael Seznec <mickael@mistral.ai>	2025-08-01 05:45:02 +00:00
Cyrus Leung	82de9b9d46	[Misc] Automatically resolve HF processor init kwargs (#22005) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-31 22:44:10 -07:00
Charent	ad57f23f6a	[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873) Signed-off-by: charent <19562666+charent@users.noreply.github.com>	2025-07-31 19:48:13 -07:00
Wentao Ye	3700642013	[Refactor] Remove Duplicate `per_block_cast_to_fp8`, Remove Dependencies of DeepGEMM (#21787) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-01 01:13:27 +00:00
Michael Goin	0bd409cf01	Move flashinfer-python to optional extra `vllm[flashinfer]` (#21959) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-31 18:02:11 -07:00
Matthew Bonanni	e360316ab9	Add DeepGEMM to Dockerfile in vllm-base image (#21533) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-31 18:01:55 -07:00
Wentao Ye	c3e0e9337e	[Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 (#21639) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-31 15:26:11 -07:00
Ilya Markov	6e672daf62	Add FlashInfer allreduce RMSNorm Quant fusion (#21069) Signed-off-by: ilmarkov <imarkov@redhat.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-31 13:58:38 -07:00
Benjamin Chislett	2dff2e21d9	[Bugfix] Fix MTP weight loading (#21941)	2025-07-31 16:33:53 -04:00
Yong Hoon Shin	71470bc4af	[Misc] Add unit tests for chunked local attention (#21692) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-31 11:39:16 -07:00
zhiweiz	9e0726e5bf	[Meta] Official Eagle mm support, first enablement on llama4 (#20788) Signed-off-by: morgendave <morgendave@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-07-31 10:35:07 -07:00

8222 Commits