xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-02 06:57:54 +08:00

Author	SHA1	Message	Date
Ilya Markov	4e26d3b09e	[Compile] Conditional compilation. Introduce compile_ranges (#24252 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-12-05 18:17:32 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Mark McLoughlin	dff0a2b394	[NIXL] Add remote_request_id to kv_transfer_params (#29665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 09:43:48 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Nicolò Lucchesi	78c44fd722	[NIXL] Small cleanup of unused variables (#29618 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-05 18:17:36 +01:00
Andrew Xia	da7bc54ea8	[responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798 ) Signed-off-by: Andrew Xia <axia@fb.com> Signed-off-by: Andrew Xia <axia@meta.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-05 11:11:50 -05:00
Mark McLoughlin	949a6a19d2	[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 15:52:45 +01:00
strinczer	b73b158ab0	[Bugfix] Fix parse_output_message crash on commentary with no recipient (#29972 ) Signed-off-by: Shai Trinczer <strinczer@icloud.com> Signed-off-by: strinczer <strinczer@icloud.com>	2025-12-05 10:51:12 +00:00
Alec S	65ee97288a	[BugFix] Adding env variable to disable async grammar compilation (#29996 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-12-05 00:49:37 -08:00
Yanan Cao	62b3333448	[Frontend] Remove deprecated -O.xx flag (#29991 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-05 00:47:22 -08:00
rasmith	feecba09af	[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 08:42:25 +00:00
Chukwuma Nwaugha	6e865b6a83	Refactor example prompts fixture (#29854 ) Signed-off-by: nwaughac@gmail.com	2025-12-05 06:44:32 +00:00
Charlie Fu	2c22c4ca2d	[ROCm][CI] Increase the memory threshold for test_deep_sleep_fp8_kvcache (#30104 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-12-05 04:51:44 +00:00
Hubert de La Jonquiere	befb59e5b1	[Model] Add Holo2 reasoning parser (#30048 ) Signed-off-by: hdlj-h <hubert@hcompany.ai>	2025-12-05 10:38:45 +08:00
Laith Sakka	1f0d184590	[aot_compile]change VLLM backend to read fake args from example_value (#29104 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-04 17:33:45 -05:00
Lucas Wilkinson	c8ab988b15	[BugFix] Fix DBO assert `assert B_block_table == B_q` (#29933 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-04 14:48:54 -05:00
Mercykid-bash	1119f6e47a	Abstract eplb algo (#26471 ) Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com> Signed-off-by: Mercykid-bash <ruanche0218@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 19:09:09 +00:00
Harry Mellor	e10c84e06a	Access `partial_rotary_factor` from `rope_parameters` (#29966 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 18:42:49 +00:00
Kuntai Du	ece2825a29	[KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer (#29705 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-12-04 18:20:48 +00:00
Qiu	46cbbca05c	[CI][DCP][Perf] reduce DCP CI execution time (#29858 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-12-04 17:28:21 +00:00
Cyrus Leung	b286a311c2	[Chore] Deprecate `merge_by_field_config` arg (#30035 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 17:21:24 +00:00
Doug Smith	5b4b42c0b6	Mark DBO test as flaky on b200 for Distributed B200 test (#29913 ) Signed-off-by: dougbtv <dosmith@redhat.com>	2025-12-04 10:38:03 -05:00
Harry Mellor	9998ea5b57	Delete HF version of Phi 4 MM (#30049 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 13:44:50 +00:00
wang.yuqi	74c4d80c6c	[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-04 13:44:15 +00:00
Chauncey	6796ce8bdb	[Bugfix] Fix the issue with interleaved thinking when using streaming (#30033 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-04 11:11:59 +00:00
Andreas Karatzas	e96a6a6dca	[ROCm][CI][Bugfix] Fixing the `Multi-Modal Models Test (Extended) 1` group (#30013 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-04 11:00:16 +00:00
Noa Neria	6366c098d7	Validating Runai Model Streamer Integration with S3 Object Storage (#29320 ) Signed-off-by: Noa Neria <noa@run.ai>	2025-12-04 18:04:43 +08:00
rasmith	f2f4cea6cc	[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-04 09:30:22 +00:00
Arpit Khandelwal	dfdda96747	[Core] Remove forced None assignment for deprecated PassConfig flags (#29994 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-04 09:15:04 +00:00
Mark McLoughlin	899e2ef558	[Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-04 16:22:03 +08:00
Micah Williamson	5430e110c0	[CI][AMD] Match Main CI Behavior By Skipping test_eplb_spec_decode In AMD CI (#30006 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-04 16:20:54 +08:00
Charlie Fu	9aa33a74b0	[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001 ) Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>	2025-12-04 07:52:28 +00:00
Cyrus Leung	9ae2f60374	[Misc] Various cleanups for MM input processing (#29970 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 06:22:20 +00:00
Benjamin Bartels	fca3f46658	[Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk (#29971 ) Signed-off-by: bbartels <benjamin@bartels.dev>	2025-12-04 05:50:27 +00:00
Shengqi Chen	1109f98288	[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-03 14:08:19 -08:00
Elizabeth Thomas	b5407869c8	[Bugfix] Respect VLLM_CONFIGURE_LOGGING value (#28671 ) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Jane Xu <janeyx@meta.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Johnny Yang <johnnyyang@google.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: bruceszchen <bruceszchen@tencent.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>	2025-12-03 22:00:52 +00:00
bnellnm	2902c34826	[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 20:49:00 +00:00
Varun Sundar Rabindranath	19bee6d12d	[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 18:04:59 +00:00
avigny	dd5d1ef780	[Bugfix] Mistral tool parser streaming update (#19425 ) Signed-off-by: avigny <47987522+avigny@users.noreply.github.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Jeff Cook <jeff@jeffcook.io> Co-authored-by: sfbemerk <benjaminmerkel@mail.de> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-03 17:45:31 +00:00
Micah Williamson	d1f7392c5f	[ROCm][CI] Fix v1/logits_processors failure on ROCm (#29927 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-04 01:17:07 +08:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
rasmith	5aa9b09040	[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-03 22:56:35 +08:00
Tsukasa OI	42c1949643	[Bugfix][Quantization] Support BF16 tensors on GGUF (#29948 ) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2025-12-03 10:33:46 +00:00
Isotr0py	cc4e296ea6	[CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests (#29907 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-03 10:27:36 +00:00
Chauncey	3f42b05fbc	[Refactor] [1/N] to simplify the vLLM serving architecture (#28040 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-03 01:26:39 -08:00
Andrew Xia	3a7751485b	[responsesAPI] support input output messages for non harmony models (#29549 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-02 23:59:23 -08:00
Arpit Khandelwal	d7284a2604	[Core] Rename PassConfig flags as per RFC #27995 (#29646 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-03 03:38:55 +00:00
Andreas Karatzas	506ed87e87	[ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues (#29909 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-03 10:36:49 +08:00
Micah Williamson	c014de1ec7	[ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI (#29808 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-02 22:54:36 +00:00
Julien Denize	1b1e35aaf9	[BUGFIX] Fix regex pattern for Mistral Tool Call (#29918 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2025-12-02 14:51:58 -08:00


3874 Commits