xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-05-27 22:57:29 +08:00)

Author	SHA1	Message	Date
avigny	dd5d1ef780	[Bugfix] Mistral tool parser streaming update (#19425 ) Signed-off-by: avigny <47987522+avigny@users.noreply.github.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Jeff Cook <jeff@jeffcook.io> Co-authored-by: sfbemerk <benjaminmerkel@mail.de> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-03 17:45:31 +00:00
Micah Williamson	d1f7392c5f	[ROCm][CI] Fix v1/logits_processors failure on ROCm (#29927 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-04 01:17:07 +08:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
rasmith	5aa9b09040	[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-03 22:56:35 +08:00
Tsukasa OI	42c1949643	[Bugfix][Quantization] Support BF16 tensors on GGUF (#29948 ) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2025-12-03 10:33:46 +00:00
Isotr0py	cc4e296ea6	[CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests (#29907 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-03 10:27:36 +00:00
Chauncey	3f42b05fbc	[Refactor] [1/N] to simplify the vLLM serving architecture (#28040 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-03 01:26:39 -08:00
Andrew Xia	3a7751485b	[responsesAPI] support input output messages for non harmony models (#29549 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-02 23:59:23 -08:00
Arpit Khandelwal	d7284a2604	[Core] Rename PassConfig flags as per RFC #27995 (#29646 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-03 03:38:55 +00:00
Andreas Karatzas	506ed87e87	[ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues (#29909 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-03 10:36:49 +08:00
Micah Williamson	c014de1ec7	[ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI (#29808 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-02 22:54:36 +00:00
Julien Denize	1b1e35aaf9	[BUGFIX] Fix regex pattern for Mistral Tool Call (#29918 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2025-12-02 14:51:58 -08:00
Chauncey	0a9caca9f5	[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 22:42:28 +00:00
Sage Moore	e6f114ac25	[Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults (#29911 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-12-02 13:20:22 -09:00
Harry Mellor	6fc5841db1	Fix some more Transformers nightly tests (#29872 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 21:49:44 +00:00
Divakar Verma	afb1e5b380	[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 20:46:10 +00:00
Copilot	1c593e117d	Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep (#29025 ) Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-02 20:40:56 +00:00
Isotr0py	63b1da76ba	[Chore]: Reorganize gguf utils functions under `transformers_utils` (#29891 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-02 17:33:23 +00:00
Andrew Xia	52cb349fc0	[responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-02 11:24:45 -05:00
ImaGoodFella	60c3d413af	[Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values (#29621 ) Signed-off-by: Rahul Steiger <rasteiger@ethz.ch> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-02 21:49:02 +08:00
Cyrus Leung	68ffbca7e4	[Chore] Use `tokenizer.encode` and `tokenizer.decode` directly (#29851 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 12:30:40 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Julien Denize	d8c6210eea	Add Mistral Large 3 and Ministral 3 (#29757 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Mickael Seznec <mickael@mistral.ai>	2025-12-02 10:29:00 +00:00
杰兮	48d15a32aa	[CI] Fix Bad_words test for tokenizer encode/decode asymmetry (#28193 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-12-02 00:02:12 -08:00
Boyuan Feng	3b221cb661	[BugFix] respect VLLM_LOGGING_LEVEL in logger (#29761 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-02 07:49:16 +00:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
Divakar Verma	e2fbfc955e	[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 05:27:46 +00:00
Divakar Verma	a690fb5bd6	[CI][ROCm] Fix test_correctness_sliding_window (#29243 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-02 04:53:27 +00:00
usberkeley	81fe3f82af	[BugFix] Fix index error in ngram_proposer (#29779 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-12-02 04:48:11 +00:00
Zuyi Zhao	53bf71b0f0	[Misc] Update conftest for entrypoints/sagemaker test folder (#29799 ) Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>	2025-12-01 18:56:39 -09:00
Zhuohan Li	d0cd728907	[Core] Support resetting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
Andrew Xia	fa8804ad9c	[responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug (#29555 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-02 02:11:35 +00:00
Nick Hill	44822d7ff2	[BugFix] Preserve spec decoding uniform decode when scheduling (#29759 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-01 17:15:52 -08:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds — total lifetime from allocation to free vllm:kv_block_idle_before_evict_seconds — idle duration before eviction vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics – enable KV cache residency metrics --kv-cache-metrics-sample – control sampling ratio (default: 0.01) Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
knlnguyen1802	fc6acc88ca	[Bugfix] Missing cached item in the MultiModalReceiverCache (#28525 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Chenguang Zheng <645327136@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-01 10:18:07 -08:00
BADAOUI Abdennacer	d0985c5feb	[Hardware][AMD] Remove ROCm skip conditions for transformers backend tests (#29782 ) Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>	2025-12-02 02:03:13 +08:00
sangbumlikeagod	092bb73b8a	[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209 ) Signed-off-by: sangbumlikeagod <oironese@naver.com> Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>	2025-12-01 18:19:17 +01:00
Marcin Ostrowski	5cfa967efa	[Bugfix] TypeError: 'NoneType' object is not callable (#29414 ) Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>	2025-12-01 13:16:44 +00:00
Isotr0py	b95db244ee	[v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-12-01 13:12:51 +00:00
Zhengxu Chen	ad9d656bfa	[multimodal][test] Reduce memory utilization for test_siglip to avoid OOM (#29504 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-01 20:41:48 +08:00
Cyrus Leung	f0a28bf661	[Misc] Unify tokenizer registration (#29767 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-01 11:34:58 +00:00
daniel-salib	014ece97c7	[Frontend] Add tool filtering support to ToolServer (#29224 ) Signed-off-by: Daniel Salib <danielsalib@meta.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-12-01 08:03:57 +00:00
wang.yuqi	62de4f4257	[Frontend] Resettle pooling entrypoints (#29634 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-01 15:30:43 +08:00
Huamin Li	83805a6078	[CI] Skip paddleocr_vl for transformer 4.57.3 (#29758 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-12-01 04:38:06 +00:00
Omer Ullman Argov	39d28108f4	[Feat] Support non-gated activations in NVFP4 modelopt path (#29004 )	2025-11-30 11:02:40 -05:00
Cyrus Leung	64bc09ba27	[Core] Enable `inputs_embeds_size` separate from `hidden_size` (#29741 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 17:31:12 +08:00
Cyrus Leung	2afcec4dec	[Misc] Update `TokenizerLike` interface and move `get_cached_tokenizer` (#29730 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 14:59:47 +08:00
Vensen	66b5840287	[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783 ) Signed-off-by: vensen <vensenmu@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-11-30 14:24:25 +08:00
Xin Yang	a491b0911b	[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-30 10:37:25 +08:00
Jee Jee Li	b9d0504a36	[Bugfix] Revert test_tokenization.py (#29729 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-29 16:35:15 +00:00


3736 Commits