xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-06 21:37:20 +08:00

Author	SHA1	Message	Date
Nick Hill	646d62f636	[Core] Use tuple for kv cache group block ids (#19175 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-10 07:01:17 +02:00
Siyuan Liu	7d44c469fe	[TPU]Fix KV cache sharing tests (#19371 )	2025-06-09 18:38:15 -04:00
liusiqian-tal	31f58be96a	[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472 ) Signed-off-by: liusiqian <liusiqian@tal.com>	2025-06-09 21:41:21 +00:00
22quinn	c1c7dbbeeb	[Bugfix][Core] Prevent token lengths exceeding `max_model_len` in V0 (#19348 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-09 23:01:29 +08:00
Varun Sundar Rabindranath	5cf2daea9a	[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-09 10:50:39 -04:00
Isotr0py	b8089195b4	[v1] Add fp32 support to v1 engine through flex attn (#19319 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-06-09 22:10:44 +08:00
Jee Jee Li	95a6568b5c	[CI/Build] Fix LoRA test (#19350 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-09 09:52:10 +00:00
Richard Zou	3a4d417707	[Misc] Cleanup compilation tests (#19343 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-09 15:05:44 +08:00
Dipika Sikka	c123bc33f9	[Quantization] Add compressed-tensors NVFP4 support (#18312 )	2025-06-08 09:05:55 -04:00
Richard Zou	3d64d366e0	[Misc] Change tests/compile to use VLLM_V1 by default (#19302 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-08 16:06:48 +08:00
Richard Zou	eaa2e51088	[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-06-08 08:56:12 +08:00
Luka Govedič	2d8476e465	[BugFix][V1] Fix memory profiling bug (#18974 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-07 10:34:51 -07:00
Isotr0py	d2f0e7e615	[CI/Build] Improve Llama GGUF test robustness (#19287 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-07 17:23:28 +08:00
Driss Guessous	cf02f9b283	Add FlexAttention to V1 (#16078 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-06-06 21:58:55 -07:00
ElizaWszola	84166fee97	[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-06 18:26:11 -07:00
Lu Fang	6e0cd10f72	[Easy][Test] Simplify test_function_tool_use with multiple parametrizes (#19269 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-07 09:19:09 +08:00
Nick Hill	46ecc57973	[BugFix] Fix tpu_model_runner block_id concatenation (#19228 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-06 16:28:17 -07:00
Adolfo Victoria	ca27f0f9c1	[Bugfix][Core] Update cancellation logic in `generate()` to handle Generator exits (#19225 ) Co-authored-by: Adolfo Victoria <adovi@meta.com>	2025-06-06 20:17:54 +00:00
Nick Hill	aad30bd306	[BugFix] Fix MultiConnector test after HMA changes (#19291 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-06 20:16:24 +00:00
jmswen	7353492a47	[Core] Raise when non-multi-instance DP clients target a DP rank (#19227 ) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-06 19:03:01 +08:00
Siqi Yan	f168b85725	Unit Test for run_dp_sharded_vision_model (#19103 ) Signed-off-by: Siqi Yan <siqi@meta.com> Co-authored-by: Siqi Yan <siqi@meta.com>	2025-06-06 16:24:02 +08:00
Richard Zou	da511d54d8	Fix CompilationConfig repr (#19091 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-06 16:23:35 +08:00
Dipika Sikka	94870359cd	[Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>	2025-06-06 01:21:54 -07:00
Chengji Yao	b61dc5f972	[TPU] update torch_xla pin (#19231 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-06-06 04:27:38 +00:00
Chen Zhang	f8a1a2d108	[v1] Hybrid Memory Allocator (#17996 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-05 20:47:09 -07:00
Benjamin Chislett	3465b87ef8	[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-06-05 19:10:08 -07:00
Jerry Zhang	c8134bea15	Fix AOPerModuleConfig name changes (#18869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-06-05 18:51:32 -07:00
Luis Vega	cb6d572e85	[Model] NemotronH support (#18863 ) Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com> Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>	2025-06-05 21:29:28 +00:00
Dipika Sikka	aa49f14832	[Quantization] Skip Fp4 Test for `compressed-tensors` (#19217 )	2025-06-05 18:21:53 +00:00
Povilas Kanapickas	85e2b7bb13	[MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226 ) Signed-off-by: Povilas Kanapickas <povilas@radix.lt>	2025-06-05 16:53:08 +00:00
Chiyue Wei	61059bee40	[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110 ) Signed-off-by: Chiyue Wei <chiyuew@nvidia.com> Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>	2025-06-05 09:48:26 -07:00
Guillaume Calmettes	9bc8bb07cf	[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-06-05 12:59:28 +00:00
Chauncey	8fc57501d3	[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-05 06:24:24 +00:00
Robert Shaw	c56ed8bb0e	[Bugfix][Nixl] Fix full prefix cache hit bug (#18632 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-05 02:07:32 +00:00
Nicolò Lucchesi	b2fac67130	[P/D] Heterogeneous TP (#18833 ) Signed-off-by: nicklucche <nlucches@redhat.com>	2025-06-04 23:25:34 +00:00
Varun Sundar Rabindranath	c3fd4d669a	[Kernel] Integrate batched/masked deepgemm kernel (#19111 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-04 21:59:18 +00:00
Siyuan Liu	7ee2590478	[TPU] Update dynamo dump file name in compilation test (#19108 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 16:13:43 -04:00
jmswen	c8dcc15921	Allow AsyncLLMEngine.generate to target a specific DP rank (#19102 ) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-04 08:26:47 -07:00
Cyrus Leung	01dc9a76db	[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-04 04:49:20 -07:00
wang.yuqi	35cf32df30	Improve the output precision of embedding models (#19092 )	2025-06-04 11:48:57 +00:00
Seiji Eicher	2669a0d7b5	Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-06-04 17:10:45 +08:00
Siyuan Liu	8e972d9c44	[TPU] Skip hanging tests (#19115 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-04 01:43:00 -07:00
Woosuk Kwon	b124e1085b	[Bugfix] Fix FA3 full cuda graph correctness (#19106 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-03 23:10:15 -07:00
Kaixi Hou	41aa578428	[NVIDIA] Add Cutlass MLA backend (#17625 )	2025-06-03 21:40:26 -07:00
Vadim Gimpelson	5d6d1adf15	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437 )	2025-06-03 21:13:01 -07:00
Li, Jiang	4555143ea7	[CPU] V1 support for the CPU backend (#16441 )	2025-06-03 18:43:01 -07:00
Yan Ru Pei	b712be98c7	feat: add data parallel rank to KVEventBatch (#18925 )	2025-06-03 17:14:20 -07:00
Chen Zhang	a8da78eac9	[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-04 00:14:06 +00:00
Chauncey	4de790fcad	[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-06-03 23:27:24 +00:00
Chen Zhang	b5fd9506c1	[Bugfix] get_num_blocks_to_allocate with null_block (#19031 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-03 15:30:55 -07:00

2106 Commits