5482 Commits

Author SHA1 Message Date
shangmingc
d03308be0c
[Misc] Remove stale func in KVTransferConfig (#14746)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-28 17:33:32 +00:00
Cyrus Leung
c6bc0034d0
[Misc] Remove unused utils and clean up imports (#15708)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-28 09:41:16 -07:00
Woosuk Kwon
70e132244a
[Minor] Remove TGI launching script (#15646)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-28 09:30:08 -07:00
Michael Goin
47e9038d23
Fix cpu offload testing for gptq/awq/ct (#15648)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-29 00:29:32 +08:00
Kebe
432cf22a6a
[Bugfix] Fix regex compile display format (#15368)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-03-28 08:58:44 -07:00
Reid
2914006fe0
[doc] add missing imports (#15699)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-28 15:56:48 +00:00
Russell Bryant
7329ff5468
[V1] Support disable_any_whtespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 23:46:45 +08:00
Cyrus Leung
541d1df486
[Bugfix] embed_is_patch for Idefics3 (#15696)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-28 08:27:52 -07:00
Chauncey
3b00ff9138
[Bugfix][v1] xgrammar structured output supports Enum. (#15594)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-28 06:14:53 -07:00
Jee Jee Li
91276c5721
[Model] Adding torch compile annotations to chatglm (#15624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-28 21:14:09 +08:00
Harry Mellor
0b4167526d
[Docs] Add "Generation quality changed" section to troubleshooting (#15701)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-28 13:03:21 +00:00
Reid
fd5fd26902
[Frontend] update priority for --api-key and VLLM_API_KEY (#15588)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-28 19:40:12 +08:00
Ce Gao
3bbaacbe15
[Bugfix][Frontend] Eliminate regex based check in reasoning full generator (#14821)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-28 11:20:35 +00:00
Lize Cai
a10314c6b3
[Misc] Fix test_sleep to use query parameters (#14373)
Signed-off-by: Lize Cai <lize.cai@sap.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-28 18:00:14 +08:00
Jee Jee Li
70f2c2a709
[Bugfix] Fix 'InductorAdaptor object has no attribute 'cache_dir' (#15674)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-28 17:10:40 +08:00
Li, Jiang
280d074103
[CPU][CI] Improve CPU Dockerfile (#15690)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-28 01:36:31 -07:00
Ce Gao
32b14baf8a
[Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-28 00:23:30 -07:00
Robert Shaw
2d9045fce8
[TPU][CI] Fix TPUModelRunner Test (#15667)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-03-28 00:01:26 -07:00
Cyrus Leung
355f66348c
[V1] Remove legacy input registry (#15673)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 23:34:34 -07:00
Cyrus Leung
8693e47e6a
[Bugfix] Fix mm_hashes forgetting to be passed (#15668)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-28 05:51:05 +00:00
Jason (Siyu) Zhu
cec8c7d7f8
Refactor error handling for multiple exceptions in preprocessing (#15650)
Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com>
2025-03-28 03:27:20 +00:00
Gregory Shtrasberg
4d0ec37267
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-03-28 02:58:16 +00:00
Chen Xia
e7f720ea56
[Misc]add coding benchmark for speculative decoding (#15303)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
2025-03-28 10:47:05 +08:00
Wes
4ae17bf1e2
Revert "Use Cache Hinting for fused_moe kernel (#15511)" (#15645)
Signed-off-by: Wes Medford <wryanmedford@gmail.com>
2025-03-27 19:45:55 -07:00
Robert Shaw
8a49eea74b
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-27 19:45:05 -07:00
wwl2755
b4245a48df
[Doc] Fix dead links in Job Board (#15637)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-03-28 02:43:40 +00:00
Kebe
4e0f6076be
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-28 10:13:41 +08:00
Jee Jee Li
726efc6a32
[Quantization][V1] BitsAndBytes support V1 (#15611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-28 10:12:47 +08:00
Robert Shaw
bd45912b99
[TPU] Lazy Import (#15656)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-28 09:57:01 +08:00
Nick Hill
15dac210f0
[V1] AsyncLLM data parallel (#13923)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-27 16:14:41 -07:00
Russell Bryant
112b3e5b3b
[CI] Update rules for applying tpu label. (#15634)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-27 22:15:26 +00:00
cnorman
32d669275b
Correct PowerPC to modern IBM Power (#15635)
Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com>
2025-03-27 15:04:32 -07:00
Nicolò Lucchesi
4098b72210
[Bugfix][TPU][V1] Fix recompilation (#15553)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-27 19:15:06 +00:00
Harry Mellor
46450b8d33
Use absolute placement for Ask AI button (#15628)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-27 18:52:18 +00:00
Cyrus Leung
13ac9cab21
[Misc] Avoid direct access of global mm_registry in compute_encoder_budget (#15621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 17:52:00 +00:00
Yuan Tang
66aa4c0bf4
[Feature] Add middleware to log API Server responses (#15593)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-27 17:49:38 +00:00
Cyrus Leung
247181536f
[Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs (#15620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 17:36:32 +00:00
Cyrus Leung
07bf813fb5
[Doc] Link to onboarding tasks (#15629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 16:30:53 +00:00
Hiroaki Sugiyama
8958217ad5
[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211)
Signed-off-by: h-sugi <h.sugi@ieee.org>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-27 22:29:29 +08:00
Cyrus Leung
ac5bc615b0
[Model] MiniCPM-V/O supports V1 (#15487)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 06:07:29 -07:00
Reid
8063dfc61a
[Doc] update --system for transformers installation in docker doc (#15616)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-27 20:38:46 +08:00
Richard Zou
6278bc829e
Fix incorrect filenames in vllm_compile_cache.py (#15494)
Signed-off-by: <zou3519@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-27 18:33:41 +08:00
wang.yuqi
3f532cb6a6
[Misc] Use model_redirect to redirect the model name to a local folder. (#14116) 2025-03-27 02:21:23 -07:00
Cyrus Leung
e6c9053f9e
[Misc] Clean up scatter_patch_features (#15559)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 07:45:00 +00:00
Robert Shaw
43ed4143c4
[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
2025-03-27 06:47:25 +00:00
Bella kira
f4c98b4d4c
[Misc] Consolidate LRUCache implementations (#15481)
Signed-off-by: Bella kira <2374035698@qq.com>
2025-03-27 06:43:43 +00:00
Robert Shaw
e1e0fd7543
[TPU] Avoid Triton Import (#15589)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-27 06:43:02 +00:00
Rui Qiao
df8d3d1287
[Misc] Restrict ray version dependency and update PP feature warning in V1 (#15556) 2025-03-27 06:21:07 +00:00
Chengji Yao
619d3de8bd
[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS (#15583)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-26 22:46:26 -07:00
Gregory Shtrasberg
ecff8309a3
[ROCm] Env variable to trigger custom PA (#15557)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-03-26 22:46:12 -07:00