Li, Jiang
|
280d074103
|
[CPU][CI] Improve CPU Dockerfile (#15690)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-03-28 01:36:31 -07:00 |
|
Ce Gao
|
32b14baf8a
|
[Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-28 00:23:30 -07:00 |
|
Robert Shaw
|
2d9045fce8
|
[TPU][CI] Fix TPUModelRunner Test (#15667)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-03-28 00:01:26 -07:00 |
|
Cyrus Leung
|
355f66348c
|
[V1] Remove legacy input registry (#15673)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 23:34:34 -07:00 |
|
Cyrus Leung
|
8693e47e6a
|
[Bugfix] Fix mm_hashes forgetting to be passed (#15668)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-28 05:51:05 +00:00 |
|
Jason (Siyu) Zhu
|
cec8c7d7f8
|
Refactor error handling for multiple exceptions in preprocessing (#15650)
Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com>
|
2025-03-28 03:27:20 +00:00 |
|
Gregory Shtrasberg
|
4d0ec37267
|
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-03-28 02:58:16 +00:00 |
|
Chen Xia
|
e7f720ea56
|
[Misc]add coding benchmark for speculative decoding (#15303)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
|
2025-03-28 10:47:05 +08:00 |
|
Wes
|
4ae17bf1e2
|
Revert "Use Cache Hinting for fused_moe kernel (#15511)" (#15645)
Signed-off-by: Wes Medford <wryanmedford@gmail.com>
|
2025-03-27 19:45:55 -07:00 |
|
Robert Shaw
|
8a49eea74b
|
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-27 19:45:05 -07:00 |
|
wwl2755
|
b4245a48df
|
[Doc] Fix dead links in Job Board (#15637)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-28 02:43:40 +00:00 |
|
Kebe
|
4e0f6076be
|
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-28 10:13:41 +08:00 |
|
Jee Jee Li
|
726efc6a32
|
[Quantization][V1] BitsAndBytes support V1 (#15611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-28 10:12:47 +08:00 |
|
Robert Shaw
|
bd45912b99
|
[TPU] Lazy Import (#15656)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-28 09:57:01 +08:00 |
|
Nick Hill
|
15dac210f0
|
[V1] AsyncLLM data parallel (#13923)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-27 16:14:41 -07:00 |
|
Russell Bryant
|
112b3e5b3b
|
[CI] Update rules for applying tpu label. (#15634)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-27 22:15:26 +00:00 |
|
cnorman
|
32d669275b
|
Correct PowerPC to modern IBM Power (#15635)
Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com>
|
2025-03-27 15:04:32 -07:00 |
|
Nicolò Lucchesi
|
4098b72210
|
[Bugfix][TPU][V1] Fix recompilation (#15553)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-27 19:15:06 +00:00 |
|
Harry Mellor
|
46450b8d33
|
Use absolute placement for Ask AI button (#15628)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-27 18:52:18 +00:00 |
|
Cyrus Leung
|
13ac9cab21
|
[Misc] Avoid direct access of global mm_registry in compute_encoder_budget (#15621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 17:52:00 +00:00 |
|
Yuan Tang
|
66aa4c0bf4
|
[Feature] Add middleware to log API Server responses (#15593)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-03-27 17:49:38 +00:00 |
|
Cyrus Leung
|
247181536f
|
[Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs (#15620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 17:36:32 +00:00 |
|
Cyrus Leung
|
07bf813fb5
|
[Doc] Link to onboarding tasks (#15629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 16:30:53 +00:00 |
|
Hiroaki Sugiyama
|
8958217ad5
|
[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211)
Signed-off-by: h-sugi <h.sugi@ieee.org>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-27 22:29:29 +08:00 |
|
Cyrus Leung
|
ac5bc615b0
|
[Model] MiniCPM-V/O supports V1 (#15487)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 06:07:29 -07:00 |
|
Reid
|
8063dfc61a
|
[Doc] update --system for transformers installation in docker doc (#15616)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-03-27 20:38:46 +08:00 |
|
Richard Zou
|
6278bc829e
|
Fix incorrect filenames in vllm_compile_cache.py (#15494)
Signed-off-by: <zou3519@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-27 18:33:41 +08:00 |
|
wang.yuqi
|
3f532cb6a6
|
[Misc] Use model_redirect to redirect the model name to a local folder. (#14116)
|
2025-03-27 02:21:23 -07:00 |
|
Cyrus Leung
|
e6c9053f9e
|
[Misc] Clean up scatter_patch_features (#15559)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 07:45:00 +00:00 |
|
Robert Shaw
|
43ed4143c4
|
[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
|
2025-03-27 06:47:25 +00:00 |
|
Bella kira
|
f4c98b4d4c
|
[Misc] Consolidate LRUCache implementations (#15481)
Signed-off-by: Bella kira <2374035698@qq.com>
|
2025-03-27 06:43:43 +00:00 |
|
Robert Shaw
|
e1e0fd7543
|
[TPU] Avoid Triton Import (#15589)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-27 06:43:02 +00:00 |
|
Rui Qiao
|
df8d3d1287
|
[Misc] Restrict ray version dependency and update PP feature warning in V1 (#15556)
|
2025-03-27 06:21:07 +00:00 |
|
Chengji Yao
|
619d3de8bd
|
[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS (#15583)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-03-26 22:46:26 -07:00 |
|
Gregory Shtrasberg
|
ecff8309a3
|
[ROCm] Env variable to trigger custom PA (#15557)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-03-26 22:46:12 -07:00 |
|
Jerry Zhang
|
dcf2a590f5
|
Allow torchao quantization in SiglipMLP (#15575)
|
2025-03-26 22:45:51 -07:00 |
|
Cody Yu
|
54aa619459
|
[V1] Refactor num_computed_tokens logic (#15307)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-27 04:54:36 +00:00 |
|
Mengqing Cao
|
fb22be5817
|
[moe][quant] add weight name case for offset (#15515)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-03-27 04:50:29 +00:00 |
|
Wei Zeng
|
7f301dd8ef
|
[Doc] Update V1 user guide for fp8 kv cache support (#15585)
Signed-off-by: weizeng <weizeng@roblox.com>
|
2025-03-26 19:39:03 -07:00 |
|
Varun Sundar Rabindranath
|
8095341a01
|
[misc] LoRA: Remove unused long context test data (#15558)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-27 10:04:51 +08:00 |
|
Chenyaaang
|
69db16a46a
|
add platform check back (#15578)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
|
2025-03-27 01:50:27 +00:00 |
|
Michael Goin
|
ce78f9af4e
|
Add automatic tpu label to mergify.yml (#15560)
|
2025-03-26 21:39:58 -04:00 |
|
ElizaWszola
|
9239bf718e
|
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
|
2025-03-27 00:54:44 +00:00 |
|
Matthew Vine
|
7a6d45bc8a
|
Support FIPS enabled machines with MD5 hashing (#15299)
Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com>
|
2025-03-26 20:19:46 -04:00 |
|
Chengji Yao
|
e74ff409e0
|
[TPU] support disabling xla compilation cache (#15567)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-03-27 00:09:28 +00:00 |
|
Wes
|
7a888271f5
|
Use Cache Hinting for fused_moe kernel (#15511)
|
2025-03-26 23:21:34 +00:00 |
|
Alexander Matveev
|
9d119a86ae
|
[V1] TPU CI - Fix test_compilation.py (#15570)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-26 21:51:54 +00:00 |
|
Alexander Matveev
|
b2e85e26f4
|
[V1] TPU - Revert to exponential padding by default (#15565)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-26 21:35:05 +00:00 |
|
Alexei-V-Ivanov-AMD
|
dd8a29da99
|
Applying some fixes for K8s agents in CI (#15493)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-03-26 20:35:11 +00:00 |
|
marko
|
27df5199d9
|
Support SHA256 as hash function in prefix caching (#15297)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-03-26 11:11:28 -07:00 |
|