Nicolò Lucchesi
|
da461f3cbf
|
[TPU][V1][Bugfix] Fix w8a8 recompilation with GSM8K (#15714)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-28 21:13:06 -07:00 |
|
Jinzhen Lin
|
5b800f0932
|
[Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoints.openai.api_server (#15700)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-28 21:12:26 -07:00 |
|
cyyever
|
8427f70493
|
Use numba 0.61 for python 3.10+ to support numpy>=2 (#15692)
Signed-off-by: cyy <cyyever@outlook.com>
|
2025-03-29 12:11:51 +08:00 |
|
Russell Bryant
|
7a7992085b
|
[CI] Speed up V1 structured output tests (#15718)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-28 21:10:45 -07:00 |
|
Varun Sundar Rabindranath
|
1286211f57
|
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-28 21:10:41 -07:00 |
|
Nick Hill
|
6d531ad7b8
|
[Misc][V1] Misc code streamlining (#15723)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-28 20:59:47 -07:00 |
|
Ce Gao
|
762b424a52
|
[Docs] Document v0 engine support in reasoning outputs (#15739)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-29 03:46:57 +00:00 |
|
pengyuange
|
de1cb38769
|
[Model] Support Skywork-R1V (#15397)
Signed-off-by: jiacai.liu <932997367@qq.com>
Co-authored-by: jiacai.liu <932997367@qq.com>
|
2025-03-28 20:39:21 -07:00 |
|
Gregory Shtrasberg
|
c802f5430d
|
[ROCm][AMD][Build] Update AMD supported arch list (#15632)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-03-28 20:39:18 -07:00 |
|
simpx
|
cff8991a50
|
[Docs][V1] Optimize diagrams in prefix caching design (#15716)
|
2025-03-29 03:33:58 +00:00 |
|
daniel-salib
|
f3f8d8fff4
|
implement prometheus fast-api-instrumentor for http service metrics (#15657)
|
2025-03-29 00:12:02 +00:00 |
|
Reid
|
26df46ee59
|
[Misc] cli auto show default value (#15582)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-03-28 22:23:00 +00:00 |
|
Alexander Matveev
|
c3f687ac22
|
[V1] TPU - Fix the chunked prompt bug (#15713)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-28 20:19:04 +00:00 |
|
Luka Govedič
|
04437e313d
|
[Bugfix] [torch.compile] Add Dynamo metrics context during compilation (#15639)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-28 14:01:09 -06:00 |
|
Robert Shaw
|
038bededba
|
[TPU] [Perf] Improve Memory Usage Estimation (#15671)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-03-28 17:37:52 +00:00 |
|
shangmingc
|
d03308be0c
|
[Misc] Remove stale func in KVTransferConfig (#14746)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-28 17:33:32 +00:00 |
|
Cyrus Leung
|
c6bc0034d0
|
[Misc] Remove unused utils and clean up imports (#15708)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-28 09:41:16 -07:00 |
|
Woosuk Kwon
|
70e132244a
|
[Minor] Remove TGI launching script (#15646)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-28 09:30:08 -07:00 |
|
Michael Goin
|
47e9038d23
|
Fix cpu offload testing for gptq/awq/ct (#15648)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-29 00:29:32 +08:00 |
|
Kebe
|
432cf22a6a
|
[Bugfix] Fix regex compile display format (#15368)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-03-28 08:58:44 -07:00 |
|
Reid
|
2914006fe0
|
[doc] add missing imports (#15699)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-03-28 15:56:48 +00:00 |
|
Russell Bryant
|
7329ff5468
|
[V1] Support disable_any_whitespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-28 23:46:45 +08:00 |
|
Cyrus Leung
|
541d1df486
|
[Bugfix] embed_is_patch for Idefics3 (#15696)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-28 08:27:52 -07:00 |
|
Chauncey
|
3b00ff9138
|
[Bugfix][v1] xgrammar structured output supports Enum. (#15594)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-28 06:14:53 -07:00 |
|
Jee Jee Li
|
91276c5721
|
[Model] Adding torch compile annotations to chatglm (#15624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-28 21:14:09 +08:00 |
|
Harry Mellor
|
0b4167526d
|
[Docs] Add "Generation quality changed" section to troubleshooting (#15701)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-28 13:03:21 +00:00 |
|
Reid
|
fd5fd26902
|
[Frontend] update priority for --api-key and VLLM_API_KEY (#15588)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-03-28 19:40:12 +08:00 |
|
Ce Gao
|
3bbaacbe15
|
[Bugfix][Frontend] Eliminate regex based check in reasoning full generator (#14821)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-28 11:20:35 +00:00 |
|
Lize Cai
|
a10314c6b3
|
[Misc] Fix test_sleep to use query parameters (#14373)
Signed-off-by: Lize Cai <lize.cai@sap.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-28 18:00:14 +08:00 |
|
Jee Jee Li
|
70f2c2a709
|
[Bugfix] Fix 'InductorAdaptor' object has no attribute 'cache_dir' (#15674)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-28 17:10:40 +08:00 |
|
Li, Jiang
|
280d074103
|
[CPU][CI] Improve CPU Dockerfile (#15690)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-03-28 01:36:31 -07:00 |
|
Ce Gao
|
32b14baf8a
|
[Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-28 00:23:30 -07:00 |
|
Robert Shaw
|
2d9045fce8
|
[TPU][CI] Fix TPUModelRunner Test (#15667)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-03-28 00:01:26 -07:00 |
|
Cyrus Leung
|
355f66348c
|
[V1] Remove legacy input registry (#15673)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 23:34:34 -07:00 |
|
Cyrus Leung
|
8693e47e6a
|
[Bugfix] Fix mm_hashes forgetting to be passed (#15668)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-28 05:51:05 +00:00 |
|
Jason (Siyu) Zhu
|
cec8c7d7f8
|
Refactor error handling for multiple exceptions in preprocessing (#15650)
Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com>
|
2025-03-28 03:27:20 +00:00 |
|
Gregory Shtrasberg
|
4d0ec37267
|
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-03-28 02:58:16 +00:00 |
|
Chen Xia
|
e7f720ea56
|
[Misc] add coding benchmark for speculative decoding (#15303)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
|
2025-03-28 10:47:05 +08:00 |
|
Wes
|
4ae17bf1e2
|
Revert "Use Cache Hinting for fused_moe kernel (#15511)" (#15645)
Signed-off-by: Wes Medford <wryanmedford@gmail.com>
|
2025-03-27 19:45:55 -07:00 |
|
Robert Shaw
|
8a49eea74b
|
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-27 19:45:05 -07:00 |
|
wwl2755
|
b4245a48df
|
[Doc] Fix dead links in Job Board (#15637)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-28 02:43:40 +00:00 |
|
Kebe
|
4e0f6076be
|
[Bugfix] Fix failure to launch in Tensor Parallel (TP) mode on macOS. (#14948)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-28 10:13:41 +08:00 |
|
Jee Jee Li
|
726efc6a32
|
[Quantization][V1] BitsAndBytes support V1 (#15611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-28 10:12:47 +08:00 |
|
Robert Shaw
|
bd45912b99
|
[TPU] Lazy Import (#15656)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-28 09:57:01 +08:00 |
|
Nick Hill
|
15dac210f0
|
[V1] AsyncLLM data parallel (#13923)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-27 16:14:41 -07:00 |
|
Russell Bryant
|
112b3e5b3b
|
[CI] Update rules for applying tpu label. (#15634)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-27 22:15:26 +00:00 |
|
cnorman
|
32d669275b
|
Correct PowerPC to modern IBM Power (#15635)
Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com>
|
2025-03-27 15:04:32 -07:00 |
|
Nicolò Lucchesi
|
4098b72210
|
[Bugfix][TPU][V1] Fix recompilation (#15553)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-27 19:15:06 +00:00 |
|
Harry Mellor
|
46450b8d33
|
Use absolute placement for Ask AI button (#15628)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-27 18:52:18 +00:00 |
|
Cyrus Leung
|
13ac9cab21
|
[Misc] Avoid direct access of global mm_registry in compute_encoder_budget (#15621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-27 17:52:00 +00:00 |
|