youkaichao · c7c9851036 · 2025-01-24 17:31:25 +08:00
[ci/build] fix wheel size check (#12396)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Roger Wang · 3c818bdb42 · 2025-01-24 00:22:04 -08:00
[Misc] Use VisionArena Dataset for VLM Benchmarking (#12389)
Signed-off-by: Roger Wang <ywang@roblox.com>

youkaichao · 6dd94dbe94 · 2025-01-24 11:34:27 +08:00
[perf] fix perf regression from #12253 (#12380)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Woosuk Kwon · 0e74d797ce · 2025-01-24 03:19:55 +00:00
[V1] Increase default batch size for H100/H200 (#12369)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Dipika Sikka · 55ef66edf4 · 2025-01-24 11:19:42 +08:00
Update compressed-tensors version (#12367)

omer-dayan · 5e5630a478 · 2025-01-24 11:06:07 +08:00
[Bugfix] Path join when building local path for S3 clone (#12353)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>

Russell Bryant · d3d6bb13fb · 2025-01-24 02:17:30 +00:00
Set weights_only=True when using torch.load() (#12366)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

Nick Hill · 24b0205f58 · 2025-01-23 17:17:41 -08:00
[V1][Frontend] Coalesce bunched RequestOutputs (#12298)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>

Russell Bryant · c5cffcd0cd · 2025-01-24 01:15:52 +00:00
[Docs] Update spec decode + structured output in compat matrix (#12373)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

Woosuk Kwon · 682b55bc07 · 2025-01-23 14:10:03 -08:00
[Docs] Add meetup slides (#12345)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Junichi Sato · 9726ad676d · 2025-01-23 17:02:13 -05:00
[Misc] Fix OpenAI API Compatibility Issues in Benchmark Script (#12357)
Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>

Dipika Sikka · eb5cb5e528 · 2025-01-23 21:40:33 +00:00
[BugFix] Fix parameter names and process_after_weight_loading for W4A16 MoE Group Act Order (#11528)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

Isotr0py · 2cbeedad09 · 2025-01-23 19:18:51 +00:00
[Docs] Document Phi-4 support (#12362)
Signed-off-by: Isotr0py <2037008807@qq.com>

Siyuan Liu · 2c85529bfc · 2025-01-23 18:50:16 +00:00
[TPU] Update TPU CI to use torchxla nightly on 20250122 (#12334)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>

Gregory Shtrasberg · e97f802b2d · 2025-01-23 18:04:03 +00:00
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>

youkaichao · 6e650f56a1 · 2025-01-24 02:01:30 +08:00
[torch.compile] decouple compile sizes and cudagraph sizes (#12243)
Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao · 3f50c148fd · 2025-01-24 02:00:50 +08:00
[core] add wake_up doc and some sanity check (#12361)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Isotr0py · 8c01b8022c · 2025-01-23 17:20:33 +00:00
[Bugfix] Fix broken internvl2 inference with v1 (#12360)
Signed-off-by: Isotr0py <2037008807@qq.com>

Roger Wang · 99d01a5e3d · 2025-01-23 23:13:23 +08:00
[V1] Simplify M-RoPE (#12352)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: imkero <kerorek@outlook.com>

Cyrus Leung · d07efb31c5 · 2025-01-23 22:46:58 +08:00
[Doc] Troubleshooting errors during model inspection (#12351)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Lucas Wilkinson · 978b45f399 · 2025-01-23 06:45:48 -08:00
[Kernel] Flash Attention 3 Support (#12093)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

Isotr0py · c5b4b11d7f · 2025-01-23 10:15:33 +00:00
[Bugfix] Fix k_proj's bias for whisper self attention (#12342)
Signed-off-by: Isotr0py <2037008807@qq.com>

liuzhenwei · 8ae5ff2009 · 2025-01-23 08:35:46 +00:00
[Hardware][Gaudi][BugFix] Fix dataclass error due to triton package update (#12338)
Signed-off-by: zhenwei <zhenweiliu@habana.ai>

youkaichao · 511627445e · 2025-01-23 14:56:02 +08:00
[doc] explain common errors around torch.compile (#12340)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Cody Yu · f0ef37233e · 2025-01-23 04:19:21 +00:00
[V1] Add uncache_blocks (#12333)

Russell Bryant · 7551a34032 · 2025-01-23 03:44:09 +00:00
[Docs] Document vulnerability disclosure process (#12326)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

Michael Goin · 01a55941f5 · 2025-01-23 11:18:09 +08:00
[Docs] Update FP8 KV Cache documentation (#12238)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Alexei-V-Ivanov-AMD · 8d7aa9de71 · 2025-01-23 10:53:02 +08:00
[Bugfix] Fixing AMD LoRA CI test. (#12329)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

rasmith · 68c4421b6d · 2025-01-23 00:10:37 +00:00
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (#12282)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Nick Hill · aea94362c9 · 2025-01-22 22:22:12 +00:00
[Frontend][V1] Online serving performance improvements (#12287)

Cody Yu · 7206ce4ce1 · 2025-01-22 18:52:27 +00:00
[Core] Support reset_prefix_cache (#12284)

Konrad Zawora · 96f6a7596f · 2025-01-23 02:07:07 +08:00
[Bugfix] Fix HPU multiprocessing executor (#12167)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>

Jee Jee Li · 84bee4bd5c · 2025-01-22 16:56:54 +00:00
[Misc] Improve the readability of BNB error messages (#12320)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Robin · fc66dee76d · 2025-01-22 16:48:41 +00:00
[Misc] Fix the error in the tip for the --lora-modules parameter (#12319)
Signed-off-by: wangerxiao <863579016@qq.com>

Cyrus Leung · 6609cdf019 · 2025-01-22 14:56:29 +00:00
[Doc] Add docs for prompt replacement (#12318)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Roger Wang · 16366ee8bb · 2025-01-22 21:06:36 +08:00
[Bugfix][VLM] Fix mixed-modality inference backward compatibility for V0 (#12313)
Signed-off-by: Roger Wang <ywang@roblox.com>

zhou fan · 528dbcac7d · 2025-01-22 11:39:19 +00:00
[Model][Bugfix]: correct Aria model output (#12309)
Signed-off-by: xffxff <1247714429@qq.com>

Cyrus Leung · cd7b6f0857 · 2025-01-22 11:08:31 +00:00
[VLM] Avoid unnecessary tokenization (#12310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

youkaichao · 68ad4e3a8d · 2025-01-22 14:39:32 +08:00
[Core] Support fully transparent sleep mode (#11743)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Mengqing Cao · 4004f144f3 · 2025-01-22 14:29:31 +08:00
[Build] update requirements of no-device (#12299)
Signed-off-by: Mengqing Cao <cmq0113@163.com>

youkaichao · 66818e5b63 · 2025-01-22 14:13:52 +08:00
[core] separate builder init and builder prepare for each batch (#12253)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Nick Hill · 222a9dc350 · 2025-01-22 13:46:14 +08:00
[Benchmark] More accurate TPOT calc in benchmark_serving.py (#12288)
Signed-off-by: Nick Hill <nhill@redhat.com>

Cyrus Leung · cbdc4ad5a5 · 2025-01-22 12:06:54 +08:00
[Ci/Build] Fix mypy errors on main (#12296)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Liangfu Chen · 016e3676e7 · 2025-01-22 10:47:49 +08:00
[CI] add docker volume prune to neuron CI (#12291)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>

Kevin H. Luu · 64ea24d0b3 · 2025-01-22 01:15:27 +00:00
[ci/lint] Add back default arg for pre-commit (#12279)
Signed-off-by: kevin <kevin@anyscale.com>

Cyrus Leung · df76e5af26 · 2025-01-21 16:48:13 -08:00
[VLM] Simplify post-processing of replacement info (#12269)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Hongxia Yang · 09ccc9c8f7 · 2025-01-22 07:49:22 +08:00
[Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (#12281)
Signed-off-by: Hongxia Yang <hongxyan@amd.com>

Aleksandr Malyshev · 69196a9bc7 · 2025-01-21 23:30:46 +00:00
[BUGFIX] When skip_tokenize_init and multistep are set, execution crashes (#12277)
Signed-off-by: maleksan85 <maleksan@amd.com>
Co-authored-by: maleksan85 <maleksan@amd.com>

Divakar Verma · 2acba47d9b · 2025-01-21 22:47:32 +00:00
[bugfix] moe tuning. rm is_navi() (#12273)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>

Jani Monoses · 9c485d9e25 · 2025-01-21 11:56:41 -08:00
[Core] Free CPU pinned memory on environment cleanup (#10477)