Nick Hill
|
5ea5c514da
|
[BugFix] Increase timeout for startup failure test (#17642)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-05-05 20:53:19 +00:00 |
|
Russell Bryant
|
d3efde8176
|
[Benchmarks] Remove invalid option under V1 engine (#17651)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-05 16:30:22 -04:00 |
|
Thomas J. Fan
|
aea302be6c
|
Use git-path commit in hook (#17616)
Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>
|
2025-05-05 17:55:32 +00:00 |
|
Isotr0py
|
cc05b90d86
|
[Doc] Fix broken cuda installation doc rendering (#17654)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-05 17:52:40 +00:00 |
|
Jinzhen Lin
|
1d0c9d6b2d
|
[Kernel] some optimizations for dense marlin and moe marlin (#16850)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-05-05 09:39:30 -07:00 |
|
Tyler Michael Smith
|
f62cad6431
|
[Build/CI] Upgrade CUTLASS to 3.9.2 (#17641)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-05-04 19:23:17 -07:00 |
|
Chauncey
|
5394ad7387
|
[Bugfix] fix KeyError on top logprobs are special tokens (#17637)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-04 19:22:35 -07:00 |
|
Tyler Michael Smith
|
68e1ee0072
|
[Bugfix][Easy] Fix whitespace in shm_broadcast.py logging (#17635)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-05-04 19:20:19 -07:00 |
|
Cyrus Leung
|
2858830c39
|
[Bugfix] Prioritize dtype in root config before checking text config (#17629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-04 12:43:05 +00:00 |
|
Harry Mellor
|
d6484ef3c3
|
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-03 19:42:43 -07:00 |
|
Cyrus Leung
|
46fae69cf0
|
[Misc] V0 fallback for --enable-prompt-embeds (#17615)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-03 22:59:24 +00:00 |
|
Isotr0py
|
f66f1e0fa3
|
[Bugfix] Fix broken Qwen2.5-omni tests (#17613)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-03 17:08:14 +00:00 |
|
Cyrus Leung
|
887d7af882
|
[Core] Gate prompt_embeds behind a feature flag (#17607)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-04 00:19:20 +08:00 |
|
Gregory Shtrasberg
|
a92842454c
|
[Bugfix][ROCm] Using device_type because on ROCm the API is still torch.cuda (#17601)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-05-02 22:25:47 -07:00 |
|
Tyler Michael Smith
|
c8386fa61d
|
[Build/CI] Upgrade CUTLASS to 3.9.1 (#17602)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-05-02 22:25:14 -07:00 |
|
Chenyaaang
|
87baebebd8
|
[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-05-02 21:42:44 -07:00 |
|
rasmith
|
e3d0a1d190
|
[Quantization] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm (#17558)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-05-02 21:41:10 -07:00 |
|
22quinn
|
d47b605eca
|
Update test requirements to CUDA 12.8 (#17576)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-05-02 21:40:15 -07:00 |
|
Liangfu Chen
|
22c6f6397f
|
[Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 (#17603)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-05-03 02:41:59 +00:00 |
|
Kevin H. Luu
|
3ec97e2cc5
|
[release] Add command to clean up Docker containers/images in TPU release machine (#17606)
|
2025-05-02 18:54:34 -07:00 |
|
Eric Hartford
|
9b103a1d76
|
fix typo in logging (#17605)
|
2025-05-02 18:04:40 -07:00 |
|
Richard Zou
|
b90b0852e9
|
[easy] Print number of needed GPUs in skip message (#17594)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-02 15:27:43 -07:00 |
|
Xiaodong Wang
|
9352cdb56d
|
[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Lu Fang <lufang@fb.com>
|
2025-05-02 19:44:19 +00:00 |
|
Zhiyu
|
182f40ea8b
|
Add NVIDIA TensorRT Model Optimizer in vLLM documentation (#17561)
|
2025-05-02 11:36:46 -07:00 |
|
Caleb_Du
|
3e887d2e0c
|
permute/unpermute kernel for moe optimization (#14568)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-05-02 11:31:55 -07:00 |
|
Lucas Wilkinson
|
0f87d8f7b2
|
[BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (#17574)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-02 11:01:38 -07:00 |
|
Hui Liu
|
4c33d67321
|
[Bugfix] fix tmp_out and exp_sums dimensions (#17438)
Signed-off-by: Hui Liu <96135754+hliuca@users.noreply.github.com>
|
2025-05-02 16:44:07 +00:00 |
|
Cyrus Leung
|
cb234955df
|
[Misc] Clean up input processing (#17582)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 08:11:53 -07:00 |
|
Reid
|
3a500cd0b6
|
[doc] miss result (#17589)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-02 07:04:49 -07:00 |
|
Michael Goin
|
868c546da4
|
Support W8A8 INT8 MoE for compressed-tensors (#16745)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-02 10:03:32 -04:00 |
|
Cyrus Leung
|
99404f53c7
|
[Security] Fix image hash collision (#17378)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 08:36:39 -04:00 |
|
Harry Mellor
|
785d75a03b
|
Automatically tell users that dict args must be valid JSON in CLI (#17577)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-02 05:24:55 -07:00 |
|
Reid
|
6d1479ca4b
|
[doc] add the print result (#17584)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-02 05:24:45 -07:00 |
|
Yang Wang
|
b8b0859b5c
|
add more pytorch related tests for torch nightly (#17422)
Signed-off-by: Yang Wang <elainewy@meta.com>
|
2025-05-02 03:29:59 -07:00 |
|
Cyrus Leung
|
d7543862bd
|
[Misc] Rename assets for testing (#17575)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 03:29:25 -07:00 |
|
Robert Shaw
|
c777df79f7
|
[BugFix] Fix Memory Leak (#17567)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-05-02 01:07:03 -07:00 |
|
Andrew Sansom
|
cc2a77d7f1
|
[Core] [Bugfix] Add Input Embeddings (#15428)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 01:06:39 -07:00 |
|
Isotr0py
|
9e2de9b9e9
|
[Bugfix] Remove TritonPlaceholder from sys.modules (#17317)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-02 00:45:01 -07:00 |
|
Jerry Zhang
|
109e15a335
|
Add pt_load_map_location to allow loading to cuda (#16869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-05-01 23:23:42 -07:00 |
|
Michael Goin
|
f192ca90e6
|
Fix PixtralHF missing spatial_merge_size (#17571)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-01 22:14:09 -07:00 |
|
Cyrus Leung
|
f89d0e11bf
|
[Misc] Continue refactoring model tests (#17573)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-01 22:06:08 -07:00 |
|
Michael Goin
|
b4003d11fc
|
Check if bitblas is installed during support check (#17572)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-02 04:32:54 +00:00 |
|
Michael Goin
|
292fc59d61
|
[CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-02 04:05:04 +00:00 |
|
Lucas Wilkinson
|
afcb3f8863
|
[Attention] MLA move o_proj q_proj into cuda-graph region (#17484)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-02 03:16:26 +00:00 |
|
David Xia
|
afb12e4294
|
[Doc] note that not all unit tests pass on CPU platforms (#17554)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-05-02 02:57:21 +00:00 |
|
Michael Goin
|
24aebae177
|
[Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 (#17541)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-01 17:59:35 -07:00 |
|
qizixi
|
39c0813a7f
|
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 (#17504)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-05-01 16:19:30 -07:00 |
|
Chenyaaang
|
9b70e2b4c1
|
[Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-05-01 19:53:03 +00:00 |
|
Chen Xia
|
173daac19d
|
[Bug]change the position of cuda_graph_sizes in dataclasses (#17548)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
|
2025-05-01 11:52:37 -07:00 |
|
sstamenk
|
04f2cfc894
|
Remove duplicate code from dbrx.py (#17550)
|
2025-05-01 11:51:58 -07:00 |
|