Robert Shaw
|
f16bf63877
|
updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
|
2025-07-07 01:13:20 +00:00 |
|
Robert Shaw
|
b835205d33
|
updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
|
2025-07-07 00:32:42 +00:00 |
|
Robert Shaw
|
c22a6cb1cc
|
cleanup
Signed-off-by: Robert Shaw <robshaw@redhat.com>
|
2025-07-07 00:30:51 +00:00 |
|
rshaw@neuralmagic.com
|
7fbcbbfc45
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-07-01 03:15:16 +00:00 |
|
rshaw@neuralmagic.com
|
ff5a0cfa6e
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-07-01 02:49:54 +00:00 |
|
rshaw@neuralmagic.com
|
56939c835d
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-07-01 01:34:46 +00:00 |
|
rshaw@neuralmagic.com
|
1172b70b79
|
updated vllm
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-07-01 00:16:07 +00:00 |
|
rshaw@neuralmagic.com
|
15bc311d28
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 20:09:12 +00:00 |
|
rshaw@neuralmagic.com
|
70b76554d1
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 20:01:56 +00:00 |
|
rshaw@neuralmagic.com
|
128eca2ce3
|
update for use batched
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 19:48:33 +00:00 |
|
rshaw@neuralmagic.com
|
6babd39366
|
print out
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 19:30:14 +00:00 |
|
rshaw@neuralmagic.com
|
491347cbc3
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 13:42:36 +00:00 |
|
rshaw@neuralmagic.com
|
569de248cb
|
cleanup
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 12:36:19 +00:00 |
|
rshaw@neuralmagic.com
|
f015919fc8
|
add comment about hack
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 12:25:48 +00:00 |
|
Pravein Govindan Kannan
|
c4b9b2e682
|
Increase chunk size to reduce no. of threads
|
2025-06-30 15:03:52 +05:30 |
|
Pravein Govindan Kannan
|
17546dc79f
|
Add threading for load-balancing to different workers
|
2025-06-30 14:40:18 +05:30 |
|
rshaw@neuralmagic.com
|
5d8b665366
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 01:59:02 +00:00 |
|
rshaw@neuralmagic.com
|
cda2f2c453
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 01:54:43 +00:00 |
|
rshaw@neuralmagic.com
|
b9be6fd35a
|
updated to make send_notif work
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 01:51:37 +00:00 |
|
rshaw@neuralmagic.com
|
8283d7b85c
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 01:45:03 +00:00 |
|
rshaw@neuralmagic.com
|
c481d30c17
|
update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 01:39:15 +00:00 |
|
rshaw@neuralmagic.com
|
dedb1a5424
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 01:30:06 +00:00 |
|
rshaw@neuralmagic.com
|
ee2a4b0889
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-30 01:11:22 +00:00 |
|
rshaw@neuralmagic.com
|
f9617c75ad
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-27 18:48:05 +00:00 |
|
rshaw@neuralmagic.com
|
5d2eac70e7
|
update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-27 15:12:03 +00:00 |
|
rshaw@neuralmagic.com
|
fea0731cf4
|
update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-27 15:11:23 +00:00 |
|
rshaw@neuralmagic.com
|
5b8c64dc77
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-19 13:12:43 +00:00 |
|
rshaw@neuralmagic.com
|
489e5ba5ce
|
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-19 13:10:52 +00:00 |
|
Charlie Fu
|
a44b1c951d
|
[Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-06-17 17:03:06 -04:00 |
|
Michael Goin
|
b447624ee3
|
[Bugfix] Fix faulty triton importing logic when using Ray for DP (#19734)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 20:59:29 +00:00 |
|
Jiayi Yao
|
cda92307c1
|
[Misc] Update lmcache connector with the latest connector apis (#19441)
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2025-06-17 19:57:54 +00:00 |
|
Wentao Ye
|
ffb2cd6b54
|
[Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 11:49:26 -07:00 |
|
Isotr0py
|
ca94d7fa00
|
[Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 (#19151)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-17 15:58:38 +00:00 |
|
CYJiang
|
5a1c2e15d8
|
[Mis] remove duplicate engine status checks (#19647)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-06-17 08:17:38 -07:00 |
|
Nicolò Lucchesi
|
4c8f64faa7
|
[V1][Kernel] Flashinfer HND KV cache layout (#19280)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-06-17 09:09:22 -04:00 |
|
jvlunteren
|
ccd7c05089
|
[Kernel] Add Split-KV Support to Unified Triton Attention Kernel (#19152)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-06-17 10:45:07 +00:00 |
|
quanliu
|
5c76b9cdaf
|
[Core] add remove_seq_from_computed_blocks_tracker to BlockSpaceManager (#19686)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
|
2025-06-17 04:40:58 +00:00 |
|
Driss Guessous
|
ddfed314f9
|
Fixes IMA for TP w/ flex-attention (#19712)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-06-17 04:01:50 +00:00 |
|
Di Liu
|
5b3ad5ecf2
|
[DOC] fix doc typos (#19600)
Signed-off-by: Di Liu <liu-di@sjtu.edu.cn>
|
2025-06-17 11:34:53 +08:00 |
|
nguyenhoangthuan99
|
ede5c4ebdf
|
[Frontend] add chunking audio for > 30s audio (#19597)
Signed-off-by: nguyenhoangthuan99 <thuanhppro12@gmail.com>
|
2025-06-17 11:34:00 +08:00 |
|
Conroy Cheers
|
0860087aff
|
[Fix] Fall back to Gloo when NCCL backend is unavailable (#19641)
Signed-off-by: conroy-cheers <conroy@corncheese.org>
|
2025-06-17 08:42:14 +08:00 |
|
Dipika Sikka
|
6bc7b57315
|
[Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 (#19563)
|
2025-06-16 17:33:51 -04:00 |
|
Russell Bryant
|
90f9c2eb5c
|
[V1] Change return type on get_multimodal_embeddings() (#19446)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-16 13:32:15 -04:00 |
|
qscqesze
|
387bdf0ab9
|
[Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (#19677)
Signed-off-by: QscQ <qscqesze@gmail.com>
|
2025-06-16 09:47:14 -07:00 |
|
bnellnm
|
5e5baa91aa
|
[Kernels] Use empty for modular MoE workspaces (#19667)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-06-16 14:58:01 +00:00 |
|
Chauncey
|
836d4ce140
|
[Bugfix] fix missing 'finish_reason': null in streaming chat (#19662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-16 14:10:39 +00:00 |
|
Isotr0py
|
1173804dca
|
[Bugfix] Fix TP inference for Flex attention backend (#19657)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-16 11:21:37 +00:00 |
|
Shawn Tan
|
4d5424029b
|
[Feature]:Allow for Granite MoE Hybrid models with _only_ shared experts. (#19652)
Signed-off-by: Shawn Tan <shawntan@ibm.com>
|
2025-06-16 11:14:18 +00:00 |
|
Nick Hill
|
ee35e96ac3
|
[BugFix] Don't catch BaseException when dumping execute_model errors (#19626)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-16 11:01:08 +00:00 |
|
Szymon Ożóg
|
dec66d253b
|
[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-06-16 17:33:26 +08:00 |
|