7203 Commits

Author SHA1 Message Date
rshaw@neuralmagic.com
15bc311d28 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 20:09:12 +00:00
rshaw@neuralmagic.com
70b76554d1 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 20:01:56 +00:00
rshaw@neuralmagic.com
128eca2ce3 update for use batched
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 19:48:33 +00:00
rshaw@neuralmagic.com
6babd39366 print out
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 19:30:14 +00:00
rshaw@neuralmagic.com
491347cbc3 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 13:42:36 +00:00
rshaw@neuralmagic.com
569de248cb cleanup
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 12:36:19 +00:00
rshaw@neuralmagic.com
f015919fc8 add comment about hack
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 12:25:48 +00:00
Robert Shaw
39e6bd19fd Merge pull request #17 from praveingk/batching
Load balance across multiple workers
2025-06-30 08:21:03 -04:00
Pravein Govindan Kannan
c4b9b2e682 Increase chunk size to reduce no. of threads
2025-06-30 15:03:52 +05:30
Pravein Govindan Kannan
17546dc79f Add threading for load-balancing to different workers
2025-06-30 14:40:18 +05:30
rshaw@neuralmagic.com
5d8b665366 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 01:59:02 +00:00
rshaw@neuralmagic.com
cda2f2c453 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 01:54:43 +00:00
rshaw@neuralmagic.com
b9be6fd35a updated to make send_notif work
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 01:51:37 +00:00
rshaw@neuralmagic.com
8283d7b85c updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 01:45:03 +00:00
rshaw@neuralmagic.com
c481d30c17 update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 01:39:15 +00:00
rshaw@neuralmagic.com
dedb1a5424 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 01:30:06 +00:00
rshaw@neuralmagic.com
ee2a4b0889 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-30 01:11:22 +00:00
rshaw@neuralmagic.com
f9617c75ad updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-27 18:48:05 +00:00
rshaw@neuralmagic.com
5d2eac70e7 update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-27 15:12:03 +00:00
rshaw@neuralmagic.com
fea0731cf4 update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-27 15:11:23 +00:00
rshaw@neuralmagic.com
9eaa81b9c9 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-19 13:18:39 +00:00
rshaw@neuralmagic.com
852ee4b132 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-19 13:16:50 +00:00
rshaw@neuralmagic.com
87bf6812b2 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-19 13:15:50 +00:00
rshaw@neuralmagic.com
5b8c64dc77 updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-19 13:12:43 +00:00
rshaw@neuralmagic.com
489e5ba5ce updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-19 13:10:52 +00:00
Chenyaaang
dac8cc49f4 [TPU] Update torch version to include paged attention kernel change (#19706)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-17 22:24:49 +00:00
Charlie Fu
a44b1c951d [Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)
Signed-off-by: charlifu <charlifu@amd.com>
2025-06-17 17:03:06 -04:00
Michael Goin
b447624ee3 [Bugfix] Fix faulty triton importing logic when using Ray for DP (#19734)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 20:59:29 +00:00
Jiayi Yao
cda92307c1 [Misc] Update lmcache connector with the latest connector apis (#19441)
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2025-06-17 19:57:54 +00:00
Michael Goin
bf57ccc5c2 Remove sm120 arch from sm100 cutlass kernel arch list (#19716)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:39 -07:00
Wentao Ye
ffb2cd6b54 [Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:26 -07:00
Isotr0py
ca94d7fa00 [Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 (#19151)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 15:58:38 +00:00
CYJiang
5a1c2e15d8 [Mis] remove duplicate engine status checks (#19647)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-17 08:17:38 -07:00
Nicolò Lucchesi
4c8f64faa7 [V1][Kernel] Flashinfer HND KV cache layout (#19280)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-17 09:09:22 -04:00
David Xia
93aee29fdb [doc] split "Other AI Accelerators" tabs (#19708)
2025-06-17 22:05:29 +09:00
Reid
154d063b9f [doc][mkdocs] Add edit button to documentation (#19637)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-17 11:10:31 +00:00
jvlunteren
ccd7c05089 [Kernel] Add Split-KV Support to Unified Triton Attention Kernel (#19152)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-06-17 10:45:07 +00:00
Huy Do
c48c6c4008 Add a doc on how to update PyTorch version (#19705)
2025-06-17 18:10:37 +08:00
Isotr0py
aed8468642 [Doc] Add missing llava family multi-image examples (#19698)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 07:05:21 +00:00
quanliu
5c76b9cdaf [Core] add remove_seq_from_computed_blocks_tracker to BlockSpaceManager (#19686)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
2025-06-17 04:40:58 +00:00
Driss Guessous
ddfed314f9 Fixes IMA for TP w/ flex-attention (#19712)
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-06-17 04:01:50 +00:00
Di Liu
5b3ad5ecf2 [DOC] fix doc typos (#19600)
Signed-off-by: Di Liu <liu-di@sjtu.edu.cn>
2025-06-17 11:34:53 +08:00
nguyenhoangthuan99
ede5c4ebdf [Frontend] add chunking audio for > 30s audio (#19597)
Signed-off-by: nguyenhoangthuan99 <thuanhppro12@gmail.com>
2025-06-17 11:34:00 +08:00
Lucas Wilkinson
07334959d8 [Wheel Size] Only build FA2 8.0+PTX (#19336)
2025-06-17 12:32:49 +09:00
David Xia
119f683949 [doc] add project flag to gcloud TPU command (#19664)
Signed-off-by: David Xia <david@davidxia.com>
2025-06-17 01:00:09 +00:00
Conroy Cheers
0860087aff [Fix] Fall back to Gloo when NCCL backend is unavailable (#19641)
Signed-off-by: conroy-cheers <conroy@corncheese.org>
2025-06-17 08:42:14 +08:00
Dipika Sikka
6bc7b57315 [Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 (#19563)
2025-06-16 17:33:51 -04:00
Russell Bryant
90f9c2eb5c [V1] Change return type on get_multimodal_embeddings() (#19446)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-16 13:32:15 -04:00
qscqesze
387bdf0ab9 [Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (#19677)
Signed-off-by: QscQ <qscqesze@gmail.com>
2025-06-16 09:47:14 -07:00
bnellnm
5e5baa91aa [Kernels] Use empty for modular MoE workspaces (#19667)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-06-16 14:58:01 +00:00