Luka Govedič
|
3597b06a4f
|
[CUDA] Enable full cudagraph for FlashMLA (#18581)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-13 18:12:26 +00:00 |
|
qscqesze
|
a24cb91600
|
[Model] Fix minimax model cache & lm_head precision (#19592)
Signed-off-by: qingjun <qingjun@minimaxi.com>
|
2025-06-13 12:08:20 +00:00 |
|
Nick Hill
|
7e8d97dd3f
|
[BugFix] Honor enable_caching in connector-delayed kvcache load case (#19435)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-13 09:46:32 +00:00 |
|
youkaichao
|
d70bc7c029
|
[torch.compile] reorganize the cache directory to support compiling multiple models (#19064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-06-13 15:23:25 +08:00 |
|
Boyuan Feng
|
ce688ad46e
|
use base version for version comparison (#19587)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-06-13 15:09:34 +08:00 |
|
汪志鹏
|
cefdb9962d
|
[Fix] The zip function in Python 3.9 does not have the strict argument (#19549)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-13 14:57:48 +08:00 |
|
Li, Jiang
|
6458721108
|
[CPU] Refine default config for the CPU backend (#19539)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-13 13:27:39 +08:00 |
|
Hyogeun Oh (오효근)
|
bb4a0decef
|
[Misc] Correct broken docs link (#19553)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-06-12 22:27:13 -07:00 |
|
qizixi
|
c68698b326
|
[Bugfix] Fix EAGLE vocab embedding for multimodal target model (#19570)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-06-12 23:09:19 -04:00 |
|
Varun Sundar Rabindranath
|
e3b12667d4
|
[BugFix] : Fix Batched DeepGemm Experts (#19515)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-12 20:43:02 -06:00 |
|
Russell Bryant
|
c57bb199b3
|
[V1] Resolve failed concurrent structured output requests (#19565)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-12 23:30:09 +00:00 |
|
Michael Goin
|
a3319f4f04
|
[Bugfix] Enforce contiguous input for dynamic_per_token FP8/INT8 quant (#19452)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-12 15:39:15 -04:00 |
|
Varun Sundar Rabindranath
|
9d880f594d
|
[Misc] Turn MOE_DP_CHUNK_SIZE into an env var (#19506)
|
2025-06-12 18:01:16 +00:00 |
|
Ekagra Ranjan
|
017ef648e9
|
[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847)
|
2025-06-12 10:30:56 -07:00 |
|
Luka Govedič
|
f98548b9da
|
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-12 08:31:04 -07:00 |
|
mobicham
|
96846bb360
|
Fix TorchAOConfig skip layers (#19265)
Signed-off-by: mobicham <hicham@mobiuslabs.com>
|
2025-06-12 22:22:53 +08:00 |
|
Nicolò Lucchesi
|
1129e2b1ab
|
[V1][NixlConnector] Drop num_blocks check (#19532)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-06-12 12:36:14 +00:00 |
|
Jee Jee Li
|
73e2e0118f
|
[Quantization] Improve AWQ logic (#19431)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-12 11:02:11 +00:00 |
|
jmswen
|
c9280e6346
|
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-12 11:00:23 +00:00 |
|
Michael Goin
|
af09b3f0a0
|
[Bugfix][V1] Allow manual FlashAttention for Blackwell (#19492)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-12 10:40:24 +00:00 |
|
rasmith
|
2e090bd5df
|
[AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm (#19509)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-06-12 07:14:24 +00:00 |
|
wonjun Jang
|
1b0b065eb5
|
[BugFix] Handle missing sep_token for Qwen3-Reranker in Score API (#19522)
Signed-off-by: strutive07 <strutive07@gmail.com>
|
2025-06-12 07:00:47 +00:00 |
|
Nick Hill
|
d5bdf899e4
|
[BugFix] Work-around incremental detokenization edge case error (#19449)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-12 06:43:20 +00:00 |
|
22quinn
|
7e3e74c97c
|
[Frontend] Improve error message in tool_choice validation (#19239)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-12 01:13:00 -04:00 |
|
Brayden Zhong
|
3f6341bf7f
|
Add Triton Fused MoE kernel config for E=16 on B200 (#19518)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-06-12 04:31:51 +00:00 |
|
Varun Sundar Rabindranath
|
e5d35d62f5
|
[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (#19514)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-12 04:28:12 +00:00 |
|
Ning Xie
|
2f1c19b245
|
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-11 19:57:10 -07:00 |
|
Robert Shaw
|
97a9465bbc
|
[UX] Add Feedback During CUDAGraph Capture (#19501)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-11 21:09:05 +00:00 |
|
rasmith
|
c7ea0b56cd
|
[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-06-11 15:53:28 -04:00 |
|
bnellnm
|
29fa5cac1c
|
[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-06-11 12:53:10 -04:00 |
|
Jee Jee Li
|
04a55612dd
|
[Misc] Fix misleading ROCm warning (#19486)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-12 00:12:10 +08:00 |
|
Ximingwang-09
|
3c8694eabe
|
Fix some typo (#19475)
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
|
2025-06-11 10:36:04 +00:00 |
|
Michael Goin
|
7484e1fce2
|
Add cache to cuda get_device_capability (#19436)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-11 17:37:05 +08:00 |
|
Cyrus Leung
|
a2142f0196
|
Support non-string values in JSON keys from CLI (#19471)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-11 09:34:04 +00:00 |
|
Lu Fang
|
871d6b7c74
|
[Misc] Reduce warning message introduced in env_override (#19476)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-11 17:29:54 +08:00 |
|
Cyrus Leung
|
68b4a26149
|
[Doc] Update V1 User Guide for Hardware and Models (#19474)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-11 00:49:06 -07:00 |
|
artetaout
|
b8e809a057
|
[Kernel] Support deep_gemm for linear methods (#19085)
Signed-off-by: artetaout <lulala341@gmail.com>
|
2025-06-11 15:14:45 +08:00 |
|
Junhao Li
|
2d40665fe8
|
Add fused MOE config for Qwen3 30B A3B on B200 (#19455)
Signed-off-by: Junhao Li <junhao@ubicloud.com>
|
2025-06-11 13:43:46 +08:00 |
|
Lukas Geiger
|
96ada386b7
|
[Misc] Remove unused MultiModalHasher.hash_prompt_mm_data (#19422)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-11 05:18:57 +00:00 |
|
wang.yuqi
|
3952731e8f
|
[New Model]: Support Qwen3 Embedding & Reranker (#19260)
|
2025-06-10 20:07:30 -07:00 |
|
Richard Zou
|
77f0d465d0
|
[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-11 07:54:41 +08:00 |
|
Xu Wenqing
|
22c3c0aa4a
|
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
|
2025-06-11 07:23:57 +08:00 |
|
py-andy-c
|
33f8dba7c6
|
[Model] use AutoWeightsLoader for commandr (#19399)
Signed-off-by: py-andy-c <pychen1017@gmail.com>
|
2025-06-10 22:42:21 +00:00 |
|
Gregory Shtrasberg
|
5241ca50d6
|
[ROCm][V1] Adding ROCm to the list of plaforms using V1 by default (#19440)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-06-10 22:06:15 +00:00 |
|
Jee Jee Li
|
b6553be1bc
|
[Misc] Slight improvement of the BNB (#19418)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-10 13:51:49 +00:00 |
|
Rachel Guo
|
467bef18a3
|
[BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword use_irope (#19134)
Signed-off-by: Yunqiu Guo <guorachel@meta.com>
|
2025-06-10 16:48:51 +08:00 |
|
Isotr0py
|
5f1ac1e1d1
|
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
|
2025-06-10 01:30:20 -07:00 |
|
Louie Tsai
|
9368cc90b2
|
Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
|
2025-06-10 06:22:05 +00:00 |
|
Lukas Geiger
|
319cb1e351
|
[Core] Batch multi modal input using pinned memory (#19169)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-10 13:44:59 +08:00 |
|
Li Wang
|
1efef71645
|
[Bugfix] Fix modelscope token passed in (#19389)
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-10 13:39:37 +08:00 |
|