Harry Mellor
|
45c3936e94
|
[Docs] Hide the navigation and toc sidebars on home page (#22749)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-12 17:12:26 -07:00 |
|
Frank Wang
|
ba81acbdc1
|
[Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues (#22606)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
|
2025-08-12 15:43:06 -07:00 |
|
RUTHLESS-BOT
|
53c730286c
|
[Misc] parametrize 'dtype' in test_flash_mla (#22641)
Signed-off-by: RUTHLESS-BOT <wujiafeng@cmbchina.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-12 16:31:48 -04:00 |
|
zifeitong
|
6534d2fc97
|
Fix torch version check for SM100 mxfp4 (#22535)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-08-12 12:54:42 -07:00 |
|
Nicolò Lucchesi
|
422f22e012
|
[CI][Nixl] Check kv cache layout during handshake (#22745)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-12 12:53:52 -07:00 |
|
Xiaozhu Meng
|
6bd8ebf026
|
[Kernel][AMD] Avoid D2H copy and cumsum kernel (#22683)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-12 12:53:36 -07:00 |
|
Wentao Ye
|
dab4f9f764
|
[Chore] Update CODEOWNERS to include @yewentao256 for CUDA kernels, attention backends, quantization, and related tests (#22741)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-13 00:50:31 +08:00 |
|
TeeKen Lau
|
c42fe0b63a
|
Add more test scenario for tensor schema (#22733)
Signed-off-by: teekenl <teekenlau@gmail.com>
|
2025-08-12 16:34:41 +00:00 |
|
Rahul Tuli
|
5a4b4b3729
|
Add: SupportsEagle3 interface for explicit EAGLE3 support (#22642)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
|
2025-08-12 09:24:52 -07:00 |
|
Daniel Serebrenik
|
e5d3d63c42
|
[Benchmark] Fix terminal colors in benchmark_serving_multi_turn (python 3.12) (#22730)
Signed-off-by: daniels <daniels@pliops.com>
|
2025-08-12 14:41:37 +00:00 |
|
Nicolò Lucchesi
|
3d9d40efde
|
[Bugfix][CI] Fix test_remote_decode_lifecycle.py::test_short_prompt_lifecycle (#22727)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-12 07:30:17 -07:00 |
|
Po-Han Huang (NVIDIA)
|
67c153b88a
|
Fix Llama4 FlashInfer FP4 MoE issues (#22511)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-08-12 05:50:59 -07:00 |
|
wang.yuqi
|
f7ad6a1eb3
|
[CI Failure] fix tests/entrypoints/openai/test_skip_tokenizer.py (#22708)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-12 05:42:58 -07:00 |
|
Harry Mellor
|
80bb1e8afe
|
Officially support SmolLM3 using the Transformers backend (#22665)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-12 05:38:48 -07:00 |
|
Nicolò Lucchesi
|
d030b01548
|
[BugFix][Nixl][PD] Fix heterogenous TP (#22663)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-08-12 05:37:30 -07:00 |
|
Harry Mellor
|
767e63b860
|
[Docs] Improve docs navigation (#22720)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-12 04:25:55 -07:00 |
|
Yongye Zhu
|
007dd90859
|
[gpt-oss] Enable gpt-oss on ampere (#22714)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-12 03:21:44 -07:00 |
|
Jee Jee Li
|
b8a9d0e429
|
[Misc] remove GH discussions link (#22722)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-12 03:15:33 -07:00 |
|
zejunchen-zejun
|
50f2aae1b4
|
[LMCache][Example] Align the PYTHONHASHSEED for prefillers and decoders for KV chunks hashing (#21161)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
|
2025-08-12 02:05:14 -07:00 |
|
RishiAstra
|
46ae7f6666
|
[Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 (#21783)
Signed-off-by: Rishi Astra <40644327+RishiAstra@users.noreply.github.com>
|
2025-08-12 02:04:37 -07:00 |
|
Jun-Howie
|
1ece7f30ba
|
Fix: AWQ Marlin get_quant_method does not recognize "modules_to_not_convert" (#21888)
Signed-off-by: JunHowie <JunHowie@aliyun.com>
Co-authored-by: JunHowie <JunHowie@aliyun.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-12 02:03:53 -07:00 |
|
phantomlei
|
bc8372efc3
|
[Bugfix] Fix erroneous randomly generated cases in bad word testing (#22170)
Signed-off-by: phantomlei <phantomlei3@gmail.com>
|
2025-08-12 02:03:22 -07:00 |
|
Sugar-zsg
|
8d17fa633e
|
[V0] Correct CUDA Graph capture for encoder-decoder models (#22630)
|
2025-08-12 02:01:08 -07:00 |
|
dongluw
|
9f909b8996
|
[New Model] Support Command-A-Vision (#22660)
Signed-off-by: donglu <donglu@cohere.com>
|
2025-08-12 01:39:54 -07:00 |
|
Chendi.Xue
|
59f3b93636
|
[DOC] update v1_guide with INTEL HW (#22679)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
|
2025-08-12 01:22:49 -07:00 |
|
Harry Mellor
|
78077d5417
|
Move SchedulerConfig from config/__init__.py to config/scheduler.py (#22626)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-12 00:23:49 -07:00 |
|
wang.yuqi
|
6d729c43fb
|
[Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. (#22637)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-08-12 00:23:17 -07:00 |
|
Sooraj S
|
2f4657952b
|
[doc] Update x86 CPU-inference installation doc to reflect optionality of AVX512f (#22707)
Signed-off-by: Sooraj S <94284954+sooraj-satheesh@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
|
2025-08-12 00:21:08 -07:00 |
|
Hongsheng Liu
|
3a7e3bbdd2
|
[Doc] Added unmentioned required option "method" in the usage of EAGLE-3 based models (#21737)
Signed-off-by: Dilute-l <dilu2333@163.com>
Co-authored-by: Dilute-l <dilu2333@163.com>
|
2025-08-12 00:14:51 -07:00 |
|
Harry Mellor
|
4fbd8bb597
|
Fix passing SpeculativeConfig from the CLI (#22652)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-11 22:13:32 -07:00 |
|
Chen Zhang
|
ad344ef552
|
[gpt-oss] Small bug fixes for frontend (#22512)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-11 22:04:38 -07:00 |
|
Chen Zhang
|
bbaf9e9cb1
|
[gpt-oss] Fix mxfp4 support (#22700)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-11 21:22:26 -07:00 |
|
Benji Beck
|
4678503476
|
Migrate MiniCPMVImageInputs to TensorSchema (#21939)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-11 20:43:37 -07:00 |
|
Michael Goin
|
93d0652433
|
[CI] Increase timeout for test_completion_with_image_embeds (#22670)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-11 20:31:36 -07:00 |
|
Michael Goin
|
ea1292ad3e
|
[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py (#22686)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-11 20:20:42 -07:00 |
|
Po-Han Huang (NVIDIA)
|
dc5e4a653c
|
Upgrade FlashInfer to v0.2.11 (#22613)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-08-11 19:58:41 -07:00 |
|
Harry Mellor
|
839ab00349
|
Re-enable Xet on TPU tests now that hf_xet has been updated (#22666)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-11 19:54:40 -07:00 |
|
Andy Chen
|
9b94d6ec8f
|
Enable 4bit bnb prequant MOE (#21548)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-11 19:02:14 -07:00 |
|
Chen Zhang
|
1891a265d3
|
[gpt-oss] Add test for response API + harmony (but skipped) (#22554)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-11 17:47:24 -07:00 |
|
Chen Zhang
|
95a935fc48
|
[gpt-oss] Support streaming in response API (#22431)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-11 17:46:59 -07:00 |
|
Harry Mellor
|
458e74eb90
|
Support more parallel styles in Transformers backend TP (#22651)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-11 10:42:48 -07:00 |
|
TJian
|
65abe111a3
|
[CI] Skip Tree Attn Test in test_max_len.py to unblock CI (#22664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-11 10:36:05 -07:00 |
|
22quinn
|
807d21b80d
|
[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-11 10:31:36 -07:00 |
|
Isotr0py
|
c90fb03df5
|
[CI/Build] Skip Mllama HF runner tests with Transformers v4.55.0 (#22659)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-11 10:00:58 -07:00 |
|
wang.yuqi
|
84cf78acee
|
[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-11 09:41:37 -07:00 |
|
GuanLuo
|
16fb668b61
|
fix: NIXL connector transfers partial block to pass full multi-modal context (#21074)
Signed-off-by: GuanLuo <gluo@nvidia.com>
|
2025-08-11 09:40:55 -07:00 |
|
Wentao Ye
|
f7dcce7a4a
|
[Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale (#21968)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-11 09:39:08 -07:00 |
|
Isotr0py
|
8e13d9fe6d
|
[Misc] Further clean up some redundant config definitions (#22649)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-11 09:22:25 -07:00 |
|
Eric Curtin
|
3fa5b25845
|
Document aarch64 CPU support works (#22646)
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
|
2025-08-11 07:22:45 -07:00 |
|
danielafrimi
|
14a5d903ab
|
[Model] NemotronH Support (#22349)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-08-11 04:09:24 -07:00 |
|