Ilya Markov
|
0313cf854d
|
[PERF] PyTorch Symmetric Memory All-Reduce (#20759)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-22 15:39:08 -06:00 |
|
Zhewen Li
|
0483fabc74
|
[CI/Build] add EP dependencies to docker (#21976)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-22 13:34:40 -07:00 |
|
Shiyan Deng
|
da65bec309
|
add an env var for path to pre-downloaded flashinfer cubin files (#22675)
|
2025-08-22 19:25:45 +00:00 |
|
Isotr0py
|
4645024d3a
|
[Quantization] Allow GGUF quantization to skip unquantized layer (#23188)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-22 13:04:22 -06:00 |
|
Isotr0py
|
cd7a3df26f
|
[Bugfix] Fix broken Florence-2 model (#23426)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-08-22 17:50:52 +00:00 |
|
Isotr0py
|
32d2b4064f
|
[Model] Add Ovis2.5 PP support (#23405)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-22 17:46:34 +00:00 |
|
Didier Durand
|
22cf679aad
|
[Doc]: fix various typos in multiple files (#23179)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-22 10:38:46 -07:00 |
|
Yong Hoon Shin
|
b6d7d34fc6
|
Add unit tests for batched guided and non-guided requests (#23389)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-22 10:31:24 -07:00 |
|
Aziz
|
341923b982
|
fix(tests): Ensure reliable CUDA cache clearing in MoE test (#23416)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-22 17:20:59 +00:00 |
|
bppps
|
424fb7a5d2
|
[BugFix] Fix the issue where image embeddings were incorrectly split.… (#23366)
Signed-off-by: bppps <bpppsaka@gmail.com>
Co-authored-by: zouyu.zzx <zouyu.zzx@alibaba-inc.com>
Co-authored-by: bppps <bpppsaka@gmail.com>
|
2025-08-22 16:56:46 +00:00 |
|
PapaGoose
|
88491c1b6b
|
[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support (#23337)
|
2025-08-22 16:39:19 +00:00 |
|
Martin Hickey
|
613a23b57f
|
[Bugfix]: Installing dev environment due to pydantic incompatible version (#23353)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2025-08-22 16:22:29 +00:00 |
|
Burkhard Ringlein
|
51a215300b
|
[Fix] Bump triton version in rocm-build requirements (#21630)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
|
2025-08-22 15:13:39 +00:00 |
|
Naman Lalit
|
ebe14621e3
|
[Bug fix] Dynamically setting the backend variable for genai_perf_tests in the run-nightly-benchmark script (#23375)
Signed-off-by: Naman Lalit <nl2688@nyu.edu>
|
2025-08-22 15:12:28 +00:00 |
|
Ning Xie
|
325aa3dee9
|
[Misc] local import code clean (#23420)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-22 14:01:35 +00:00 |
|
Chen Zhang
|
a073be6d87
|
[Doc] Update the doc for log probs + prefix caching (#23399)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-22 13:20:39 +00:00 |
|
杨朱 · Kiki
|
695e7adcd2
|
[misc] Remove outdated comment about runai_model_streamer (#23421)
Signed-off-by: carlory <baofa.fan@daocloud.io>
|
2025-08-22 13:08:53 +00:00 |
|
Russell Bryant
|
281710ef9a
|
[Attention] Allow V1 flash_attn to support cross-attention (#23297)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-22 12:10:16 +00:00 |
|
Woosuk Kwon
|
808d2e9aa0
|
[Misc] Move M-RoPE init logic to _init_mrope_positions (#23422)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-22 03:07:22 -07:00 |
|
Jee Jee Li
|
285178b3b8
|
[V0 Deprecation] Remove V0 LoRA test (#23418)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-22 09:56:51 +00:00 |
|
Li, Jiang
|
88016c372a
|
[Bugfix] Fix pooling models on CPU backend (#23392)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-22 09:47:17 +00:00 |
|
Benji Beck
|
998720859c
|
Migrate MiniCPMOAudioInputs to TensorSchema (#21847)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-22 16:43:29 +08:00 |
|
Guillaume Calmettes
|
0ba1b54ac6
|
[gpt-oss] add input/output usage in responses api when harmony context is leveraged (#22667)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-08-22 08:32:24 +00:00 |
|
Flora Feng
|
53415653ff
|
[P/D][Nixl] Make kv cache register compatible with hybrid memory allocator (#23079)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-08-21 22:30:48 -07:00 |
|
Chen Zhang
|
17373dcd93
|
[Attention] Refactor AttentionMetadata Preparation for Encoder-only Models (#23154)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-22 05:05:59 +00:00 |
|
Bin Jia
|
5964069367
|
[New Model] Add Seed-Oss model (#23241)
Signed-off-by: jiabin.00 <jiabin.00@bytedance.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-22 04:58:10 +00:00 |
|
Philip Chung
|
de9c085e17
|
[Misc] Add gemma3 chat template with pythonic-style function calling (#17149)
Signed-off-by: Philip Chung <philip.f.chung@gmail.com>
|
2025-08-21 21:06:50 -07:00 |
|
Arjun Reddy
|
111692bb8c
|
[CI] Add end-to-end V1 min_tokens test coverage (#22495)
Signed-off-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com>
Co-authored-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com>
|
2025-08-21 22:04:07 -06:00 |
|
Wentao Ye
|
394591e343
|
[Feature] Enable DeepGEMM Linear on B200; 1.5% E2E throughput improvement (#23351)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-21 21:01:08 -07:00 |
|
Isotr0py
|
3ac849665d
|
[CI/Build] Skip Idefics3 and SmolVLM generation test again (#23356)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-22 03:39:46 +00:00 |
|
Benji Beck
|
0b9cc56fac
|
Migrate MllamaImagePixelInputs to TensorSchema (#22020)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-22 11:28:49 +08:00 |
|
Cyrus Leung
|
8896eb72eb
|
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-22 10:56:57 +08:00 |
|
Matthew Bonanni
|
19fe1a0510
|
[Kernel] Add FP8 support with FlashMLA backend (#22668)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-08-22 02:26:32 +00:00 |
|
22quinn
|
480bdf5a7b
|
[Core] Support custom executor qualname (#23314)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-22 09:40:54 +08:00 |
|
Kebe
|
5368f76855
|
[Feature][Responses API] Support logprobs(non-stream) (#23319)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-08-21 23:09:16 +00:00 |
|
tvalentyn
|
8ef6b8a38c
|
Always use cache mounts when installing vllm to avoid populating pip cache in the image. Also remove apt cache. (#23270)
Signed-off-by: Valentyn Tymofieiev <valentyn@google.com>
|
2025-08-21 18:01:03 -04:00 |
|
Michael Goin
|
3bbe11cc13
|
[Perf] Small optimizations for silu_mul_fp8_quant_deep_gemm (#23265)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-21 17:56:15 -04:00 |
|
Simon Mo
|
c5041f899f
|
[CI] improve pr comments bot (#23380)
|
2025-08-21 14:49:03 -07:00 |
|
Simon Mo
|
8b5fe6eb51
|
[CI] Clean up actions: remove helm, publish workflows and improve pr … (#23377)
|
2025-08-21 14:29:04 -07:00 |
|
Woosuk Kwon
|
800349c2a5
|
[Structured Outputs] Refactor bitmask construction into get_grammar_bitmask (#23361)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-21 20:53:33 +00:00 |
|
Elvir Crnčević
|
044931f97b
|
Make sure that vectorize_with_alignment produces vectorized global loads (#23182)
|
2025-08-21 20:06:54 +00:00 |
|
Pavani Majety
|
1d353b6352
|
[Core] Always use tensor cores for Flashinfer Decode Wrapper (#23214)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-08-21 16:02:11 -04:00 |
|
Ning Xie
|
3496274663
|
[Misc] Convert VLLM_TORCH_PROFILER_DIR path to absolute (#23191)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-21 15:49:09 -04:00 |
|
Chen Zhang
|
8a19303173
|
[BugFix][gpt-oss] Fix Chat Completion with Multiple Output Message (#23318)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-21 10:31:11 -07:00 |
|
Nick Hill
|
603fbbbce0
|
[Misc] Misc code cleanup/simplification (#23304)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-21 17:22:55 +00:00 |
|
Ming Yang
|
10f535c086
|
[Bugfix] Fix port conflict by obtaining a list of open ports upfront (#21894)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-08-21 10:22:18 -07:00 |
|
Wentao Ye
|
48bfb0c9b7
|
[Bug] Fix R1 Accuracy 0 Bug (#23294)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-21 13:11:28 -04:00 |
|
Lain
|
f8ce022948
|
add tg-mxfp4-moe-test (#22540)
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-21 17:05:47 +00:00 |
|
Yi Liu
|
0278f1ac3a
|
Fix nvfp4 swizzling (#23140)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-08-21 16:54:50 +00:00 |
|
Benji Beck
|
a482e4e769
|
Migrate MolmoImageInputs to TensorSchema (#22022)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-21 16:54:08 +00:00 |
|