Cyrus Leung
|
5efd6905bc
|
[CLI][Doc] Formalize --mm-encoder-tp-mode (#23190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 23:42:28 +08:00 |
|
shixianc
|
b17109beea
|
[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)
Signed-off-by: Shixian Cui <shixian@amazon.com>
|
2025-08-20 10:35:26 -04:00 |
|
Cyrus Leung
|
4449235843
|
[Bugfix] Ensure correctness of HCXVision processing (#23254)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 14:19:30 +00:00 |
|
rongfu.leng
|
38217877aa
|
[Fix] fix offline env use local mode path (#22526)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-20 13:34:49 +00:00 |
|
Jee Jee Li
|
c6d80a7a96
|
[Model] Improve olmo and olmo2 (#23228)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-20 12:47:05 +00:00 |
|
xyxinyang
|
7cd17e22d7
|
[Model][V1] Support Ernie MTP (#22169)
Signed-off-by: zhouchong <zhouchong03@baidu.com>
Co-authored-by: zhouchong <zhouchong03@baidu.com>
|
2025-08-20 20:41:55 +08:00 |
|
Michael Goin
|
50df09fe13
|
Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-20 08:05:54 -04:00 |
|
Cyrus Leung
|
68fcd3fa73
|
[Bugfix] Ensure correctness of Cohere2Vision processing (#23245)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 11:09:18 +00:00 |
|
Xin Yang
|
83e69a09d6
|
[Model] Support deepseek with eagle (#21086)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2025-08-20 19:01:31 +08:00 |
|
Shiming Zhang
|
3aa8c10038
|
Fix missing quotes (#23242)
Signed-off-by: Shiming Zhang <wzshiming@hotmail.com>
|
2025-08-20 10:46:59 +00:00 |
|
Calvin Chen
|
103f1ec8d3
|
[Model] use autoWeightsLoader for gptoss (#22446)
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
|
2025-08-20 10:16:27 +00:00 |
|
who who who
|
d983769c41
|
fix cuda graph (#22721)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
|
2025-08-20 06:24:37 +00:00 |
|
Nick Hill
|
8fd920924c
|
[BugFix] Fix stuck stats/metrics after requests are aborted (#22995)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-20 13:50:29 +08:00 |
|
Cyrus Leung
|
de7b67a023
|
[CI/Build] Sync multimodal tests (#23181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 05:06:42 +00:00 |
|
Zhewen Li
|
f729023272
|
[CI/Build] Also check DP in benchmarks throughput script (#23038)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-20 04:09:27 +00:00 |
|
길재은
|
1a3079a15e
|
chore: support pytorch format in lora (#22790)
Signed-off-by: jaeeun.kil <rha3122@naver.com>
Signed-off-by: 길재은 <rha3122@naver.com>
|
2025-08-20 04:02:50 +00:00 |
|
Louie Tsai
|
941f56858a
|
Fix a performance comparison issue in Benchmark Suite (#23047)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
|
2025-08-20 03:14:32 +00:00 |
|
Zebing Lin
|
a634733f67
|
[Attention] Optimize make_local_attention_virtual_batches for Flash Attention (#23185)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-08-20 02:57:47 +00:00 |
|
Cyrus Leung
|
64ab3c7253
|
[Doc] Update V1 status of various pooling models (#23189)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 10:33:41 +08:00 |
|
Chenheli Hua
|
e58c5a9768
|
[Core] Add torch profiler CPU traces for AsyncLLM. (#21794)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-08-20 02:32:47 +00:00 |
|
Michael Goin
|
d46d417b58
|
[CI Perf] Only test bfloat16 for tests/compile/test_fusion_all_reduce.py (#23132)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-19 20:18:52 -06:00 |
|
633WHU
|
0167efe20d
|
[Core] Optimize scheduler request removal for single completions (#21917)
Signed-off-by: chiliu <chiliu@paypal.com>
Signed-off-by: chiliu <cliu_whu@yeah.net>
Co-authored-by: chiliu <chiliu@paypal.com>
|
2025-08-19 18:25:59 -07:00 |
|
Kyle Sayers
|
c32e6ad1f6
|
[Quantization] Bump Compressed Tensors Version (#23202)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-20 00:39:28 +00:00 |
|
Chenheli Hua
|
1630cc8d0f
|
[Benchmarks] Add video inputs to ShareGPTDataset. (#23199)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-08-19 23:42:31 +00:00 |
|
Lucas Wilkinson
|
14e2b0730b
|
[BugFix] fix CUTLASS MLA full cudagraph (#23200)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-19 22:17:08 +00:00 |
|
Michael Goin
|
0f4f0191d8
|
[CI/Build] Replace lm-eval gsm8k tests with faster implementation (#23002)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-19 15:07:30 -07:00 |
|
amirkl94
|
a38b8af4c3
|
[NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend (#22357)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2025-08-19 18:01:53 -04:00 |
|
Michael Goin
|
21dce80ea9
|
[CI/Build] Add support for Python 3.13 (#13164)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 13:49:34 -07:00 |
|
Woosuk Kwon
|
e61bac87ee
|
[Misc] Minor refactoring for FlashInfer backend (#23147)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 13:11:51 -07:00 |
|
Marko Rosenmueller
|
80141bbf2f
|
fix: use cache_salt for gpt-oss (#23186)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-08-19 18:12:25 +00:00 |
|
bnellnm
|
b94faf9d50
|
[Bugfix] Fix accuracy issue when using flashinfer cutlass moe, TP=1 and modelopt. (#23125)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-19 14:00:51 -04:00 |
|
Woosuk Kwon
|
5b5f350d67
|
[Misc] Enable yapf for FlashInfer backend (#23193)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 10:33:47 -07:00 |
|
22quinn
|
f7cf5b512e
|
[Frontend] Add /collective_rpc API endpoint (#23075)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-19 17:29:32 +00:00 |
|
Ruixiang Tan
|
03d4235fd2
|
[Misc] Fix the benchmark's README and improve the error messages for the benchmark's argument checks (#22654)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
|
2025-08-19 10:18:51 -07:00 |
|
Isotr0py
|
d6a1a20973
|
[CI/Build] Update transformers to v4.55.2 (#23093)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 10:06:17 -07:00 |
|
Benji Beck
|
a70d0bd0a3
|
Migrate LlavaOnevisionMultiInputs to TensorSchema (#21844)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-19 17:02:02 +00:00 |
|
Yuge Zhang
|
24f4d1a224
|
Add return_token_ids parameter to OpenAI API endpoints (#22587)
Signed-off-by: Yuge Zhang <scottyugochang@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-19 09:48:31 -07:00 |
|
yiz-liu
|
4f510bc2a1
|
[Model] Removes redundant all-reduce operation in Qwen3MoeSparseMoeBlock (#23169)
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-08-19 16:18:41 +00:00 |
|
TJian
|
1298c67795
|
[FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-19 15:25:57 +00:00 |
|
Jee Jee Li
|
4d9c61993a
|
[Bugfix] Fix benchmark_moe.py (#23177)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-19 13:39:40 +00:00 |
|
myselvess
|
b87cb97a53
|
[Model] support new model ovis2.5 (#23084)
Signed-off-by: myselvess <244285088@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 13:12:59 +00:00 |
|
wang.yuqi
|
f856c33ce9
|
[Model] Add multi_label_classification support (#23173)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-19 12:54:30 +00:00 |
|
elvischenv
|
03752dba8f
|
[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-19 08:22:15 -04:00 |
|
Woosuk Kwon
|
40f26734b9
|
[Misc] Fix seq_lens for graph capture (#23175)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 03:58:16 -07:00 |
|
Tialo
|
2c3f557f08
|
[Doc] use power of 2 (#23172)
|
2025-08-19 03:16:23 -07:00 |
|
Woosuk Kwon
|
21bcc8263f
|
[Misc] Avoid accessing req_ids inside a loop (#23159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 09:39:38 +00:00 |
|
qizixi
|
5bfe0dea7a
|
[bug fix] Fix llama4 spec decoding (#22691)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-08-19 08:53:24 +00:00 |
|
Isotr0py
|
31fd3265c8
|
[Bugfix] Fix broken Minimax-01-VL model (#22116)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 08:49:29 +00:00 |
|
hustxiayang
|
31436e8b4f
|
[Misc] Add request_id into benchmark_serve.py (#23065)
Signed-off-by: yangxia <yangxiast@gmail.com>
|
2025-08-19 08:32:18 +00:00 |
|
qizixi
|
4efd43e9b4
|
Fix GLM-4.5V-FP8 numerical issue (#22949)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 07:56:31 +00:00 |
|