weichen
0431508388
Use request_id as the identifier when removing a request
...
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
weichen
0000d981d2
add ut for sjf scheduler policy
...
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Harry Mellor
da9d153112
Delete now empty file
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Harry Mellor
58615e5889
docstring
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Harry Mellor
9e8d9e1231
Consolidate SJF code and remove global variable
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Harry Mellor
53d57d9dca
Remove tuple stuff
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Harry Mellor
601387735c
Fix removal from heap
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
4fe722fae5
abstracting common code to HeapBasedRequestQueue
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
cc0a8ae572
naming
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Harry Mellor
ac674f6fc7
Move docstring
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
6413793466
Update scheduler.py
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
779769ea97
Create __init__.py
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
ed2a808252
Update normalized_scorer.py
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
b04f678659
linting
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
1e8b313afb
linting
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
db3e0a576e
linting
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
dd0e1224bc
linting
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
379eabac7f
linting
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
e14d347982
use heap
...
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
Pr0Wh1teGivee
0098c3fb93
[Feat][Sched] Add SJF Scheduling Policy
...
Co-authored-by: HiC4Sh1e <chenjie137@huawei.com>
Co-authored-by: JiahongZhang-Work <iscocheung@gmail.com>
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
2025-12-24 16:30:26 +08:00
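The SJF commits above refer to a heap-backed request queue ("use heap", "HeapBasedRequestQueue", "Fix removal from heap") and to removing requests by request_id. Below is a minimal Python sketch of that idea. It is not the code from these commits. The `Request` class, its `num_prompt_tokens` field (standing in for the job-length estimate), and the `SJFRequestQueue` name are all hypothetical. Ties between equally short jobs fall back to arrival order.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical minimal request type; only the fields this sketch needs.
    request_id: str
    num_prompt_tokens: int  # assumed stand-in for the "job length" estimate


class SJFRequestQueue:
    """Shortest-job-first queue backed by a binary heap (illustrative sketch)."""

    def __init__(self) -> None:
        # Heap entries are (job_length, arrival_order, request_id).
        # arrival_order breaks ties first-come-first-served and keeps
        # tuple comparison from ever reaching the request_id.
        self._heap: list[tuple[int, int, str]] = []
        self._requests: dict[str, Request] = {}
        self._counter = itertools.count()

    def add(self, request: Request) -> None:
        self._requests[request.request_id] = request
        heapq.heappush(
            self._heap,
            (request.num_prompt_tokens, next(self._counter), request.request_id),
        )

    def pop(self) -> Request:
        # Returns the request with the shortest job; raises IndexError if empty.
        _, _, request_id = heapq.heappop(self._heap)
        return self._requests.pop(request_id)

    def remove(self, request_id: str) -> None:
        # Entries are identified by request_id, not by Request object identity.
        self._requests.pop(request_id)  # raises KeyError for unknown ids
        self._heap = [entry for entry in self._heap if entry[2] != request_id]
        heapq.heapify(self._heap)  # restore the heap invariant after removal

    def __len__(self) -> int:
        return len(self._heap)
```

Usage: after adding requests "a" (500 tokens), "b" (20) and "c" (20) and then calling `remove("b")`, `pop()` returns "c" before "a". Removal here rebuilds the heap in O(n). Lazy deletion, which marks entries as removed and skips them on pop, is a common alternative when removals are frequent.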
Roger Young
c02a2705f9
Update MiniMax-M2 ToolCall and add MiniMax-M2.1 in Docs ( #31083 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-12-22 05:28:40 +00:00
Kevin McKay
cf8eed7bef
[Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled ( #31109 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 21:14:58 -08:00
Kevin McKay
44ae85f725
[Misc] Fix typo: 'occured' -> 'occurred' ( #31120 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:14:27 -08:00
Kevin McKay
14c3e6ade3
[Misc] Fix spelling typos in model comments ( #31117 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:14:14 -08:00
Kevin McKay
42b42824ae
[Misc] Fix grammar errors in comments and messages ( #31115 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:14:02 -08:00
Kevin McKay
ec58c10ce1
[Misc] Fix quantization-related typos ( #31116 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:13:48 -08:00
Kevin McKay
8c084de59d
[Misc] Fix spelling typos in comments ( #31114 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:13:14 -08:00
CedricHuang
19cc9468fd
[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM ( #30957 )
2025-12-21 22:34:49 -05:00
Jee Jee Li
097978a15d
[Kernel] Enable fused_qknorm_rope_kernel supports partial rope ( #30821 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-21 18:39:22 -08:00
Lucas Wilkinson
7e065eba59
[CI] Fix "2 Node Tests (4 GPUs in total)" ( #31090 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-22 10:32:40 +08:00
Steve Westerhouse
9d701e90d8
[Doc] Clarify FP8 KV cache computation workflow ( #31071 )
...
Signed-off-by: westers <steve.westerhouse@origami-analytics.com>
2025-12-22 08:41:37 +08:00
Michael Goin
06d490282f
[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size ( #30897 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-21 09:41:57 -08:00
Robert Shaw
b471092d3a
[MoE Refactor][4/N] Marlin Fp8 Mk ( #31036 )
2025-12-21 12:37:42 -05:00
Ameen Patel
93cabc417c
ci: add nvidia-smi warmup before Prime-RL integration test ( #31093 )
...
Signed-off-by: AmeenP <ameenp360@gmail.com>
2025-12-21 15:43:01 +00:00
Chauncey
bb80f69bc9
add aarnphm and chaunceyjiang to the new tool_parser directory ( #31088 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-21 03:24:34 +00:00
汪志鹏
3e92b2b7ac
[BugFix] Fix gpt-oss v1/completions response bug ( #30608 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: bbrowning <bbrownin@redhat.com>
2025-12-21 10:39:31 +08:00
Jinzhen Lin
7c73ceb581
[Quantization] add marlin w4a8/w8a8 check ( #31061 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
2025-12-20 21:58:11 +00:00
Lucas Wilkinson
ae0770fa6b
[CI] Fix H200 Distributed test ( #31054 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-20 16:48:49 -05:00
Jinzhen Lin
ee52d9901d
[Quantization] support logical_widths for fp8 marlin ( #30962 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-20 12:02:57 -08:00
baonudesifeizhai
54c8924384
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash ( #28891 )
...
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
2025-12-20 18:22:04 +00:00
Yan Ma
560ae9638c
[XPU] enable fp8 online streaming quantization ( #30944 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-12-20 13:45:27 +00:00
Jeffrey Wang
1501a4070e
[Bugfix] Read truncate_prompt_tokens from pooling_params in AsyncLLM.encode() ( #31013 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2025-12-20 10:29:31 +00:00
Lucas Wilkinson
ff2168bca3
[CI] Fix fixture 'siglip_attention_config' not found ( #31053 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-20 03:46:15 +00:00
Gregory Shtrasberg
0be149524c
[ROCm][CI/Build] Update ROCm dockerfiles ( #30991 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-12-20 03:19:12 +00:00
zejunchen-zejun
d52c5096d7
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm ( #30869 )
...
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
2025-12-20 09:03:35 +08:00
Yuxuan Zhang
8a7a414374
GLM-4.7 Tool Parser and Doc Update ( #30876 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-12-20 00:09:58 +00:00
Robert Shaw
95befecc18
[MoE Refactor][2/N] Use Modular Kernels for Fp8 ( #30825 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-19 23:36:38 +00:00
Wentao Ye
4cf9429897
[Bug] Fix error 'Dynamo failed to run FX node with fake tensors for Deepseek V3.2 ( #31046 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-19 23:31:31 +00:00
Robert Shaw
83a317f650
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) ( #30990 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-19 13:09:54 -08:00
Lucas Wilkinson
5f6477d1d0
[BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 ( #30924 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-19 16:07:54 -05:00