yewentao256
|
9e16220e4e
|
fix ubatch datatype issue
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-13 10:52:56 -07:00 |
|
yewentao256
|
5215c80a49
|
Merge commit '6e8d8c4afbddf725b34ef938616701869f5b3462' into sage/dbo-full-cudagraphsh
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-13 10:15:08 -07:00 |
|
yewentao256
|
dd2a94fd9d
|
fix assert error num_tokens_across_dp is None or num_tokens_across_dp[dp_rank] == batchsize
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-11 13:31:27 -07:00 |
|
Sage Moore
|
e526b1c091
|
fix num_tokens_across_dp sizing issue
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-08-11 15:27:12 +00:00 |
|
yewentao256
|
44ead56ad5
|
fix set forward context error
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-11 14:07:29 +00:00 |
|
yewentao256
|
28e7c30b01
|
Fix pre-commit error
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-11 14:06:25 +00:00 |
|
Sage Moore
|
2cf200c5b8
|
remove debug logging
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-08-08 19:07:49 +00:00 |
|
Sage Moore
|
5bbfd95bdb
|
add support for multiple builders in the model runner
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-08-08 19:01:20 +00:00 |
|
Sage Moore
|
6b0c303ab4
|
misc fixes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-08-05 19:23:23 +00:00 |
|
Sage Moore
|
4819bb8715
|
fix eager mode
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-08-05 18:01:25 +00:00 |
|
Sage Moore
|
0edaf752d7
|
[Attention][DBO] Add support for "splitting" the CommonAttentionMetadata (#21153)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-08-01 19:47:53 -07:00 |
|
Wentao Ye
|
6e8d8c4afb
|
[Test] Add Unit Test for Batched DeepGEMM (#21559)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-02 10:45:46 +08:00 |
|
Nick Hill
|
8d524ce79f
|
[BugFix] Improve internal DP load balancing (#21617)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-01 19:45:27 -07:00 |
|
Dipika Sikka
|
9f9c38c392
|
[Speculators][Speculative Decoding] Add Qwen Eagle3 Support (#21835)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
|
2025-08-01 19:43:37 -07:00 |
|
Varun Sundar Rabindranath
|
a65f46be5e
|
[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path (#21955)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-08-01 19:42:03 -07:00 |
|
Nicolò Lucchesi
|
57393715e8
|
[Misc] VLLM_TARGET_DEVICE.lower() (#22101)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-01 19:41:40 -07:00 |
|
vllmellm
|
ee2eb6ecd8
|
[Model] Qwen2.5 VL SiLU-and-Mul (#22066)
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
|
2025-08-01 19:34:37 -07:00 |
|
fhl2000
|
23322431c8
|
[V1][CUDA] Full cudagraph support for FlashInfer (#21367)
|
2025-08-01 21:49:34 -04:00 |
|
JartX
|
3654847db5
|
feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733)
|
2025-08-01 21:12:19 -04:00 |
|
Wentao Ye
|
eefbf4a68b
|
[Perf] Optimize reshape_and_cache_flash CUDA Kernel (#22036)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-01 19:18:51 -04:00 |
|
Michael Goin
|
88faa466d7
|
[CI] Initial tests for SM100 Blackwell runner (#21877)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-01 16:18:38 -07:00 |
|
Nick Hill
|
881e1af43a
|
[BugFix] Harden distributed DP startup (#21538)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-01 21:40:45 +00:00 |
|
XiongfeiWei
|
d84b97a3e3
|
Add lora test for tp>1 case for TPU. (#21970)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-08-01 18:56:08 +00:00 |
|
Rui Qiao
|
d331759488
|
Introduce RayPPCommunicator for ray-based PP (#21660)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-08-01 11:50:58 -07:00 |
|
Animesh Jain
|
9659bc7f27
|
[compile][startup] Disable C++ compilation of symbolic shapes (#20836)
Signed-off-by: Animesh Jain <anijain@umich.edu>
|
2025-08-01 10:38:52 -07:00 |
|
Michael Goin
|
3277e8f9e1
|
Fix pre-commit failure for SECURTIY.md (#22102)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-01 10:36:07 -07:00 |
|
Jee Jee Li
|
8d705996df
|
[Misc] Minor enhancement of benchmark_moe (#22068)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-02 01:35:30 +08:00 |
|
Harry Mellor
|
38c8bce8b6
|
Enable headless models for pooling in the Transformers backend (#21767)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 10:31:29 -07:00 |
|
Varun Sundar Rabindranath
|
ac45c44d98
|
[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch (#21837)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-08-01 10:14:38 -07:00 |
|
Huzaifa Sidhpurwala
|
d6664664b4
|
security policy: take 1 (#21119)
Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-01 10:09:49 -07:00 |
|
rongfu.leng
|
b879ecd6e2
|
[Bugfix] fix when skip tokenizer init (#21922)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-01 10:09:36 -07:00 |
|
Isotr0py
|
3f8e952179
|
[Bugfix] Fix glm4.1v video inference issue (#22067)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-01 09:33:30 -07:00 |
|
Harry Mellor
|
326a1b001d
|
Improve documentation of ModelConfig.try_get_generation_config to prevent future confusion (#21526)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 09:32:27 -07:00 |
|
Harry Mellor
|
2d7b09b998
|
Deprecate --disable-log-requests and replace with --enable-log-requests (#21739)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 17:16:37 +01:00 |
|
David Xia
|
97608dc276
|
[Docs] use uv in CPU installation docs (#22089)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-08-01 07:55:55 -07:00 |
|
Nick Hill
|
3146519add
|
[BugFix] Don't change title of top-level process (#22032)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-01 07:37:55 -07:00 |
|
Richard Zou
|
8026a335a1
|
[BugFix] Update AttnFusionPass cache key (#21947)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-08-01 07:11:29 -07:00 |
|
Wentao Ye
|
a59cd9d9f7
|
[Refactor] Fix Compile Warning #1444-D (#21462)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-01 06:10:30 -07:00 |
|
Abirdcfly
|
5c54d9759d
|
[Bugfix][PD] set max_completion_tokens=1 if req has this value (#21841)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
|
2025-08-01 06:08:45 -07:00 |
|
Gamhang
|
0a6d305e0f
|
feat(multimodal): Add customizable background color for RGBA to RGB conversion (#22052)
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
Co-authored-by: Jinheng Li <ahengljh@gmail.com>
|
2025-08-01 06:07:33 -07:00 |
|
Michael Goin
|
f81c1bb055
|
[Bugfix] Check NVIDIA artifactory is accessible before using flashinfer cubin kernels (#21893)
|
2025-08-01 08:28:45 -04:00 |
|
Harry Mellor
|
fb0e0d46fc
|
Fix get_kwargs for case where type hint is list[Union[str, type]] (#22016)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 05:26:42 -07:00 |
|
TJian
|
26b5f7bd2a
|
[BUG] [ROCm] Fix import bug on ROCm (#22083)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-01 05:25:20 -07:00 |
|
Dipika Sikka
|
dfbc1f8880
|
[Speculative Decoding] Add speculators config support (#21345)
|
2025-08-01 08:25:18 -04:00 |
|
Harry Mellor
|
87c94bc879
|
Revert "Update sampling_metadata.py (#21937)" (#22088)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 05:24:46 -07:00 |
|
Jee Jee Li
|
28b18cc741
|
[Quantization] Enable BNB support for InternS1 (#21953)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-01 11:09:54 +00:00 |
|
WeiQing Chen
|
4931486988
|
[Doc] Added warning of speculating with draft model (#22047)
Signed-off-by: Dilute-l <dilu2333@163.com>
Co-authored-by: Dilute-l <dilu2333@163.com>
|
2025-08-01 02:11:56 -07:00 |
|
Woosuk Kwon
|
0f81b310db
|
[Misc] Remove upper bound in openai package version (#22060)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-01 02:11:40 -07:00 |
|
wuhang
|
e6680f9e25
|
[Bugfix] Add log prefix in non-dp mode engine core (#21889)
Signed-off-by: wuhang <wuhang6@huawei.com>
|
2025-08-01 09:04:16 +00:00 |
|
Roger Wang
|
27a145e893
|
[Doc] Add example for Step3-VL (#22061)
Signed-off-by: Roger Wang <hey@rogerw.me>
|
2025-08-01 08:35:49 +00:00 |
|