Michael Goin
|
6a51530437
|
[Bugfix] Fix 3D input passed into cutlass_scaled_mm (#22278)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-06 10:35:20 +08:00 |
|
Michael Goin
|
35509fc5be
|
[Bugfix] Remove faulty test for oot attention backend (#22286)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-06 00:05:40 +00:00 |
|
Siyuan Liu
|
4b29d2784b
|
[CI][TPU] Fix docker clean up (#22271)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-08-05 23:54:56 +00:00 |
|
youkaichao
|
59a0b8554b
|
[bugfix] fix blackwell deepep installation (#22255)
|
2025-08-06 01:26:09 +08:00 |
|
Giancarlo Delfin
|
469b3ffaaa
|
[V1] port xformers backend to v1 (#21342)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
|
2025-08-05 10:04:46 -07:00 |
|
Wentao Ye
|
ae87ddd040
|
[Refactor] Remove Unused Environment Variable VLLM_NO_DEPRECATION_WARNING (#22199)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-05 09:40:23 -07:00 |
|
Michael Goin
|
a7cb6101ca
|
[CI/Build] Update flashinfer to 0.2.9 (#22233)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-05 09:39:38 -07:00 |
|
Michael Goin
|
c494f96fbc
|
Use UV_LINK_MODE=copy in Dockerfile to avoid hardlink fail (#22128)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-05 06:57:10 -07:00 |
|
Nicolò Lucchesi
|
0c275ad5ad
|
[V0 Deprecation][TPU] Remove V1 flag check from tests (#22248)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-05 06:53:23 -07:00 |
|
Ning Xie
|
74333ae2f6
|
[Misc] correct static type check for GroupCoordinator (#21946)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-05 03:17:46 -07:00 |
|
elvischenv
|
83156c7b89
|
[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-08-05 02:45:34 -07:00 |
|
Wentao Ye
|
4771df7b2b
|
[Feature] Non-contiguous Support for FP8 Quantization (#21961)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-08-05 02:36:43 -07:00 |
|
Benji Beck
|
05fae02175
|
Migrate KimiVLImagePixelInputs to TensorSchema (#21769)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-08-05 02:36:18 -07:00 |
|
Nicolò Lucchesi
|
d1bf1b9711
|
[Docs][TPU] Highlight TPU Software version selection (#22242)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-05 02:33:46 -07:00 |
|
wang.yuqi
|
586f286789
|
[Model] Pooling model activation supports per request control by PoolingParams (#20538)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-05 00:37:00 -07:00 |
|
Cyrus Leung
|
811ac13d03
|
[Core] Factor out common logic for MM budget calculation (#22228)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-04 23:54:55 -07:00 |
|
Michael Goin
|
e79a12fc3a
|
[UX] Fail if an invalid attention backend is specified (#22217)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2025-08-04 23:54:52 -07:00 |
|
Cyrus Leung
|
cdfd6871a5
|
[Bugfix] Misaligned params in TreeAttentionImpl (#22226)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-04 22:40:09 -07:00 |
|
ZiTian.Zhao
|
4b3e4474d7
|
Optimize configuration access with LRU cache in custom ops (#22204)
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
|
2025-08-04 21:43:24 -07:00 |
|
Ning Xie
|
bd3db7f469
|
[Misc] log more detailed message for ensure_model_parallel_initialized (#22144)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-04 19:36:55 -07:00 |
|
Ning Xie
|
29b97c0995
|
[Doc] add backend to doc string of initialize_model_parallel (#22142)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-04 19:36:20 -07:00 |
|
elvischenv
|
7b455cf1c0
|
[Misc] Remove pass_config from CompilationConfig dump_json excluded (#21911)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-08-04 19:17:18 -07:00 |
|
tlipoca9
|
8a6e108e76
|
fix: kimi_k2 return empty tool call list (#22149)
Signed-off-by: tlipoca9 <tlipoca9@gmail.com>
|
2025-08-04 19:15:31 -07:00 |
|
Wentao Ye
|
d7b28f3415
|
[Log] DeepGEMM Update Log for Unaligned Problem Size (#22208)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-04 19:13:19 -07:00 |
|
Yuxuan Zhang
|
6fa41e0c32
|
self.gate dtype update for GLM-4.5 (#22203)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-08-04 19:12:38 -07:00 |
|
Gregory Shtrasberg
|
031ca762d7
|
[ROCm][Bugfix] Compilation passes fix (#22202)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-08-04 19:12:28 -07:00 |
|
TJian
|
6ad6b8e115
|
[FEAT] Refactor ROPE into module (#22192)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-04 19:12:16 -07:00 |
|
lkchen
|
f4f4e7ef27
|
[V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) (#21785)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-08-04 19:11:33 -07:00 |
|
Giancarlo Delfin
|
5ea71ff46f
|
[V1] reduce block size for tree attention correctness test to fix 'ou… (#22207)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
|
2025-08-04 19:11:06 -07:00 |
|
Woosuk Kwon
|
7175817637
|
Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223)
|
2025-08-04 18:37:06 -07:00 |
|
PiteXChen
|
2dffac464c
|
[Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173)
Signed-off-by: CLFutureX <775523362@qq.com>
|
2025-08-04 18:34:10 -07:00 |
|
Po-Han Huang (NVIDIA)
|
bdcb42e45d
|
[NVIDIA] Auto detect modelopt quant and fix DSR1-FP4 weight loading (#22073)
|
2025-08-04 21:02:55 -04:00 |
|
Zhonghua Deng
|
c09efff976
|
[Bugfix][V1][P/D]Fix the uneven polling issue in the toy proxy for P2pNcclConnector (#21819)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-08-04 20:17:05 +00:00 |
|
ericehanley
|
309c1bb822
|
[Bug] Update auto_tune.sh to separate benchmarking and profiling. (#21629)
Signed-off-by: Eric Hanley <ericehanley@google.com>
|
2025-08-04 15:12:06 +00:00 |
|
Woosuk Kwon
|
9af654cc38
|
[Responses API] Ignore store=True and process the request by default (#22185)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-04 05:12:48 -07:00 |
|
Raghav Ravishankar
|
a5fff3bd49
|
Fix Arcee model weight loading: Add custom load_weights (#21725)
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
|
2025-08-04 04:09:56 -07:00 |
|
Cyrus Leung
|
1539ced93a
|
[Doc] Update pooling model docs (#22186)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-04 03:37:06 -07:00 |
|
22quinn
|
54de71d0df
|
[Sampler] Support returning all logprobs or logits (#21792)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-04 03:04:12 -07:00 |
|
Isotr0py
|
fed5849d3f
|
[Bugfix] Fix failing GGUF models test (#22174)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-04 01:27:02 -07:00 |
|
Weixiao Huang
|
c1b4eb048a
|
[feat] move WEIGHT_SCALE_SUPPORTED into raise block to accelerate RLHF weight loading (#21164)
Signed-off-by: huangweixiao <huangweixiao@msh.team>
|
2025-08-04 15:43:06 +08:00 |
|
Jee Jee Li
|
a7b8788d2c
|
[Misc] Modify the organization of GLM series (#22171)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-03 23:51:20 -07:00 |
|
Tyler Michael Smith
|
8ecb3e9e93
|
[CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes (#22163)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-08-03 22:19:04 -07:00 |
|
Chenxi Yang
|
e5949e5ae0
|
Remove index_put from MM embeddings merging (#22105)
Co-authored-by: Chenxi Yang <cxyang@meta.com>
|
2025-08-03 22:15:14 -07:00 |
|
ZiTian.Zhao
|
49bcd893e7
|
[refactor] improve ConstantList exception specificity (#22156)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-03 22:14:49 -07:00 |
|
Giancarlo Delfin
|
aa7012eb6d
|
Add tree attention backend for v1 (part 1) (#20401)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
|
2025-08-03 22:13:26 -07:00 |
|
Ning Xie
|
c2e75b3c11
|
remove duplicate code within cleanup_dist_env_and_memory (#22147)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-03 20:03:58 -07:00 |
|
Abirdcfly
|
0d7db16a92
|
[PD] add test for chat completions endpoint (#21925)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
|
2025-08-03 19:57:03 -07:00 |
|
22quinn
|
845420ac2c
|
[RLHF] Fix torch.dtype not serializable in example (#22158)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-04 02:43:33 +00:00 |
|
ZiTian.Zhao
|
e27d25a0dc
|
[fix] fix correct assertion syntax error in attention utils. (#22154)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-03 19:24:02 -07:00 |
|
Seiji Eicher
|
6f5478298d
|
Use aiohttp connection pool for benchmarking (#21981)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-08-03 19:23:32 -07:00 |
|