Michael Goin
|
c494f96fbc
|
Use UV_LINK_MODE=copy in Dockerfile to avoid hardlink fail (#22128)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-05 06:57:10 -07:00 |
|
Nicolò Lucchesi
|
0c275ad5ad
|
[V0 Deprecation][TPU] Remove V1 flag check from tests (#22248)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-05 06:53:23 -07:00 |
|
Ning Xie
|
74333ae2f6
|
[Misc] correct static type check for GroupCoordinator (#21946)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-05 03:17:46 -07:00 |
|
elvischenv
|
83156c7b89
|
[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-08-05 02:45:34 -07:00 |
|
Wentao Ye
|
4771df7b2b
|
[Feature] Non-contiguous Support for FP8 Quantization (#21961)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-08-05 02:36:43 -07:00 |
|
Benji Beck
|
05fae02175
|
Migrate KimiVLImagePixelInputs to TensorSchema (#21769)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-08-05 02:36:18 -07:00 |
|
Nicolò Lucchesi
|
d1bf1b9711
|
[Docs][TPU] Highlight TPU Software version selection (#22242)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-05 02:33:46 -07:00 |
|
wang.yuqi
|
586f286789
|
[Model] Pooling model activation supports per request control by PoolingParams (#20538)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-05 00:37:00 -07:00 |
|
Cyrus Leung
|
811ac13d03
|
[Core] Factor out common logic for MM budget calculation (#22228)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-04 23:54:55 -07:00 |
|
Michael Goin
|
e79a12fc3a
|
[UX] Fail if an invalid attention backend is specified (#22217)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2025-08-04 23:54:52 -07:00 |
|
Cyrus Leung
|
cdfd6871a5
|
[Bugfix] Misaligned params in TreeAttentionImpl (#22226)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-04 22:40:09 -07:00 |
|
ZiTian.Zhao
|
4b3e4474d7
|
Optimize configuration access with LRU cache in custom ops (#22204)
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
|
2025-08-04 21:43:24 -07:00 |
|
Ning Xie
|
bd3db7f469
|
[Misc] log more detailed message for ensure_model_parallel_initialized (#22144)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-04 19:36:55 -07:00 |
|
Ning Xie
|
29b97c0995
|
[Doc] add backend to doc string of initialize_model_parallel (#22142)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-04 19:36:20 -07:00 |
|
elvischenv
|
7b455cf1c0
|
[Misc] Remove pass_config from CompilationConfig dump_json excluded (#21911)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-08-04 19:17:18 -07:00 |
|
tlipoca9
|
8a6e108e76
|
fix: kimi_k2 return empty tool call list (#22149)
Signed-off-by: tlipoca9 <tlipoca9@gmail.com>
|
2025-08-04 19:15:31 -07:00 |
|
Wentao Ye
|
d7b28f3415
|
[Log] DeepGEMM Update Log for Unaligned Problem Size (#22208)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-04 19:13:19 -07:00 |
|
Yuxuan Zhang
|
6fa41e0c32
|
self.gate dtype update for GLM-4.5 (#22203)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-08-04 19:12:38 -07:00 |
|
Gregory Shtrasberg
|
031ca762d7
|
[ROCm][Bugfix] Compilation passes fix (#22202)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-08-04 19:12:28 -07:00 |
|
TJian
|
6ad6b8e115
|
[FEAT] Refactor ROPE into module (#22192)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-04 19:12:16 -07:00 |
|
lkchen
|
f4f4e7ef27
|
[V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) (#21785)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-08-04 19:11:33 -07:00 |
|
Giancarlo Delfin
|
5ea71ff46f
|
[V1] reduce block size for tree attention correctness test to fix 'ou… (#22207)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
|
2025-08-04 19:11:06 -07:00 |
|
Woosuk Kwon
|
7175817637
|
Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223)
|
2025-08-04 18:37:06 -07:00 |
|
PiteXChen
|
2dffac464c
|
[Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173)
Signed-off-by: CLFutureX <775523362@qq.com>
|
2025-08-04 18:34:10 -07:00 |
|
Po-Han Huang (NVIDIA)
|
bdcb42e45d
|
[NVIDIA] Auto detect modelopt quant and fix DSR1-FP4 weight loading (#22073)
|
2025-08-04 21:02:55 -04:00 |
|
Zhonghua Deng
|
c09efff976
|
[Bugfix][V1][P/D] Fix the uneven polling issue in the toy proxy for P2pNcclConnector (#21819)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-08-04 20:17:05 +00:00 |
|
ericehanley
|
309c1bb822
|
[Bug] Update auto_tune.sh to separate benchmarking and profiling. (#21629)
Signed-off-by: Eric Hanley <ericehanley@google.com>
|
2025-08-04 15:12:06 +00:00 |
|
Woosuk Kwon
|
9af654cc38
|
[Responses API] Ignore store=True and process the request by default (#22185)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-04 05:12:48 -07:00 |
|
Raghav Ravishankar
|
a5fff3bd49
|
Fix Arcee model weight loading: Add custom load_weights (#21725)
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
|
2025-08-04 04:09:56 -07:00 |
|
Cyrus Leung
|
1539ced93a
|
[Doc] Update pooling model docs (#22186)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-04 03:37:06 -07:00 |
|
22quinn
|
54de71d0df
|
[Sampler] Support returning all logprobs or logits (#21792)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-04 03:04:12 -07:00 |
|
Isotr0py
|
fed5849d3f
|
[Bugfix] Fix failing GGUF models test (#22174)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-04 01:27:02 -07:00 |
|
Weixiao Huang
|
c1b4eb048a
|
[feat] move WEIGHT_SCALE_SUPPORTED into raise block to accelerate RLHF weight loading (#21164)
Signed-off-by: huangweixiao <huangweixiao@msh.team>
|
2025-08-04 15:43:06 +08:00 |
|
Jee Jee Li
|
a7b8788d2c
|
[Misc] Modify the organization of GLM series (#22171)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-03 23:51:20 -07:00 |
|
Tyler Michael Smith
|
8ecb3e9e93
|
[CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes (#22163)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-08-03 22:19:04 -07:00 |
|
Chenxi Yang
|
e5949e5ae0
|
Remove index_put from MM embeddings merging (#22105)
Co-authored-by: Chenxi Yang <cxyang@meta.com>
|
2025-08-03 22:15:14 -07:00 |
|
ZiTian.Zhao
|
49bcd893e7
|
[refactor] improve ConstantList exception specificity (#22156)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-03 22:14:49 -07:00 |
|
Giancarlo Delfin
|
aa7012eb6d
|
Add tree attention backend for v1 (part 1) (#20401)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
|
2025-08-03 22:13:26 -07:00 |
|
Ning Xie
|
c2e75b3c11
|
remove duplicate code within cleanup_dist_env_and_memory (#22147)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-03 20:03:58 -07:00 |
|
Abirdcfly
|
0d7db16a92
|
[PD] add test for chat completions endpoint (#21925)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
|
2025-08-03 19:57:03 -07:00 |
|
22quinn
|
845420ac2c
|
[RLHF] Fix torch.dtype not serializable in example (#22158)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-04 02:43:33 +00:00 |
|
ZiTian.Zhao
|
e27d25a0dc
|
[fix] fix correct assertion syntax error in attention utils. (#22154)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-03 19:24:02 -07:00 |
|
Seiji Eicher
|
6f5478298d
|
Use aiohttp connection pool for benchmarking (#21981)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-08-03 19:23:32 -07:00 |
|
Isotr0py
|
6a39ba85fe
|
[Bugfix] Fix failing multimodal standard test (#22153)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-03 19:04:38 +00:00 |
|
Yuxuan Zhang
|
d3c18c9cb0
|
fuse fp32 for GLM-4.5 e_score_correction_bias (#22143)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-08-03 09:04:54 -07:00 |
|
TankNee
|
83f7bbb318
|
Add chat doc in quick start (#21213)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-03 07:47:55 -07:00 |
|
Li, Jiang
|
b5dfb94fa0
|
[CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation (#22145)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-03 05:34:04 -07:00 |
|
Woosuk Kwon
|
6d98843b31
|
[Responses API] Disable response store by default (#22137)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-03 04:04:21 -07:00 |
|
David Ben-David
|
aefeea0fde
|
[V1] [P/D] Refactor KV Connector Path (#21980)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
|
2025-08-03 04:03:40 -07:00 |
|
H
|
24d1dffbeb
|
[executor] feat: add supports_pp attr to executors (#21786)
Signed-off-by: Haibin Lin <haibin.lin@bytedance.com>
|
2025-08-03 18:04:45 +08:00 |
|