rongfu.leng
|
aa54a7bf7b
|
[BugFix] fix data parallel construct ipv6 url addres (#18991)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-06-01 11:42:10 +08:00 |
|
Michael Goin
|
2ad6194a02
|
Let max_num_batched_tokens use human_readable_int for large numbers (#18968)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-01 11:41:29 +08:00 |
|
Reid
|
c594cbf565
|
[doc] small fix - mkdocs (#18996)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-31 20:23:43 -07:00 |
|
Isotr0py
|
a35ca765a5
|
[LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-01 11:06:57 +08:00 |
|
Cyrus Leung
|
6aa8f9a4e7
|
[Core] Rework dtype resolution (#18751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-01 11:04:23 +08:00 |
|
Benjamin Chislett
|
1bc86a3da1
|
[Bugfix] Fix EAGLE3 broken logits (#18909)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-05-31 19:58:07 -07:00 |
|
Ekagra Ranjan
|
bbfa0c61d1
|
[Misc][Benchmark] Add support for CustomDataset (#18511)
|
2025-05-31 19:07:38 +00:00 |
|
Reid
|
20079c6e36
|
[Misc] add return token strs for tokenize (#18941)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-31 18:00:11 +00:00 |
|
Nick Hill
|
9a1b9b99d7
|
[BugFix] Fix multi-node offline data-parallel (#18981)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-05-31 08:34:52 -07:00 |
|
ptarasiewiczNV
|
8bf507d766
|
[P/D] NixlConnector use cache device index for memory registration (#18969)
Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
|
2025-05-31 11:19:18 -04:00 |
|
Charlie Fu
|
306d60401d
|
[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-05-31 07:40:05 -07:00 |
|
Fred Reiss
|
f2c3f66d59
|
[Bugfix] Fix for issue 17396 (#18773)
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
|
2025-05-31 11:58:17 +00:00 |
|
vllmellm
|
0f5e0d567e
|
[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-31 03:39:31 -07:00 |
|
Luka Govedič
|
c55d804672
|
[BugFix] Pydantic part 2 (#18911)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-05-31 03:39:28 -07:00 |
|
Reid
|
749f5bdd38
|
[doc] fix the list rendering issue - security.md (#18982)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-31 10:39:21 +00:00 |
|
Satyajith Chilappagari
|
2a50ef5760
|
[Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-31 10:39:11 +00:00 |
|
Lucia Fang
|
b8b904795d
|
fix security issue of logging llm output (#18980)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-05-31 10:38:56 +00:00 |
|
Chauncey
|
ba5111f237
|
[Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled (#18879)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-31 09:20:54 +00:00 |
|
Yong Hoon Shin
|
1e123529d7
|
[Misc] Fix estimated max model len msg (#18966)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-05-31 16:43:44 +08:00 |
|
Pooya Davoodi
|
dff80b0e42
|
[Frontend] Add rerank support to run_batch endpoint (#16278)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
|
2025-05-31 07:40:01 +00:00 |
|
Yu Guo
|
7782464a17
|
create util function for batched arange (#18937)
|
2025-05-31 13:50:38 +08:00 |
|
Lukas Geiger
|
0f71e24034
|
[Docs] Correct multiprocessing design doc (#18964)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-31 01:30:15 +00:00 |
|
Will Eaton
|
1dab4d5718
|
Tool parser regex timeout handling (#18960)
Signed-off-by: Will Eaton <weaton@redhat.com>
|
2025-05-30 21:02:54 +00:00 |
|
rongfu.leng
|
7f21e8052b
|
[Misc] add group_size is -1 in awq quantization (#18910)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-05-30 17:34:22 +00:00 |
|
Isotr0py
|
5a8641638a
|
[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-30 17:11:44 +00:00 |
|
Michael Goin
|
f49239cb45
|
Benchmark script for fp8 vs bf16 gemm (#17126)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-30 10:56:11 -06:00 |
|
Nick Hill
|
2dbe8c0774
|
[Perf] API-server scaleout with many-to-many server-engine comms (#17546)
|
2025-05-30 08:17:00 -07:00 |
|
Richard Zou
|
84ec470fca
|
Improve "failed to get the hash of the compiled graph" error (#18956)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-30 15:00:54 +00:00 |
|
Russell Bryant
|
b29ca5c4d5
|
[Docs] Update SECURITY.md with link to our security guide (#18961)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-30 07:37:27 -07:00 |
|
Reid
|
ec6833c5e9
|
[doc] show the count for fork and watch (#18950)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-30 06:45:59 -07:00 |
|
Shawn Huang
|
e1fadf1197
|
[Feature] minicpm eagle support (#18943)
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
|
2025-05-30 06:45:56 -07:00 |
|
Daniele
|
43ff405b90
|
[CI/Build] remove regex from build dependencies (#18945)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-30 04:02:50 -07:00 |
|
Carol Zheng
|
fba02e3bd1
|
[Bugfix][TPU] Fix tpu model runner testcase failure (#18810)
Signed-off-by: Carol Zheng <cazheng@google.com>
|
2025-05-30 18:04:03 +08:00 |
|
Always-Naive
|
4577fc9abb
|
[Misc]Fix typo (#18947)
|
2025-05-30 02:21:35 -07:00 |
|
Rabi Mishra
|
5f1d0c8118
|
[Bugfix][Failing Test] Fix test_vllm_port.py (#18618)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-30 17:13:47 +08:00 |
|
Lukas Geiger
|
c3bb9f2331
|
[Model] Use in-place adds in SigLIP (#18922)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-30 17:12:59 +08:00 |
|
Reid
|
8f8900cee9
|
[doc] add mkdocs doc (#18930)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-30 07:58:44 +00:00 |
|
Rabi Mishra
|
6acb7a6285
|
[Misc]Fix benchmarks/README.md for speculative decoding (#18897)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-30 07:58:04 +00:00 |
|
Cyrus Leung
|
4f4a6b844a
|
[Deprecation] Remove mean pooling default for Qwen2EmbeddingModel (#18913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-30 06:53:37 +00:00 |
|
Michael Goin
|
4d0a1541be
|
[Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy (#18861)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-30 13:37:36 +08:00 |
|
vllmellm
|
77b6e74fe2
|
[ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-29 22:33:17 -07:00 |
|
H
|
5acf828d99
|
[docs] fix: fix markdown syntax (#18927)
|
2025-05-30 05:20:48 +00:00 |
|
iLeGend
|
3987e2ae96
|
[Model] Use AutoWeightsLoader for mamba2 (#18918)
Signed-off-by: iLeGend <824040212@qq.com>
|
2025-05-30 04:50:10 +00:00 |
|
Chauncey
|
77164dad5e
|
[Bugfix] Consistent ascii handling in tool parsers (#18883)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-30 04:44:43 +00:00 |
|
Wenhua Cheng
|
3de3eadf5b
|
improve the robustness of parsing vlms config in AutoRound (#18894)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
|
2025-05-29 19:24:47 -07:00 |
|
Carol Zheng
|
3132290a14
|
[TPU][CI/CD] Clean up docker for TPU tests. (#18926)
Signed-off-by: Carol Zheng <cazheng@google.com>
|
2025-05-30 10:24:19 +08:00 |
|
Cyrus Leung
|
1aa2f81b43
|
[Misc] Update type annotation for rotary embedding base (#18914)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-30 10:17:01 +08:00 |
|
Michael Goin
|
d54af615d5
|
[Bugfix] Fix PP default fallback behavior for V1 (#18915)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-30 10:13:17 +08:00 |
|
Chengji Yao
|
a1cc9f33a3
|
[TPU] remove transpose ops in moe kernel (#18923)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-05-29 23:00:11 +00:00 |
|
Richard Zou
|
a521ef06e5
|
Use standalone_compile by default in torch >= 2.8.0 (#18846)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-30 06:41:58 +08:00 |
|