Yu Chin Fabian Lim
32ec9e2f2a
Mamba V2 Test not Asserting Failures. ( #21379 )
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-07-23 01:40:27 -07:00
Lu Fang
accac82928
[Sampler] Introduce logprobs mode for logging ( #21398 )
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-23 01:39:25 -07:00
Michael Yao
23637dcdef
[Docs] Fix bullets and grammars in tool_calling.md ( #21440 )
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-23 01:23:20 -07:00
Sergio Paniego Blanco
6364af92f8
Fixed typo in profiling logs ( #21441 )
2025-07-23 01:18:54 -07:00
Guillaume Calmettes
7aaa2bd5a8
[Bugfix] ensure tool_choice is popped when tool_choice:null is passed in json payload ( #19679 )
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-07-23 00:30:05 -07:00
youkaichao
2f5c14de6a
add clear messages for deprecated models ( #21424 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-07-23 00:03:16 -07:00
Michael Goin
f002e9a870
[Cleanup] Only log MoE DP setup warning if DP is enabled ( #21315 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-23 00:02:48 -07:00
Jialin Ouyang
a1f3610fc6
[Core] Add basic unit test for maybe_evict_cached_block ( #21400 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-07-23 00:02:02 -07:00
Isotr0py
4ecedd1806
[Bugfix] Fix nightly transformers CI failure ( #21427 )
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-23 00:01:01 -07:00
Alexei-V-Ivanov-AMD
107111a859
Changing "amdproduction" allocation. ( #21409 )
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-07-22 20:48:31 -07:00
elvischenv
2dec7c1a5d
[Bugfix][CUDA] fixes CUDA FP8 kv cache dtype supported ( #21420 )
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-07-22 20:34:50 -07:00
Chendi.Xue
08d2bd78da
[BUGFIX] deepseek-v2-lite failed due to fused_qkv_a_proj name update ( #21414 )
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
2025-07-22 20:33:57 -07:00
ericehanley
4f76a05f4f
[BugFix] Update python to python3 calls for image; fix prefix & input calculations. ( #21391 )
Signed-off-by: Eric Hanley <ericehanley@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-22 20:33:00 -07:00
Harry Mellor
f154bb9ff0
Simplify weight loading in Transformers backend ( #21382 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-22 20:29:43 -07:00
Gregory Shtrasberg
3ec7170ff1
[Bugfix][ROCm][Build] Fix build regression on ROCm ( #21393 )
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-22 20:27:41 -07:00
Cyrus Leung
c401c64b4c
[CI/Build] Fix model executor tests ( #21387 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-22 20:25:37 -07:00
Joe Runde
b77c7d327f
[BugFix] Fix ray import error mem cleanup bug ( #21381 )
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-07-22 16:19:55 -07:00
Rui Qiao
35bc8bd5fb
[Misc] Copy HF_TOKEN env var to Ray workers ( #21406 )
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-22 16:18:42 -07:00
Yiheng Xu
4594fc3b28
[Model] Add Qwen3CoderToolParser ( #21396 )
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-07-22 15:05:57 -07:00
Xin Li
ae268b6326
Fix Flashinfer Allreduce+Norm enable disable calculation based on fi_allreduce_fusion_max_token_num ( #21325 )
Signed-off-by: XIn Li <xinli@nvidia.com>
2025-07-22 12:42:31 -07:00
Cyrus Leung
35366ae57c
[CI/Build] Fix test failure due to updated model repo ( #21375 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-22 08:39:35 -07:00
Aritra Roy Gosthipaty
2226d5bd85
[Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers ( #21353 )
Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>
2025-07-22 08:27:28 -07:00
Wang Yijun
44554a0068
Add tokenization_kwargs to encode for embedding model truncation ( #21033 )
2025-07-22 08:24:00 -07:00
Wentao Ye
226b452a20
Revert "[Refactor] Fix Compile Warning #1444-D ( #21208 )" ( #21384 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-22 08:22:10 -07:00
Raushan Turganbay
f38ee34a0a
[feat] Enable mm caching for transformers backend ( #21358 )
Signed-off-by: raushan <raushan@huggingface.co>
2025-07-22 08:18:46 -07:00
Benjamin Bartels
b194557a6c
Adds parallel model weight loading for runai_streamer ( #21330 )
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-22 08:15:53 -07:00
Wentao Ye
774d0c014b
[Perf] Cuda Kernel for Per Token Group Quant ( #21083 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-22 07:27:15 -07:00
Duncan Moss
2c8db17cfd
[feat]: add SM100 support for cutlass FP8 groupGEMM ( #20447 )
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-22 07:27:12 -07:00
Mickaël Seznec
4fb56914c5
[perf] Add fused MLA QKV + strided layernorm ( #21116 )
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-22 07:07:44 -07:00
Ning Xie
0df4d9b06b
[Misc] unify variable for LLM instance v2 ( #21356 )
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-22 06:32:36 -07:00
Jialin Ouyang
ed25054577
[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool ( #21222 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-07-22 06:17:47 -07:00
Jialin Ouyang
10904e6d75
[benchmark] Port benchmark request sent optimization to benchmark_serving ( #21209 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-07-22 05:28:00 -07:00
Jialin Ouyang
a32237665d
[Core] Optimize update checks in LogitsProcessor ( #21245 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-07-22 05:27:18 -07:00
Kebe
bc8a8ce5ec
[Misc] Remove deprecated args in v0.10 ( #21349 )
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-07-22 05:26:39 -07:00
Simon Mo
32142b3c62
[Bugfix] Fix eviction cached blocked logic ( #21357 )
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-07-22 01:18:40 -07:00
Raghav Ravishankar
82b8027be6
Add arcee model ( #21296 )
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-22 00:57:43 -07:00
rongfu.leng
3779eb8c81
[Feature][eplb] add verify ep or tp or dp ( #21102 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-07-21 23:41:14 -07:00
Shu Wang
9e23ad9655
Update fp4 quantize API ( #21327 )
Signed-off-by: Shu Wang <shuw@nvidia.com>
2025-07-21 23:40:21 -07:00
Wentao Ye
e69a92a1ce
[Bug] DeepGemm: Fix Cuda Init Error ( #21312 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-21 23:36:18 -07:00
Varun Sundar Rabindranath
8425f785ad
[Misc] DeepEPHighThroughtput - Enable Inductor pass ( #21311 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-21 23:35:45 -07:00
Konrad Zawora
c17231e827
Fix kv_cache_dtype handling for out-of-tree HPU plugin ( #21302 )
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
2025-07-21 23:35:14 -07:00
Wentao Ye
6e5b5ca580
[Refactor] Fix Compile Warning #1444-D ( #21208 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-21 23:33:51 -07:00
Thomas Parnell
488d8a986a
[V1] [Hybrid] Add new test to verify that hybrid views into KVCacheTensor are compatible ( #21300 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-21 23:31:18 -07:00
Jialin Ouyang
af376ca19d
[Core] Minimize number of dict lookup in _maybe_evict_cached_block ( #21281 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-07-21 22:37:34 -07:00
Ming Yang
e7b2042681
Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE ( #20762 ) ( #21334 )
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-07-21 21:49:01 -07:00
Ratnam Parikh
90f1e55421
[Intel GPU] Ray Compiled Graph avoid NCCL for Intel GPU ( #21338 )
Signed-off-by: ratnampa <ratnam.parikh@intel.com>
2025-07-21 21:48:27 -07:00
Li, Jiang
5e70dcd6e6
[Doc] Fix CPU doc format ( #21316 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-21 21:47:49 -07:00
Chaojun Zhang
25d585ab7b
[XPU] Enable external_launcher to serve as an executor via torchrun ( #21021 )
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2025-07-21 21:47:35 -07:00
Lu Fang
8d0a01a5f2
[v1][sampler] Inplace logprobs comparison to get the token rank ( #21283 )
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-21 13:47:47 -07:00
Himanshu Jaju
0ec82edda5
[perf] Speed up align sum kernels ( #21079 )
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
2025-07-21 11:19:23 -07:00