Anton
|
e601efcb10
|
[Misc] Add fully interleaved support for multimodal 'string' content format (#14047)
Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru>
Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>
|
2025-07-07 19:43:08 +00:00 |
|
jvlunteren
|
22dd9c2730
|
[Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel (#20308)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-07-07 19:08:12 +00:00 |
|
Rui Qiao
|
a6d795d593
|
[DP] Copy environment variables to Ray DPEngineCoreActors (#20344)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-07 10:14:22 -07:00 |
|
ztang2370
|
a37d75bbec
|
[Front-end] microbatch tokenization (#19334)
Signed-off-by: zt2370 <ztang2370@gmail.com>
|
2025-07-07 17:54:10 +01:00 |
|
Peter Pan
|
edd270bc78
|
[Bugfix] Prevent IndexError for cached requests when pipeline parallelism is disabled (#20486)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-07 09:41:15 -07:00 |
|
wang.yuqi
|
110df74332
|
[Model][Last/4] Automatic conversion of CrossEncoding model (#19675)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-07 14:46:04 +00:00 |
|
Abirdcfly
|
448acad31e
|
[Misc] remove unused jinaai_serving_reranking (#18878)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
|
2025-07-07 09:14:12 +00:00 |
|
Yan Ma
|
3112271f6e
|
[XPU] log clean up for XPU platform (#20553)
Signed-off-by: yan <yan.ma@intel.com>
|
2025-07-07 01:38:22 -07:00 |
|
Liangliang Ma
|
2c5ebec064
|
[XPU][CI] add v1/core test in xpu hardware ci (#20537)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
|
2025-07-07 01:16:40 -07:00 |
|
Yang Yang
|
6e2c19ce22
|
[Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU (#19410)
Signed-off-by: dbyoung18 <yang5.yang@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-07-07 04:32:32 +00:00 |
|
Woosuk Kwon
|
462b269280
|
Implement OpenAI Responses API [1/N] (#20504)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-06 18:32:13 -07:00 |
|
Cyrus Leung
|
c18b3b8e8b
|
[Bugfix] Add use_cross_encoder flag to use correct activation in ClassifierPooler (#20527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-06 14:01:48 -07:00 |
|
Woosuk Kwon
|
9528e3a05e
|
[BugFix][Spec Decode] Fix spec token ids in model runner (#20530)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-06 19:44:52 +00:00 |
|
Cyrus Leung
|
9fb52e523a
|
[V1] Support any head size for FlexAttention backend (#20467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-06 09:54:36 -07:00 |
|
Woosuk Kwon
|
e202dd2736
|
[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-07-06 08:48:13 -07:00 |
|
Reid
|
43813e6361
|
[Misc] call the pre-defined func (#20518)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-06 10:25:29 +00:00 |
|
Brayden Zhong
|
cede942b87
|
[Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py (#20516)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-07-06 09:20:11 +00:00 |
|
Flora Feng
|
fe1e924811
|
[Frontend] Support image object in llm.chat (#19635)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
|
2025-07-06 06:47:13 +00:00 |
|
Chengji Yao
|
4548c03c50
|
[TPU][Bugfix] fix the MoE OOM issue (#20339)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-05 21:19:09 -07:00 |
|
Lucia Fang
|
432870829d
|
[Bugfix] Fix missing per_act_token parameter in compressed_tensors_moe (#20509)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-07-06 12:08:30 +08:00 |
|
Reid
|
8d763cb891
|
[Misc] remove unused import (#20517)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-05 19:17:06 -07:00 |
|
Reid
|
cf4cd53982
|
[Misc] Add logger.exception for TPU information collection failures (#20510)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-05 07:24:32 -07:00 |
|
Isotr0py
|
32c9be2200
|
[v1] Re-add fp32 support to v1 engine through FlexAttention (#19754)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-05 09:41:10 +00:00 |
|
Lucia Fang
|
8aeaa910a2
|
Fix unknown attribute of topk_indices_dtype in CompressedTensorsW8A8Fp8MoECutlassMethod (#20507)
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-07-05 14:03:20 +08:00 |
|
Jee Jee Li
|
906e05d840
|
[Misc] Remove the unused LoRA test code (#20494)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-05 13:48:16 +08:00 |
|
Reid
|
7e90870491
|
[Misc] Add security warning for development mode endpoints (#20508)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-04 20:52:13 -07:00 |
|
Michael Goin
|
c108781c85
|
[CI Bugfix] Fix pre-commit failures on main (#20502)
|
2025-07-04 14:17:30 -07:00 |
|
Duncan Moss
|
3d184b95b8
|
[feat]: CUTLASS block scaled group gemm for SM100 (#19757)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Duncan Moss <dmoss@nvidia.com>
|
2025-07-04 12:58:04 -06:00 |
|
Thomas Parnell
|
2f35a022e6
|
Enable V1 for Hybrid SSM/Attention Models (#20016)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-04 17:46:53 +00:00 |
|
Chenheli Hua
|
ffe00ef77a
|
[Misc] Small: Remove global media connector. Each test should have its own test connector object. (#20395)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-07-04 08:15:03 -07:00 |
|
wang.yuqi
|
2e26f9156a
|
[Model][3/N] Automatic conversion of CrossEncoding model (#20168)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-04 05:47:39 -07:00 |
|
sangbumlikeagod
|
9e5452ee34
|
[Bug][Frontend] Fix structure of transcription's decoder_prompt (#18809)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
|
2025-07-04 11:28:07 +00:00 |
|
Michael Goin
|
0e3fe896e2
|
Support Llama 4 for fused_marlin_moe (#20457)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-04 07:55:10 +00:00 |
|
Jee Jee Li
|
1caca5a589
|
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-04 07:40:42 +00:00 |
|
Aaron Pham
|
4a98edff1f
|
[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-04 15:05:49 +08:00 |
|
Gabriel Marinho
|
a4113b035c
|
[Platform] Add custom default max tokens (#18557)
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
|
2025-07-04 10:50:17 +08:00 |
|
Michael Goin
|
7e1665b089
|
[Misc] Change warn_for_unimplemented_methods to debug (#20455)
|
2025-07-04 02:35:08 +00:00 |
|
Seiji Eicher
|
8d1096e7db
|
[Bugfix] Register reducer even if transformers_modules not available (#19510)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-07-03 22:08:12 +00:00 |
|
Nicolò Lucchesi
|
8d775dd30a
|
[Misc] Fix Unable to detect current VLLM config. Defaulting to NHD kv cache layout warning (#20400)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-03 14:56:09 -07:00 |
|
bnellnm
|
78fe77534b
|
[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-07-03 14:55:40 -07:00 |
|
Yuxuan Zhang
|
2f2fcb31b8
|
[Misc] Remove _maybe_ignore_quant_config from GLM4.1v (#20432)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-07-03 21:41:13 +00:00 |
|
Ning Xie
|
1dba2c4ebe
|
[Misc] adjust for ipv6 for mookcacke url parse (#20107)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-03 20:27:17 +00:00 |
|
Isotr0py
|
71d6de3a26
|
[Misc] Clean up InternVL family config registration (#19992)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-03 20:01:47 +00:00 |
|
Reid
|
619b9f5c7e
|
[Frontend] fix duplicate output for bench subcmd (#20446)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-03 08:02:06 -07:00 |
|
Reid
|
9854dc9040
|
[Frontend] improve vllm bench <bench_type> --help display (#20430)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-03 14:22:16 +00:00 |
|
wang.yuqi
|
6f1229f91d
|
[Model][2/N] Automatic conversion of CrossEncoding model (#19978)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-03 13:59:23 +00:00 |
|
Jee Jee Li
|
1819fbda63
|
[Quantization] Bump to use latest bitsandbytes (#20424)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-03 21:58:46 +08:00 |
|
Ning Xie
|
fb14d53cf6
|
[Kernel] refactor cpu worker v0 cache dtype (#20080)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-03 08:39:14 +00:00 |
|
Cyrus Leung
|
b024a42e93
|
[Core] Move multimodal placeholder from chat utils to model definition (#20355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-03 08:18:30 +00:00 |
|
qscqesze
|
363528de27
|
[Feature] Support MiniMax-M1 function calls features (#20297)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
|
2025-07-03 06:48:27 +00:00 |
|