Isotr0py
f07a673eb2
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-19 20:20:12 -07:00

Satyajith Chilappagari
dc1440cf9f
Neuron up mistral (#18222)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-19 09:54:47 -07:00

Wenhua Cheng
e2ee1e8e9e
[Feature]Add support for models quantized with AutoRound (#17850)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
2025-05-19 09:38:53 -07:00

Jee Jee Li
6781af5608
[Quantization] Pool model support bitsandbytes (#18087)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-19 09:03:43 -07:00

Nan Qin
221cfc2fea
Feature/vllm/input embedding completion api (#17590)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-18 20:18:05 -07:00

wwl2755
9da1095daf
[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa (#18175)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-05-18 19:49:46 -07:00

cascade
9ab2c02ff8
Support sequence parallelism combined with pipeline parallelism (#18243)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-17 22:47:25 +00:00

Jinzhen Lin
e73b7dfd69
[Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order (#18245)
2025-05-16 16:02:44 -07:00

Bowen Wang
7fdfa01530
[Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-05-16 15:14:03 -07:00

Isotr0py
390ec88905
[Misc] Consolidate Audio tests into multimodal common generation tests (#18214)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-16 09:18:08 +00:00

Seiji Eicher
541817670c
[Misc] Add Ray Prometheus logger to V1 (#17925)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-05-16 01:02:42 -07:00

Lucia Fang
3d2779c29a
[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-05-15 22:28:27 -07:00

Will Eaton
6b31c84aff
Throw better error for when running into k8s service discovery issue (#18209)
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-05-15 21:07:28 -07:00

Harry Mellor
b18201fe06
Allow users to pass arbitrary JSON keys from CLI (#18208)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-15 21:05:34 -07:00

Lucas Wilkinson
4e1c6a0264
[Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-16 01:32:45 +00:00

Lucia Fang
8795eb9975
[Bugfix] Fix test_eagle test (#18223)
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-05-15 15:59:42 -07:00

Alexei-V-Ivanov-AMD
566ec04c3d
Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline (#18106)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-15 08:49:23 -07:00

hustxiayang
451da4bcbd
add tools into TokenizeChatRequest (#18187)
Signed-off-by: yangxia <yangxiast@gmail.com>
2025-05-15 04:01:49 -07:00

omahs
a9944aabfa
fix: typos (#18151)
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
2025-05-15 02:16:15 -07:00

Russell Bryant
a8f5aec20a
[V1] Update zmq socket creation in nixl connector (#18148)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-14 23:17:57 -07:00

David Xia
de71fec81b
[CI] don't skip fixed test_kv_cache_events() (#18183)
Signed-off-by: David Xia <david@davidxia.com>
2025-05-14 23:17:16 -07:00

Ning Xie
420caf7557
[UT] Add ut for none hash (#17892)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-15 13:28:11 +08:00

Chenheli Hua
4f07a64075
Support custom implementations of VideoLoader backends. (#18091)
2025-05-15 13:26:49 +08:00

Thomas Parnell
e6b8e65d2d
[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 (#18013)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-15 13:26:34 +08:00

Mark McLoughlin
65334ef3b9
[V1][Metrics] Remove unused code (#18158)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-14 20:13:17 -07:00

Chen Zhang
e60f550b38
[v1] Support multiple KV cache groups in GPU model runner (#17945)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-14 18:54:54 -07:00

Michael Goin
2142035b51
[V1] Support multiple kv connectors (#17564)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-05-14 16:28:02 -07:00

Russell Bryant
78aa341d12
[CI] Fix race condition in test_kv_cache_events test (#18169)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-14 16:27:48 -07:00

Jerry Zhang
7974736740
Add support for loading torchao models with AOPerModuleConfig (#17826)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-05-14 16:24:59 -07:00

Aaron Pham
2fc9075b82
[V1] Structured Outputs + Thinking compatibility (#16577)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-05-14 15:45:24 -07:00

Lucas Wilkinson
d93c976a0d
[Kernel] Have rotary embeddings support tensors (#18046)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-14 15:43:55 -07:00

Robert Shaw
856865008e
[CI] Disable Failing Tests (#18165)
2025-05-14 13:49:56 -07:00

bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels (#15956)
2025-05-14 13:11:54 -07:00

Nick Hill
59dd311cf5
[KVConnector] Keep KVTransferParams as a dict (#18033)
2025-05-14 08:05:57 -07:00

Cyrus Leung
d066e52013
[Bugfix] Fix chat utils tests (#18139)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-14 05:38:21 -07:00

Cyrus Leung
d62a076e84
[Model] GritLM supports other attention backends (#18109)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-14 03:33:19 -07:00

Jee Jee Li
259127f8b8
[Bugfix] Fix LoRA test (#18123)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-14 10:25:47 +00:00

TJian
612c2edb4f
[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-14 03:03:11 -07:00

rongfu.leng
82e7f9bb03
[Misc] replace does not exist model (#18119)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-05-14 02:13:47 -07:00

Cyrus Leung
8f5dc41481
[Bugfix] Fix entrypoints audio test failure (#18111)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-14 09:08:07 +00:00

wang.yuqi
63ad622233
[New Model]: support GTE NewModel (#17986)
2025-05-14 01:31:31 -07:00

lkchen
6685890d11
[Fix] Move "model_config" as keyword args in chat_utils.py (#18098)
Signed-off-by: Linkun <github@lkchen.net>
2025-05-13 23:27:26 -07:00

Charlie Fu
7b2f28deba
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082)
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-13 22:13:56 -07:00

vllmellm
2d912fb66f
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-13 22:03:47 -07:00

Chen Zhang
f2ae883b67
[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-13 19:09:39 -07:00

vllmellm
40de1ef455
[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-13 19:08:20 -07:00

Nick Hill
55aa7af994
[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-13 10:48:21 -07:00

Aaron Pham
cb528d0585
[Fix] check to make sure processor has chat templates (#18047)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-05-13 03:04:10 -07:00

Michael Goin
ea6ae8cb45
[Bugfix] Fix marlin moe fallback logic for llama4 (#18042)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-13 07:53:28 +00:00

Chen Zhang
f0d610a8ae
[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-13 06:50:38 +00:00