Ning Xie
|
d97841078b
|
[Misc] unify variable for LLM instance (#20996)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-21 12:18:33 +01:00 |
|
Harry Mellor
|
e6b90a2805
|
[Docs] Make tables more space efficient in supported_models.md (#21291)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-21 02:25:02 -07:00 |
|
Harry Mellor
|
be54a951a3
|
[Docs] Fix hardcoded links in docs (#21287)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-21 02:23:57 -07:00 |
|
Cyrus Leung
|
042af0c8d3
|
[Model][1/N] Support multiple poolers at model level (#21227)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-21 02:22:21 -07:00 |
|
Cyrus Leung
|
378d33c392
|
[Bugfix] Fix missing placeholder in logger debug (#21280)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-20 22:50:06 -07:00 |
|
Huy Do
|
940af1f03a
|
Add the instruction to run e2e validation manually before release (#21023)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-07-20 22:29:18 -07:00 |
|
Simon Mo
|
92615d7fe8
|
[Docs] Add RFC Meeting to Issue Template (#21279)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-07-20 21:58:07 -07:00 |
|
Kay Yan
|
8188196a1c
|
[CI] Cleanup modelscope version constraint in Dockerfile (#21243)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-07-20 20:13:02 -07:00 |
|
Jiayi Yan
|
7ba34b1241
|
[bugfix] fix syntax warning caused by backslash (#21251)
|
2025-07-20 17:12:10 +00:00 |
|
Raushan Turganbay
|
9499e26e2a
|
[Model] Support VLMs with transformers backend (#20543)
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-20 13:25:50 +00:00 |
|
Calvin Chen
|
51ba839555
|
[Model] use AutoWeightsLoader for bart (#18299)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-07-20 08:15:50 +00:00 |
|
Seiji Eicher
|
d1fb65bde3
|
Enable v1 metrics tests (#20953)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Tag: v0.10.0rc1
|
2025-07-20 03:22:02 +00:00 |
|
Chengji Yao
|
3a1d8940ae
|
[TPU] support fp8 kv cache quantization (#19292)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-20 03:01:00 +00:00 |
|
Thomas Parnell
|
2b504eb770
|
[Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. (#21233)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-19 16:09:58 -07:00 |
|
Yuxuan Zhang
|
10eb24cc91
|
GLM-4 Update (#20736)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Lu Fang <fanglu@fb.com>
|
2025-07-19 22:40:31 +00:00 |
|
fhl2000
|
2e8cbb58f3
|
[BugFix] Fix full cuda graph slot_mapping (#21228)
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
|
2025-07-19 14:13:18 -07:00 |
|
Woosuk Kwon
|
752c6ade2e
|
[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-19 13:53:17 -07:00 |
|
Thomas Parnell
|
881e3cbe3b
|
[V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers (#21194)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-19 19:27:21 +00:00 |
|
kourosh hakhamaneshi
|
9f414a12ad
|
[BugFix] Make PD work with Ray (#21072)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-07-19 08:46:50 -07:00 |
|
Jiayi Yan
|
6a971ed692
|
[Docs] Update the link to the 'Prometheus/Grafana' example (#21225)
|
2025-07-19 06:58:07 -07:00 |
|
Sungjae Lee
|
da6579bf41
|
[CI/CD][bugfix]fix: error argument to loads has incompatible type (#21223)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
|
2025-07-19 05:16:48 -07:00 |
|
Rabi Mishra
|
c81259d33a
|
Fix/remove some broken model executor tests (#21224)
Signed-off-by: Rabi Mishra <ramishra@redhat.com>
|
2025-07-19 12:15:07 +00:00 |
|
Li, Jiang
|
e3a0e43d7f
|
[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-19 05:13:55 -07:00 |
|
22quinn
|
b3d82108e7
|
[Bugfix][Frontend] Fix openai CLI arg middleware (#21220)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-19 02:40:38 -07:00 |
|
Kaixi Hou
|
6d0734c562
|
[NVIDIA] Add SM100 Flashinfer MoE blockscale fp8 backend for low latency (#20645)
Signed-off-by: kaixih <kaixih@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-19 02:33:01 -07:00 |
|
shixianc
|
7d94577138
|
Add torch golden impl for moe_align_block_size kernel test (#20653)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
|
2025-07-19 02:32:36 -07:00 |
|
Lucas Wilkinson
|
59f935300c
|
[BugFix] Fix potential cuda-graph IMA (#21196)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-07-19 02:18:47 -07:00 |
|
Isotr0py
|
18e519ec86
|
[Bugfix] Fix ndarray video color from VideoAsset (#21064)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-19 02:17:16 -07:00 |
|
Jee Jee Li
|
1eaff27815
|
[V0 deprecation] Remove long context LoRA (#21169)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-19 02:15:41 -07:00 |
|
Huy Do
|
cf8cc32674
|
Fix a couple of Voxtral tests (#21218)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-07-19 09:13:41 +00:00 |
|
Chenyaaang
|
3a2cb2649d
|
[Misc][Tools][Benchmark] Add readme file for auto_tune script (#20779)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-07-19 09:06:59 +00:00 |
|
김종곤
|
3e04107d97
|
[Model] EXAONE 4.0 model support (#21060)
Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com>
Signed-off-by: woongsik <rlawhdrhs27@gmail.com>
|
2025-07-19 14:25:44 +08:00 |
|
Wentao Ye
|
37bd8d6e4c
|
[Bug] DeepGemm: Fix TypeError: per_block_cast_to_fp8() missing 1 required positional argument: 'use_ue8m0' for SM100 (#21187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-18 23:25:22 -07:00 |
|
Lucas Wilkinson
|
468e2400fe
|
[BugFix][CPU] Fix TorchSDPABackendImpl doesn't have use_irope (#21200)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-07-18 23:18:48 -07:00 |
|
Varun Sundar Rabindranath
|
dcc6cfb991
|
[Kernel][Performance] Tweak MoE Batched silu_mul_fp8_quant_deep_gemm kernel (#21193)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-18 23:09:51 -07:00 |
|
Woosuk Kwon
|
dd572c0ab3
|
[V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-18 21:47:50 -07:00 |
|
Varun Sundar Rabindranath
|
9ffe905a41
|
[Bugfix][Model] Fix LoRA for Mistral-Small-3.1-24B-Instruct-2503 (#21183)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-07-18 21:15:03 -07:00 |
|
Lucia Fang
|
9a9fda1423
|
[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <fanglu@meta.com>
|
2025-07-18 20:48:38 -07:00 |
|
Jee Jee Li
|
466e878f2a
|
[Quantization] Enable BNB support for more MoE models (#21100)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-18 17:52:02 -07:00 |
|
Rui Qiao
|
217937221b
|
Elastic Expert Parallel Initial Support (#20775)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-18 17:46:09 -07:00 |
|
hax0r31337
|
5782581acf
|
[Bugfix] Voxtral on Blackwell GPUs (RTX 50 series) (#21077)
Signed-off-by: hax0r31337 <liulihaocaiqwq@gmail.com>
|
2025-07-18 18:40:18 -04:00 |
|
JialinOuyang-Meta
|
0f199f197b
|
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
|
2025-07-18 12:34:40 -07:00 |
|
Richard Zou
|
b2eb2b5ad7
|
[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 (#19346)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-07-18 14:10:21 -04:00 |
|
Richard Zou
|
21274ab476
|
[CI] Update CODEOWNERS for vllm/compilation (#21185)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-18 06:51:12 -07:00 |
|
Thomas Parnell
|
ed8cbfedf8
|
Let GraniteMoeAttention use YaRN (#21174)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-18 05:52:52 -07:00 |
|
Cyrus Leung
|
45badd05d0
|
[Core] Set pooling params based on task and model (#21128)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-18 05:41:17 -07:00 |
|
ElizaWszola
|
4adc66f64d
|
[Bugfix] Allocate less memory in non-batched CUTLASS MoE (#21121)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2025-07-18 18:55:52 +08:00 |
|
Cyrus Leung
|
55ad648715
|
[Doc] Fix typo in model name (#21178)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-18 03:55:10 -07:00 |
|
wang.yuqi
|
5895afd780
|
[Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. (#20750)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-18 09:10:47 +00:00 |
|
wang.yuqi
|
ca4eb82bcb
|
[Model] Re-add the implicit conversion feature for as_seq_cls_model (#21103)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-18 07:15:07 +00:00 |
|