Cyrus Leung
4d4d6bad19
[Chore] Separate out vllm.utils.importlib ( #27022 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-17 00:48:59 +00:00
wangxiyuan
8f4b313c37
[Misc] rename torch_dtype to dtype ( #26695 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-15 12:11:48 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Harry Mellor
7c12763b24
Fix some typing issues found by mypy==1.18.2 ( #26596 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-10 18:21:25 +00:00
Isotr0py
d1ddf340c8
[V0 deprecation] Remove QKVCrossParallelLinear implementation ( #26475 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-09 10:52:27 +00:00
Luciano Martins
1317028aa8
[Model] Gemma3: Fix GGUF loading and quantization ( #26189 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-09 07:00:53 +00:00
Naveenraj Kamalakannan
e614ab7806
Separate MLAAttention class from Attention ( #25103 )
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-08 17:11:11 -07:00
Utkarsh Sharma
335b28f7d1
[TPU] Rename tpu_commons to tpu_inference ( #26279 )
...
Signed-off-by: Utkarsh Sharma <utksharma@google.com>
Co-authored-by: Utkarsh Sharma <utksharma@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-10-07 23:30:52 -07:00
liangel-02
b32260ab85
[torchao] safetensors integration ( #25969 )
...
Signed-off-by: Angel Li <liangel@meta.com>
2025-10-07 20:12:35 -06:00
ahao-anyscale
4dbdf4a294
[BUG] Fix file parsing for load_format runai_streamer_sharded ( #26324 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2025-10-07 11:23:07 +08:00
Harry Mellor
6c04638214
Fix per file ruff ignores related to line length ( #26262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-06 05:12:40 +00:00
Harry Mellor
4e256cadc2
Remove all references to yapf as it's no longer used ( #26251 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Pavani Majety
a26917332f
[Quantization/NVFP4] Speed up TRTLLM NVFP4 MOE weight loading and fix K/V scale loading for MLA Attn ( #25968 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-10-03 19:35:06 +00:00
Jerry Zhang
c31246800c
Support RL online quantization with torchao ( #23014 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-10-01 16:39:29 -07:00
Cyrus Leung
d346ec695e
[CI/Build] Consolidate model loader tests and requirements ( #25765 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-26 21:45:20 -07:00
Cyrus Leung
89fa54e6f7
[Optimization] Use a cheaper cache key in get_model_architecture ( #25682 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-25 17:54:20 -04:00
Harry Mellor
8c853050e7
[Docs] Enable fail_on_warning for the docs build in CI ( #25580 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-24 19:30:33 +00:00
Nicolò Lucchesi
4cf71cc88a
[TPU] Deprecate xm.mark_step in favor of `torch_xla.sync ( #25254 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-22 10:12:57 +00:00
Woosuk Kwon
52c2a8d4ad
[V0 Deprecation] Remove LLMEngine ( #25033 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 17:56:30 -07:00
Manoel Marques
bf8b26cad1
Generate _ModelInfo properties file when loading to improve loading speed ( #23558 )
...
Signed-off-by: Manoel Marques <manoel.marques@ibm.com>
Signed-off-by: Manoel Marques <manoelmrqs@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-20 11:51:13 +00:00
Cyrus Leung
c60e6137f0
[Optimization] Avoid repeated model architecture conversion for pooling models ( #25261 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 13:30:22 +08:00
Harry Mellor
aed16879a9
Move ModelConfig from config/__init__.py to config/model.py ( #25252 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 16:22:33 +00:00
Hyogeun Oh (오효근)
41f17bf290
[Docs] Fix warnings in mkdocs build (continued) ( #24740 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-12 06:43:15 -07:00
Jee Jee Li
60a0951924
[Bugfix] Fix BNB name match ( #24735 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 11:12:01 +00:00
Maximilien de Bayser
e090b7b45b
Enable conversion of multimodal models to pooling tasks ( #24451 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-09-12 03:30:41 +00:00
Harry Mellor
d6249d0699
Fix typing for safetensors_load_strategy ( #24641 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 10:41:39 +00:00
shengshiqi-google
41329a0ff9
[Core] feat: Add --safetensors-load-strategy flag for faster safetensors loading from Lustre ( #24469 )
...
Signed-off-by: Shiqi Sheng <shengshiqi@google.com>
Signed-off-by: shengshiqi-google <160179165+shengshiqi-google@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-10 23:10:01 -07:00
Harry Mellor
f36355abfd
Move LoadConfig from config/__init__.py to config/load.py ( #24566 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-10 06:14:18 -07:00
pwschuurman
4377b1ae3b
[Bugfix] Update Run:AI Model Streamer Loading Integration ( #23845 )
...
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Signed-off-by: Peter Schuurman <psch@google.com>
Co-authored-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-09 21:37:17 -07:00
Yang Kaiyong
43d9ad03ba
[Model loader]: support multi-thread model weight loading ( #23928 )
...
Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-09-08 18:49:39 +00:00
Jee Jee Li
8d7f39b48c
[Model] Remove quantized mixtral ( #24437 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-08 11:02:14 -07:00
Li Wang
5e537f45b4
[Bugfix] Fix get_quant_config when using modelscope ( #24421 )
...
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-09-08 11:03:02 +00:00
Woosuk Kwon
4172235ab7
[V0 deprecation] Deprecate V0 Neuron backend ( #21159 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-06 16:15:18 -07:00
Isotr0py
53b19ccdd5
[Core] Allow disabling TP sharding for parallel Linear layer ( #23024 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-05 22:53:58 -07:00
Didier Durand
02d411fdb2
[Doc]: fix typos in Python comments ( #24115 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-02 21:14:07 -07:00
Woosuk Kwon
5685370271
[Chore][V0 Deprecation] Move LogProb to a separate file ( #24055 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-01 12:07:53 -07:00
Harry Mellor
5674a40366
[Misc] Make download_weights_from_hf more reliable ( #23863 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-29 12:37:24 -07:00
Isotr0py
4645024d3a
[Quantization] Allow GGUF quantization to skip unquantized layer ( #23188 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-22 13:04:22 -06:00
杨朱 · Kiki
695e7adcd2
[misc] Remove outdate comment about runai_model_streamer ( #23421 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io>
2025-08-22 13:08:53 +00:00
Cyrus Leung
0c31e28e95
[Bugfix] Fix extra whitespace in strings caused by newline ( #23272 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 22:03:00 -07:00
Chengji Yao
e9d6a3db69
[TPU] make ptxla not imported when using tpu_commons ( #23081 )
...
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
2025-08-19 11:46:42 +08:00
Gh0u1L5
b159c0a67a
Fix GGUF loader for Qwen3 MoE. ( #22785 )
...
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com>
2025-08-13 06:08:23 -07:00
Andy Chen
9b94d6ec8f
Enable 4bit bnb prequant MOE ( #21548 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-11 19:02:14 -07:00
Zhiyu
d57dc2364e
Add ModelOpt Qwen3 nvfp4 support ( #20101 )
...
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-08-07 19:18:19 -07:00
Syed Muhammad Bin Asif
609b533cb6
[Bugfix] Add proper comparison for package versions ( #22314 )
...
Signed-off-by: Syed Muhammad Bin Asif <syedmba7@connect.hku.hk>
2025-08-06 20:31:03 -07:00
Jee Jee Li
28b18cc741
[Quantization] Enable BNB support for InternS1 ( #21953 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-01 11:09:54 +00:00
Cyrus Leung
86ae693f20
[Deprecation][2/N] Replace --task with --runner and --convert ( #21470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
Ning Xie
adaf2c6d4f
[Bugfix] fix modelscope snapshot_download serialization ( #21536 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-24 22:44:38 -07:00
Harry Mellor
13abd0eaf9
[Model] Officially support Emu3 with Transformers backend ( #21319 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 03:22:12 -07:00