4658 Commits

Author SHA1 Message Date
Russell Bryant
8eafe5eaea
[CI/Build] Ignore ruff warning up007 (#13182)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-13 11:48:31 +08:00
Murali Andoorveedu
4c0d93f4b2
[V1][Bugfix] Copy encoder input ids to fix set iteration issue during VLM abort (#13173)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
2025-02-12 12:58:11 -08:00
Michael Goin
14b7899d10
[CI] Fix failing FP8 cpu offload test (#13170)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-12 19:16:06 +00:00
Michael Goin
09972e716c
[Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119) 2025-02-12 09:19:53 -08:00
Qubitium-ModelCloud
36a08630e8
[CORE] [QUANT] Support for GPTQModel's dynamic quantization per module override/control (#7086) 2025-02-12 09:19:43 -08:00
Russell Bryant
2c2b560f48
[CI/Build] Use mypy matcher for pre-commit CI job (#13162)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-12 17:12:22 +00:00
Lu Fang
042c3419fa
Introduce VLLM_CUDART_SO_PATH to allow users specify the .so path (#12998)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-12 09:06:13 -08:00
Jee Jee Li
82cabf53a3
[Misc] Delete unused LoRA modules (#13151) 2025-02-12 08:58:24 -08:00
Rafael Vasquez
314cfade02
[Frontend] Generate valid tool call IDs when using tokenizer-mode=mistral (#12332) 2025-02-12 08:29:56 -08:00
Cyrus Leung
985b4a2b19
[Bugfix] Fix num video tokens calculation for Qwen2-VL (#13148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-12 11:55:23 +00:00
bnellnm
f4d97e4fc2
[Bug] [V1] Try fetching stop_reason from EngineOutput before checking the request (#13108) 2025-02-12 02:39:16 -08:00
Shiyan Deng
f1042e86f0
[Misc] AMD Build Improvements (#12923) 2025-02-12 02:36:10 -08:00
Maximilien de Bayser
7c4033acd4
Further reduce the HTTP calls to huggingface.co (#13107) 2025-02-12 02:34:09 -08:00
dependabot[bot]
d59def4730
Bump actions/setup-python from 5.3.0 to 5.4.0 (#12672) 2025-02-12 16:41:22 +08:00
dependabot[bot]
0c7d9effce
Bump helm/chart-testing-action from 2.6.1 to 2.7.0 (#12463) 2025-02-12 16:41:06 +08:00
dependabot[bot]
dd3b4a01f8
Bump actions/stale from 9.0.0 to 9.1.0 (#12462) 2025-02-12 00:40:25 -08:00
dependabot[bot]
a0597c6b75
Bump helm/kind-action from 1.10.0 to 1.12.0 (#11612) 2025-02-12 00:40:19 -08:00
Lingfan Yu
e92694b6fe
[Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921)
Signed-off-by: Lingfan Yu <lingfany@amazon.com>
2025-02-11 21:12:37 -08:00
Kevin H. Luu
842b0fd402
[ci] Add more source file dependencies for some tests (#13123)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-11 20:38:10 -08:00
Christian Pinto
974dfd4971
[Model] IBM/NASA Prithvi Geospatial model (#12830) 2025-02-11 20:34:30 -08:00
Keyun Tong
3ee696a63d
[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518)
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
2025-02-12 12:25:58 +08:00
Russell Bryant
72c2b68dc9
[Misc] Move pre-commit suggestion back to the end (#13114)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-11 22:34:16 +00:00
Yuan Tang
14ecab5be2
[Bugfix] Guided decoding falls back to outlines when fails to import xgrammar (#12976)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-11 18:17:44 +00:00
Harry Mellor
deb6c1c6b4
[Doc] Improve OpenVINO installation doc (#13102)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-11 18:02:46 +00:00
Li, Jiang
565c1efa65
[CI/Build][Bugfix] Fix CPU backend default threads num (#13077) 2025-02-11 16:55:56 +00:00
Szymon Ożóg
2b25b7d2e1
Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023) 2025-02-11 08:38:48 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
6c4dbe23eb
[BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-02-12 00:21:50 +08:00
MoonRide303
21f5d50fa5
[Bugfix] Do not use resource module on Windows (#12858) (#13029) 2025-02-11 08:21:18 -08:00
Jewon Lee
bf3e05215c
[Misc] Fix typo at comments at metrics.py (#13024) 2025-02-11 08:20:37 -08:00
Harry Mellor
ad9776353e
Set torch_dtype in TransformersModel (#13088)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-11 23:51:19 +08:00
Mark McLoughlin
75e6e14516
[V1][Metrics] Add several request timing histograms (#12644)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-02-11 10:14:00 -05:00
மனோஜ்குமார் பழனிச்சாமி
110f59a33e
[Bugfix] fix flaky test (#13089)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-02-11 14:41:20 +00:00
wangxiyuan
2e3b969ec0
[Platform] add pre_register_and_update function (#12432)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-11 22:06:46 +08:00
Yuhong Guo
da317197dd
[Build] Fix cuda link target of cumem_allocator in CPU env (#12863)
Signed-off-by: YuhongGuo <yuhong.gyh@antgroup.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-11 21:55:57 +08:00
Gregory Shtrasberg
7539bbc6a6
[ROCm] Using a more precise memory profiling (#12624)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-02-11 21:47:10 +08:00
Mengqing Cao
9cf4759493
[executor] init local_rank as device index (#13027)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-02-11 21:20:53 +08:00
Cody Yu
41c5dd45b9
[V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592) 2025-02-11 08:27:25 +00:00
Ce Gao
fc6485d277
[Bugfix]: Reasoning output bug according to the chat template change (#13025)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-02-11 15:49:03 +08:00
Varun Sundar Rabindranath
78a141d768
[Misc] LoRA - Refactor Punica ops tests (#12970)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-11 07:26:03 +00:00
Russell Bryant
c320ca8edd
[Core] Don't do platform detection at import time (#12933)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-11 07:25:25 +00:00
Woosuk Kwon
58047c6f04
[Benchmark] Add BurstGPT to benchmark_serving (#13063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-02-10 21:25:30 -08:00
Florian Greinacher
cb080f32e3
[Bugfix] Support missing tool parameters in mistral tokenizer (#12884)
Signed-off-by: Florian Greinacher <florian.greinacher@siemens.com>
2025-02-11 03:33:33 +00:00
Simon Mo
2c0f58203c
[Docs] Annouce Meta Meetup (#13065)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-02-10 18:24:29 -08:00
Woosuk Kwon
2ff4857678
[V1][Minor] Move scheduler outputs to a separate file (#13062)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-11 02:10:06 +00:00
Kevin H. Luu
91e876750e
[misc] Fix setup.py condition to avoid AMD from being mistaken with CPU (#13022)
Signed-off-by: kevin <kevin@anyscale.com>
2025-02-10 18:06:16 -08:00
Farzad Abdolhosseini
08b2d845d6
[Model] Ultravox Model: Support v0.5 Release (#12912)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-02-10 22:02:48 +00:00
மனோஜ்குமார் பழனிச்சாமி
2ae889052c
Fix seed parameter behavior in vLLM (#13007)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-02-10 23:26:50 +08:00
Cyrus Leung
51f0b5f7f6
[Bugfix] Clean up and fix multi-modal processors (#13012)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-10 10:45:21 +00:00
Kevin H. Luu
fde71262e0
[misc] Add retries with exponential backoff for HF file existence check (#13008) 2025-02-10 01:15:02 -08:00
Yuan Tang
243137143c
[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-10 06:09:33 +00:00