94 Commits

Author SHA1 Message Date
Vensen
0ce743f4e1
Fix(llm): Abort orphaned requests when llm.chat() batch fails (fixes #26081) (#27420)
Signed-off-by: vensenmu <vensenmu@gmail.com>
2025-11-02 16:24:01 +00:00
Junpu Fan
b186149e8e
[Bugfix][Frontend] validate arg priority in frontend LLM class before adding a request (#27596)
Signed-off-by: Junpu Fan <junpufan@gmail.com>
2025-10-28 14:02:43 +00:00
Russell Bryant
58fab50d82
[Frontend] Require flag for loading text and image embeds (#27204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 15:52:02 +00:00
Tahsin Tunan
43721bc67f
[CI] Replace large models with tiny alternatives in tests (#24057)
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-16 15:51:27 +01:00
Mark McLoughlin
e519281920
[Metrics] Add test for multi-modal cache stats logging (#26588)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 16:00:50 +00:00
Cyrus Leung
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset (#26285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-10 01:45:55 -07:00
Thomas Parnell
31a4b3e6c4
Revert #24446 and #26168 (#26332)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-07 16:38:19 -06:00
Cyrus Leung
1e4ecca1d0
[V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-07 15:42:31 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Yannick Schnider
f05fea1f5e
[Core] Enable decode of context length equal to max model length (#26168)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-04 09:59:26 +00:00
HUIJONG JEONG
3e70e3d4d5
add(v1): RequestStatesStats to RequestOutput (#24947)
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
2025-10-03 08:56:25 +00:00
Woosuk Kwon
52c2a8d4ad
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 17:56:30 -07:00
Aaron Pham
29283e8976
[Chore] Cleanup guided namespace, move to structured outputs config (#22772)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 09:20:27 +00:00
wang.yuqi
a8b0361c92
[CI] Split pooling from entrypoints Test (#24632)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 01:53:09 -07:00
dsinghvi
70549c1245
[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-03 16:13:11 +08:00
wang.yuqi
d9e00dbd1f
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-29 03:12:32 -07:00
Maximilien de Bayser
2554b27baa
[V0 Deprecation] Remove pooling model support in V0 (#23434)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-29 00:04:02 -07:00
Jee Jee Li
b4f9e9631c
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-28 23:28:35 -07:00
Cyrus Leung
8896eb72eb
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-22 10:56:57 +08:00
Harry Mellor
839ab00349
Re-enable Xet on TPU tests now that hf_xet has been updated (#22666)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 19:54:40 -07:00
wang.yuqi
84cf78acee
[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-11 09:41:37 -07:00
wang.yuqi
586f286789
[Model] Pooling model activation supports per-request control by PoolingParams (#20538)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-05 00:37:00 -07:00
Reza Barazesh
37efc63b64
[V0 deprecation] Guided decoding (#21347)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00
QiliangCui
07d80d7b0e
[TPU][TEST] Set HF_HUB_DISABLE_XET=1 for test 3. (#21539)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-24 15:33:04 -07:00
Chengji Yao
3a1d8940ae
[TPU] support fp8 kv cache quantization (#19292)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-20 03:01:00 +00:00
QiliangCui
99b4f080d8
Re-enable google/gemma-3-1b-it accuracy test. (#20866)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-12 21:48:56 -07:00
QiliangCui
b4f0b5f9aa
Temporarily suspend google/gemma-3-1b-it. (#20722)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-11 11:21:26 +00:00
Nathan Hoos
d6902ce79f
[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
2025-07-10 15:30:26 -04:00
Maximilien de Bayser
799397ee4f
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
Lu Fang
2b1e2111b0
Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19451)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-11 12:54:59 +08:00
22quinn
c1c7dbbeeb
[Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 (#19348)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-09 23:01:29 +08:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Cyrus Leung
c29034037d
[Deprecation] Disallow pos-args other than model when initializing LLM (#18802)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-29 09:36:58 -07:00
Feng XiaoLong
4fc1bf813a
[Bugfix] Migrate to the regex library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
2025-05-23 16:16:26 -07:00
Russell Bryant
ec54d73c31
[CI] Fix test_collective_rpc (#17858)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-08 16:47:12 +00:00
Harry Mellor
a6977dbd15
Simplify (and fix) passing of guided decoding backend options (#17008)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 19:02:23 +00:00
Cyrus Leung
88ad9ec6b2
[Frontend] Support chat_template_kwargs in LLM.chat (#17356)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-29 22:03:35 +08:00
Nick Hill
70116459c3
[BugFix][Frontend] Fix LLM.chat() tokenization (#16081)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-25 22:20:05 +00:00
Sangyeon Cho
6aae216b4e
[Bugfix] remove fallback in guided_json (int range, patterns) (#16725)
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>
2025-04-25 06:54:43 +00:00
Travis Johnson
3cde34a4a4
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-04-23 18:34:41 +00:00
Russell Bryant
dc1b4a6f13
[Core][V0] Enable regex support with xgrammar (#13228)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-14 10:13:38 +08:00
Cyrus Leung
4ebc0b9640
[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-08 09:45:21 -07:00
leon-seidel
24f1c01e0f
[Bugfix][V0] XGrammar structured output supports Enum (#15878)
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
2025-04-07 22:38:25 +00:00
iefgnoix
b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-04-03 14:23:28 -07:00
Alexander Matveev
9a2160fa55
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 13:25:20 -04:00
Varun Sundar Rabindranath
1286211f57
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-28 21:10:41 -07:00
Russell Bryant
7329ff5468
[V1] Support disable_any_whitespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 23:46:45 +08:00
youkaichao
f68cce8e64
[ci/build] fix broken tests in LLM.collective_rpc (#15350)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-23 14:49:48 +08:00
Russell Bryant
1f16b7fe74
[Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-19 21:33:51 -07:00
Cyrus Leung
f690372b68
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 13:49:33 +08:00