94 Commits

Author SHA1 Message Date
Vensen
0ce743f4e1
Fix(llm): Abort orphaned requests when llm.chat() batch fails (fixes #26081) (#27420)
Signed-off-by: vensenmu <vensenmu@gmail.com>
2025-11-02 16:24:01 +00:00
Junpu Fan
b186149e8e
[Bugfix][Frontend] validate arg priority in frontend LLM class before adding a request (#27596)
Signed-off-by: Junpu Fan <junpufan@gmail.com>
2025-10-28 14:02:43 +00:00
Russell Bryant
58fab50d82
[Frontend] Require flag for loading text and image embeds (#27204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 15:52:02 +00:00
Tahsin Tunan
43721bc67f
[CI] Replace large models with tiny alternatives in tests (#24057)
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-16 15:51:27 +01:00
Mark McLoughlin
e519281920
[Metrics] Add test for multi-modal cache stats logging (#26588)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 16:00:50 +00:00
Cyrus Leung
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset (#26285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-10 01:45:55 -07:00
Thomas Parnell
31a4b3e6c4
Revert #24446 and #26168 (#26332)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-07 16:38:19 -06:00
Cyrus Leung
1e4ecca1d0
[V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-07 15:42:31 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Yannick Schnider
f05fea1f5e
[Core] Enable decode of context length equal to max model length (#26168)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-04 09:59:26 +00:00
HUIJONG JEONG
3e70e3d4d5
add(v1): RequestStatesStats to RequestOutput (#24947)
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
2025-10-03 08:56:25 +00:00
Woosuk Kwon
52c2a8d4ad
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 17:56:30 -07:00
Aaron Pham
29283e8976
[Chore] Cleanup guided namespace, move to structured outputs config (#22772)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 09:20:27 +00:00
wang.yuqi
a8b0361c92
[CI] Split pooling from entrypoints Test (#24632)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 01:53:09 -07:00
dsinghvi
70549c1245
[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-03 16:13:11 +08:00
wang.yuqi
d9e00dbd1f
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-29 03:12:32 -07:00
Maximilien de Bayser
2554b27baa
[V0 Deprecation] Remove pooling model support in V0 (#23434)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-29 00:04:02 -07:00
Jee Jee Li
b4f9e9631c
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-28 23:28:35 -07:00
Cyrus Leung
8896eb72eb
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-22 10:56:57 +08:00
Harry Mellor
839ab00349
Re-enable Xet on TPU tests now that hf_xet has been updated (#22666)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 19:54:40 -07:00
wang.yuqi
84cf78acee
[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-11 09:41:37 -07:00
wang.yuqi
586f286789
[Model] Pooling model activation supports per-request control by PoolingParams (#20538)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-05 00:37:00 -07:00
Reza Barazesh
37efc63b64
[V0 deprecation] Guided decoding (#21347)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00
QiliangCui
07d80d7b0e
[TPU][TEST] Set HF_HUB_DISABLE_XET=1 for test 3. (#21539)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-24 15:33:04 -07:00
Chengji Yao
3a1d8940ae
[TPU] support fp8 kv cache quantization (#19292)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-20 03:01:00 +00:00
QiliangCui
99b4f080d8
Re-enable google/gemma-3-1b-it accuracy test. (#20866)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-12 21:48:56 -07:00
QiliangCui
b4f0b5f9aa
Temporarily suspend google/gemma-3-1b-it. (#20722)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-11 11:21:26 +00:00
Nathan Hoos
d6902ce79f
[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
2025-07-10 15:30:26 -04:00
Maximilien de Bayser
799397ee4f
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
Lu Fang
2b1e2111b0
Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19451)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-11 12:54:59 +08:00
22quinn
c1c7dbbeeb
[Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 (#19348)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-09 23:01:29 +08:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Cyrus Leung
c29034037d
[Deprecation] Disallow pos-args other than model when initializing LLM (#18802)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-29 09:36:58 -07:00
Feng XiaoLong
4fc1bf813a
[Bugfix] Migrate to the regex library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
2025-05-23 16:16:26 -07:00
Russell Bryant
ec54d73c31
[CI] Fix test_collective_rpc (#17858)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-08 16:47:12 +00:00
Harry Mellor
a6977dbd15
Simplify (and fix) passing of guided decoding backend options (#17008)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 19:02:23 +00:00
Cyrus Leung
88ad9ec6b2
[Frontend] Support chat_template_kwargs in LLM.chat (#17356)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-29 22:03:35 +08:00
Nick Hill
70116459c3
[BugFix][Frontend] Fix LLM.chat() tokenization (#16081)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-25 22:20:05 +00:00
Sangyeon Cho
6aae216b4e
[Bugfix] remove fallback in guided_json (int range, patterns) (#16725)
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>
2025-04-25 06:54:43 +00:00
Travis Johnson
3cde34a4a4
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-04-23 18:34:41 +00:00
Russell Bryant
dc1b4a6f13
[Core][V0] Enable regex support with xgrammar (#13228)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-14 10:13:38 +08:00
Cyrus Leung
4ebc0b9640
[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-08 09:45:21 -07:00
leon-seidel
24f1c01e0f
[Bugfix][V0] XGrammar structured output supports Enum (#15878)
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
2025-04-07 22:38:25 +00:00
iefgnoix
b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-04-03 14:23:28 -07:00
Alexander Matveev
9a2160fa55
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 13:25:20 -04:00
Varun Sundar Rabindranath
1286211f57
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-28 21:10:41 -07:00
Russell Bryant
7329ff5468
[V1] Support disable_any_whitespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 23:46:45 +08:00
youkaichao
f68cce8e64
[ci/build] fix broken tests in LLM.collective_rpc (#15350)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-23 14:49:48 +08:00
Russell Bryant
1f16b7fe74
[Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-19 21:33:51 -07:00
Cyrus Leung
f690372b68
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 13:49:33 +08:00