xinyun/vllm - vllm - 丝路新云 Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-06-21 12:07:15 +08:00

Author	SHA1	Message	Date
Reza Barazesh	37efc63b64	[V0 deprecation] Guided decoding (#21347 ) Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-29 03:15:30 -07:00
Keyang Ru	9ace2eaf35	[Bugfix] Improve JSON extraction in LlamaToolParser (#19024 ) Signed-off-by: keru <keyang.ru@oracle.com> Co-authored-by: keru <keyang.ru@oracle.com>	2025-07-28 12:36:58 +00:00
Hongsheng Liu	7656cf4cf3	[Bugfix] [issue-21565] Fix the incompatibility issue with stream and named function calling when Thinking is disabled (#21573 ) Signed-off-by: wangzi <3220100013@zju.edu.cn> Co-authored-by: wangzi <3220100013@zju.edu.cn>	2025-07-27 22:43:50 -07:00
Cyrus Leung	86ae693f20	[Deprecation][2/N] Replace `--task` with `--runner` and `--convert` (#21470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-27 19:42:40 -07:00
mgazz	e189b50f53	Add support for Prithvi in Online serving mode (#21518 ) Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-25 07:01:27 -07:00
Cyrus Leung	34ddcf9ff4	[Frontend] `run-batch` supports V1 (#21541 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-24 20:05:55 -07:00
QiliangCui	07d80d7b0e	[TPU][TEST] HF_HUB_DISABLE_XET=1 the test 3. (#21539 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-24 15:33:04 -07:00
Julien Denize	6d8d0a24c0	Add think chunk (#21333 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-07-23 21:51:32 -07:00
Liangliang Ma	13e4ee1dc3	[XPU][UT] increase intel xpu CI test scope (#21492 ) Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>	2025-07-23 20:24:04 -07:00
Michael Goin	82ec66f514	[V0 Deprecation] Remove Prompt Adapters (#20588 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-23 16:36:48 -07:00
Chengji Yao	3a1d8940ae	[TPU] support fp8 kv cache quantization (#19292 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-20 03:01:00 +00:00
22quinn	b3d82108e7	[Bugfix][Frontend] Fix openai CLI arg `middleware` (#21220 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-07-19 02:40:38 -07:00
Asher	5a7fb3ab9e	[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 ) Signed-off-by: Asher Zhang <asherszhang@tencent.com>	2025-07-17 09:10:09 +00:00
Michael Goin	4e7dfbe7b4	Update PyTorch to `torch==2.7.1` for CUDA (#21011 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-17 02:30:44 +00:00
Mac Misiura	18bdcf4113	feat - add a new endpoint `get_tokenizer_info` to provide tokenizer/chat-template information (#20575 ) Signed-off-by: m-misiura <mmisiura@redhat.com>	2025-07-16 21:52:14 +08:00
Maximilien de Bayser	6ebf313790	Avoid direct comparison of floating point numbers (#21002 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-07-15 21:12:14 -07:00
Patrick von Platen	cfbcb9ed87	[Voxtral] Add more tests (#21010 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-15 21:11:49 -07:00
Harry Mellor	1e36c8687e	[Deprecation] Remove `nullable_kvs` (#20969 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 17:21:50 +00:00
Patrick von Platen	e7e3e6d263	Voxtral (#20970 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-15 07:35:30 -07:00
Nicolò Lucchesi	80305c1b24	[CI] Fix flaky `test_streaming_response` test (#20913 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-14 20:15:15 -07:00
Nicolò Lucchesi	149f2435a5	[Misc] Relax translations tests (#20856 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-14 20:08:36 +00:00
QiliangCui	99b4f080d8	Re-enable google/gemma-3-1b-it accuracy test. (#20866 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-12 21:48:56 -07:00
QiliangCui	b4f0b5f9aa	Temporarily suspend google/gemma-3-1b-it. (#20722 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-11 11:21:26 +00:00
Cyrus Leung	cbd14ed561	[Bugfix] Refactor `/invocations` to be task-agnostic (#20764 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-11 03:20:54 -07:00
Alex Brooks	41060c6e08	[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] (#19126 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-07-10 21:09:37 +01:00
Nathan Hoos	d6902ce79f	[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975 ) Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>	2025-07-10 15:30:26 -04:00
Chauncey	8f2720def9	[Frontend] Support Tool Calling with both `tool_choice='required'` and `$defs`. (#20629 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-10 13:56:35 +08:00
Chauncey	2155e95ef1	[Bugfix] Fix the issue where `reasoning_content` is `None` when Thinking is enabled and `tool_choice` is set to `'required'`. (#20662 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-09 07:39:58 +00:00
kourosh hakhamaneshi	baed180aa0	[tech debt] Revisit lora request model checker (#20636 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2025-07-09 09:42:41 +08:00
Sanger Steel	72d14d0eed	[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com> Co-authored-by: Eta <esyra@coreweave.com>	2025-07-07 22:47:43 -07:00
Anton	e601efcb10	[Misc] Add fully interleaved support for multimodal 'string' content format (#14047 ) Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru> Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>	2025-07-07 19:43:08 +00:00
ztang2370	a37d75bbec	[Front-end] microbatch tokenization (#19334 ) Signed-off-by: zt2370 <ztang2370@gmail.com>	2025-07-07 17:54:10 +01:00
Woosuk Kwon	462b269280	Implement OpenAI Responses API [1/N] (#20504 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-06 18:32:13 -07:00
Flora Feng	fe1e924811	[Frontend] Support image object in llm.chat (#19635 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Flora Feng <4florafeng@gmail.com>	2025-07-06 06:47:13 +00:00
sangbumlikeagod	9e5452ee34	[Bug][Frontend] Fix structure of transcription's decoder_prompt (#18809 ) Signed-off-by: sangbumlikeagod <oironese@naver.com>	2025-07-04 11:28:07 +00:00
wang.yuqi	6f1229f91d	[Model][2/N] Automatic conversion of CrossEncoding model (#19978 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-03 13:59:23 +00:00
Cyrus Leung	b024a42e93	[Core] Move multimodal placeholder from chat utils to model definition (#20355 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-03 08:18:30 +00:00
Chenheli Hua	2e7cbf2d7d	[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-07-01 23:34:03 -07:00
Yuxuan Zhang	ed70f3c64f	Add GLM4.1V model (Draft) (#19331 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-01 12:48:26 +00:00
fyuan1316	e28533a16f	[Bugfix] Fix include prompt in stream response when echo=true (#15233 ) Signed-off-by: Yuan Fang <yuanfang@alauda.io>	2025-07-01 01:30:14 +00:00
Yazan Sharaya	6e244ae091	[Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead (#19946 ) Signed-off-by: Yazan-Sharaya <yazan.sharaya.yes@gmail.com>	2025-06-27 00:44:14 -04:00
Nicolò Lucchesi	e795d723ed	[Frontend] Add `/v1/audio/translations` OpenAI API endpoint (#19615 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-06-25 17:54:14 +00:00
Alex Brooks	ead2110297	[Core][Bugfix] Fix Online MM Beam Search (#19688 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-06-19 17:18:07 +00:00
Maximilien de Bayser	799397ee4f	Support embedding models in V1 (#16188 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-18 21:36:33 -07:00
nguyenhoangthuan99	ede5c4ebdf	[Frontend] add chunking audio for > 30s audio (#19597 ) Signed-off-by: nguyenhoangthuan99 <thuanhppro12@gmail.com>	2025-06-17 11:34:00 +08:00
wang.yuqi	f40f763f12	[CI] Add mteb testing for rerank models (#19344 )	2025-06-16 01:36:43 -07:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
Lu Fang	2b1e2111b0	Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19451 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-11 12:54:59 +08:00
22quinn	c1c7dbbeeb	[Bugfix][Core] Prevent token lengths exceeding `max_model_len` in V0 (#19348 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-09 23:01:29 +08:00
Lu Fang	6e0cd10f72	[Easy][Test] Simplify test_function_tool_use with multiple parametrizes (#19269 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-06-07 09:19:09 +08:00

369 Commits