xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-25 01:55:41 +08:00

Author	SHA1	Message	Date
Wentao Ye	e6c22d2b2f	[Perf] Apply torch.compile for `per_block_cast_to_fp8` (#24611 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-03 13:35:54 -07:00
Luka Govedič	6dbbecd5b2	[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091 ), fix test (#24376 ), and prep for custom op matching (#24604 ) (#24542 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: luka <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-03 13:35:53 -07:00
Harry Mellor	44be2b7349	Make `mypy` behave like a proper pre-commit hook (#25313 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-03 13:35:53 -07:00
Cyrus Leung	ddf4e1f56f	[Misc] Remove unused encoder-decoder error strings (#25374 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-03 13:35:53 -07:00
Woosuk Kwon	a815d820ee	Remove V0 attention backends (#25351 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-03 13:35:53 -07:00
Wenlong Wang	dad5f4d16d	[Docs] Fix warnings in mkdocs build (continued) (#25042 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-03 13:35:53 -07:00
Nick Hill	d897924b45	[BugFix] Exclude self when checking for port collision (#25286 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-03 13:35:53 -07:00
Andrew Sansom	9a4600e4dc	[CORE] Prompt Embeddings Support for v1 Engine (#24278 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-19 08:03:09 +08:00
elvischenv	e67a79db03	[Bugfix] Refactor Flashinfer TRTLLM attention kernel selection logic (#24600 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-17 15:36:29 -07:00
Wentao Ye	de2cc3d867	[Deprecation] Remove DeepGEMM Old Symbol Wrapper (#24902 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-15 20:03:29 -06:00
Lukas Geiger	1da0f1441d	[Core][Multimodal] Cache `supports_kw` (#24773 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-09-13 07:27:04 +00:00
dongluw	a5b84f1cbf	[Core] Shared memory based object store for Multimodal data caching and IPC (#20452 ) Signed-off-by: donglu <donglu@cohere.com>	2025-09-12 07:54:17 -07:00
Xiaozhu Meng	e42af78b18	[flashinfer] [kernel] support for fp8 kv cache for trtllm prefill attention (#24197 ) Signed-off-by: Xiaozhu <mxz297@gmail.com>	2025-09-11 14:20:09 -07:00
Boyuan Feng	94e6b2d55f	Allow users to specify kv cache memory size (#21489 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-11 13:41:07 +00:00
Charlie Fu	73e688cb79	[ROCm][Feature] Enable Pipeline Parallelism with Ray Compiled Graph on ROCm (#24275 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-09-09 23:27:35 +00:00
22quinn	0cdd213641	[Misc] Improve Worker process title and logging prefix (#22205 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-09-08 21:43:48 -07:00
Zebing Lin	82dfb12e52	[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673 ) Signed-off-by: linzebing <linzebing1995@gmail.com>	2025-09-08 21:34:37 -07:00
Nick Hill	752d2e1c36	[Minor] Fix some random typos in comments (#24009 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-31 16:42:17 -07:00
Gabriel Marinho	5b8077b8ac	Fix wrong truncate_prompt_tokens type hint (#22761 ) Signed-off-by: Gabriel Marinho <gmarinho@ibm.com> Signed-off-by: Gabriel Marinho <104592062+gmarinho2@users.noreply.github.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-30 20:39:38 +00:00
dubejf	5b31cb1781	[Bugfix] Fix --config arg expansion called from api_server.py (#23944 ) Signed-off-by: Jean-Francois Dube <dubejf+gh@gmail.com> Co-authored-by: Jean-Francois Dube <dubejf+gh@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-29 21:36:39 -07:00
Adit Chawdhary	4f7cde7272	Adds `json_count_leaves` utility function (#23899 ) Signed-off-by: aditchawdhary <aditxy@hotmail.com>	2025-08-29 05:28:13 -07:00
Wentao Ye	d3d2aad5a2	[Log] Use Debug Once for DeepGEMM E8M0 When not Enabled (#23858 )	2025-08-28 22:18:10 +00:00
Wentao Ye	3af47c3cc6	[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-08-27 14:09:08 +00:00
rongfu.leng	8dbf6ed7be	[Bugfix] fix when config.yaml config value is list parse error (#23528 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-27 05:54:39 +00:00
nvjullin	f66673a39d	[Kernel] Added flashinfer fp8 per-tensor gemms (#22895 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-26 06:54:04 -07:00
Wentao Ye	56dcf4e7e9	[Bug] Fix DeepGEMM Env Control (#23591 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-25 18:41:21 -07:00
Shiyan Deng	da65bec309	add an env var for path to pre-downloaded flashinfer cubin files (#22675 )	2025-08-22 19:25:45 +00:00
Didier Durand	22cf679aad	[Doc]: fix various typos in multiple files (#23179 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-22 10:38:46 -07:00
Li, Jiang	88016c372a	[Bugfix] Fix pooling models on CPU backend (#23392 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-22 09:47:17 +00:00
Wentao Ye	394591e343	[Feature] Enable DeepGEMM Linear on B200; 1.5% E2E throughput improvement (#23351 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-21 21:01:08 -07:00
Ming Yang	10f535c086	[Bugfix] Fix port conflict by obtaining a list of open ports upfront (#21894 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-08-21 10:22:18 -07:00
elvischenv	03752dba8f	[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-08-19 08:22:15 -04:00
afeldman-nm	bf7f470b22	[V1] Logits processors extensibility (#19912 ) Signed-off-by: Andrew Feldman <afeldman@redhat.com> Signed-off-by: Andrew Feldman <afeld2012@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-16 12:59:17 -07:00
Nicolò Lucchesi	070da660c1	[Kernel] Simplify `get_kv_cache_layout` and cache `use_trtllm_attention` env-dependent bit (#22735 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-16 00:14:08 +00:00
Nick Hill	ad0297d113	[Misc] Support passing multiple request ids at once to `AsyncLLM.abort()` (#22944 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-15 17:00:36 -07:00
Yichen Yan	236b864e4f	[BugFix] Make `run_once` thread-safe (#22978 ) Signed-off-by: <wenji.yyc@alibaba-inc.com> Signed-off-by: Yichen Yan <wenji.yyc@alibaba-inc.com>	2025-08-15 16:56:17 -07:00
Or Ozeri	c280066f9d	[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-08-15 16:52:52 -07:00
Thomas Parnell	75531a6c13	[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928 ) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-08-15 12:57:06 +00:00
nvjullin	279a5f31b3	[Kernel] Add nvfp4 gemm flashinfer backends (#22346 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-14 16:03:55 -04:00
Nick Hill	eb08487b18	[BugFix] Threadsafe close async zmq sockets (#22877 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-14 03:44:29 -07:00
HWH	9bd9294f0e	[Bugfix] Fix MiniCPMV Image input inference failed (#22813 ) Signed-off-by: HWH <67449739+jio-H@users.noreply.github.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-08-13 09:41:41 -07:00
Michael Goin	c6b928798e	Force TRTLLM attention for gpt-oss on SM100 (#22678 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-12 21:22:16 -07:00
Wentao Ye	f7dcce7a4a	[Feature] Add `VLLM_USE_DEEP_GEMM_E8M0` Env to Control E8M0 Scale (#21968 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 09:39:08 -07:00
Cyrus Leung	951b038298	[Misc] Move jsontree to utils (#22622 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-11 03:49:32 -07:00
Harry Mellor	bc1d02ac85	[Docs] Add comprehensive CLI reference for all large `vllm` subcommands (#22601 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 00:13:33 -07:00
Benji Beck	b4e2916721	Migrate LlavaNextImageInputs to TensorSchema (#21774 ) Signed-off-by: Benji Beck <benjibeck@meta.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-10 09:05:21 -07:00
Harry Mellor	56186474f6	[Docs] Reduce noise in docs and `--help` from the JSON tip (#22567 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-09 08:31:32 -07:00
Wentao Ye	3157aebb63	[Log] Add Warning for Deprecation of DeepGEMM old version (#22194 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-08 23:07:48 -07:00
Yongye Zhu	e789cad6b8	[gpt-oss] triton kernel mxfp4 (#22421 ) Signed-off-by: <zyy1102000@gmail.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-08 08:24:07 -07:00
Harry Mellor	e5ebeeba53	Remove exception for Python 3.8 typing from linter (#22506 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-08 03:06:46 -07:00

95 Commits