xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-29 12:47:14 +08:00

Author	SHA1	Message	Date
Michael Goin	c494f96fbc	Use UV_LINK_MODE=copy in Dockerfile to avoid hardlink fail (#22128) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-05 06:57:10 -07:00
Nicolò Lucchesi	0c275ad5ad	[V0 Deprecation][TPU] Remove V1 flag check from tests (#22248) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-05 06:53:23 -07:00
Ning Xie	74333ae2f6	[Misc] correct static type check for GroupCoordinator (#21946) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-05 03:17:46 -07:00
elvischenv	83156c7b89	[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-08-05 02:45:34 -07:00
Wentao Ye	4771df7b2b	[Feature] Non-contiguous Support for FP8 Quantization (#21961) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-05 02:36:43 -07:00
Benji Beck	05fae02175	Migrate KimiVLImagePixelInputs to TensorSchema (#21769) Signed-off-by: Benji Beck <benjibeck@meta.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-08-05 02:36:18 -07:00
Nicolò Lucchesi	d1bf1b9711	[Docs][TPU] Highlight TPU Software version selection (#22242) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-05 02:33:46 -07:00
wang.yuqi	586f286789	[Model] Pooling model activation supports per request control by PoolingParams (#20538) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-05 00:37:00 -07:00
Cyrus Leung	811ac13d03	[Core] Factor out common logic for MM budget calculation (#22228) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-04 23:54:55 -07:00
Michael Goin	e79a12fc3a	[UX] Fail if an invalid attention backend is specified (#22217) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-08-04 23:54:52 -07:00
Cyrus Leung	cdfd6871a5	[Bugfix] Misaligned params in TreeAttentionImpl (#22226) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-04 22:40:09 -07:00
ZiTian.Zhao	4b3e4474d7	Optimize configuration access with LRU cache in custom ops (#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>	2025-08-04 21:43:24 -07:00
Ning Xie	bd3db7f469	[Misc] log more detailed message for ensure_model_parallel_initialized (#22144) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-04 19:36:55 -07:00
Ning Xie	29b97c0995	[Doc] add backend to doc string of initialize_model_parallel (#22142) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-04 19:36:20 -07:00
elvischenv	7b455cf1c0	[Misc] Remove pass_config from CompilationConfig dump_json excluded (#21911) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-08-04 19:17:18 -07:00
tlipoca9	8a6e108e76	fix: kimi_k2 return empty tool call list (#22149) Signed-off-by: tlipoca9 <tlipoca9@gmail.com>	2025-08-04 19:15:31 -07:00
Wentao Ye	d7b28f3415	[Log] DeepGEMM Update Log for Unaligned Problem Size (#22208) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-04 19:13:19 -07:00
Yuxuan Zhang	6fa41e0c32	self.gate dtype update for GLM-4.5 (#22203) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>	2025-08-04 19:12:38 -07:00
Gregory Shtrasberg	031ca762d7	[ROCm][Bugfix] Compilation passes fix (#22202) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-08-04 19:12:28 -07:00
TJian	6ad6b8e115	[FEAT] Refactor ROPE into module (#22192) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-04 19:12:16 -07:00
lkchen	f4f4e7ef27	[V0 deprecation][P/D] Deprecate v0 `KVConnectorBase` code (1/2) (#21785) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-08-04 19:11:33 -07:00
Giancarlo Delfin	5ea71ff46f	[V1] reduce block size for tree attention correctness test to fix 'ou… (#22207) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-04 19:11:06 -07:00
Woosuk Kwon	7175817637	Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223)	2025-08-04 18:37:06 -07:00
PiteXChen	2dffac464c	[Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173) Signed-off-by: CLFutureX <775523362@qq.com>	2025-08-04 18:34:10 -07:00
Po-Han Huang (NVIDIA)	bdcb42e45d	[NVIDIA] Auto detect modelopt quant and fix DSR1-FP4 weight loading (#22073)	2025-08-04 21:02:55 -04:00
Zhonghua Deng	c09efff976	[Bugfix][V1][P/D]Fix the uneven polling issue in the toy proxy for P2pNcclConnector (#21819) Signed-off-by: Abatom <abzhonghua@gmail.com>	2025-08-04 20:17:05 +00:00
ericehanley	309c1bb822	[Bug] Update auto_tune.sh to separate benchmarking and profiling. (#21629) Signed-off-by: Eric Hanley <ericehanley@google.com>	2025-08-04 15:12:06 +00:00
Woosuk Kwon	9af654cc38	[Responses API] Ignore `store=True` and process the request by default (#22185) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-04 05:12:48 -07:00
Raghav Ravishankar	a5fff3bd49	Fix Arcee model weight loading: Add custom load_weights (#21725) Signed-off-by: alyosha-swamy <raghav@arcee.ai>	2025-08-04 04:09:56 -07:00
Cyrus Leung	1539ced93a	[Doc] Update pooling model docs (#22186) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-04 03:37:06 -07:00
22quinn	54de71d0df	[Sampler] Support returning all logprobs or logits (#21792) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-04 03:04:12 -07:00
Isotr0py	fed5849d3f	[Bugfix] Fix failing GGUF models test (#22174) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-04 01:27:02 -07:00
Weixiao Huang	c1b4eb048a	[feat] move WEIGHT_SCALE_SUPPORTED into raise block to accelerate RLHF weight loading (#21164) Signed-off-by: huangweixiao <huangweixiao@msh.team>	2025-08-04 15:43:06 +08:00
Jee Jee Li	a7b8788d2c	[Misc] Modify the organization of GLM series (#22171) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-03 23:51:20 -07:00
Tyler Michael Smith	8ecb3e9e93	[CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes (#22163) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-08-03 22:19:04 -07:00
Chenxi Yang	e5949e5ae0	Remove index_put from MM embeddings merging (#22105) Co-authored-by: Chenxi Yang <cxyang@meta.com>	2025-08-03 22:15:14 -07:00
ZiTian.Zhao	49bcd893e7	[refactor] improve ConstantList exception specificity (#22156) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-08-03 22:14:49 -07:00
Giancarlo Delfin	aa7012eb6d	Add tree attention backend for v1 (part 1) (#20401) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-03 22:13:26 -07:00
Ning Xie	c2e75b3c11	remove duplicate code within cleanup_dist_env_and_memory (#22147) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-03 20:03:58 -07:00
Abirdcfly	0d7db16a92	[PD] add test for chat completions endpoint (#21925) Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2025-08-03 19:57:03 -07:00
22quinn	845420ac2c	[RLHF] Fix torch.dtype not serializable in example (#22158) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-04 02:43:33 +00:00
ZiTian.Zhao	e27d25a0dc	[fix] fix correct assertion syntax error in attention utils. (#22154) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-08-03 19:24:02 -07:00
Seiji Eicher	6f5478298d	Use `aiohttp` connection pool for benchmarking (#21981) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-08-03 19:23:32 -07:00
Isotr0py	6a39ba85fe	[Bugfix] Fix failing multimodal standard test (#22153) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-03 19:04:38 +00:00
Yuxuan Zhang	d3c18c9cb0	fuse fp32 for GLM-4.5 e_score_correction_bias (#22143) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>	2025-08-03 09:04:54 -07:00
TankNee	83f7bbb318	Add chat doc in quick start (#21213) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-08-03 07:47:55 -07:00
Li, Jiang	b5dfb94fa0	[CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation (#22145) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-03 05:34:04 -07:00
Woosuk Kwon	6d98843b31	[Responses API] Disable response store by default (#22137) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-03 04:04:21 -07:00
David Ben-David	aefeea0fde	[V1] [P/D] Refactor KV Connector Path (#21980) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-08-03 04:03:40 -07:00
H	24d1dffbeb	[executor] feat: add supports_pp attr to executors (#21786) Signed-off-by: Haibin Lin <haibin.lin@bytedance.com>	2025-08-03 18:04:45 +08:00

8299 Commits