xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-16 12:55:02 +08:00

Author	SHA1	Message	Date
wxsm	f4135232b9	feat(distributed): add `get_required_kvcache_layout` class method to kv connector api (#20433 ) Signed-off-by: wxsm <wxsms@foxmail.com>	2025-07-30 16:41:51 +00:00
Chenguang Zheng	4904e53c32	[Bugfix] SharedStorage Connector for V1 PD multimodal (#21611 ) Signed-off-by: fake0fan <645327136@qq.com> Signed-off-by: herotai214 <herotai214@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com>	2025-07-30 09:18:37 -07:00
Cyrus Leung	004203e953	[CI/Build] Fix registry tests (#21934 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-30 09:10:41 -07:00
633WHU	5c765aec65	[Bugfix] Fix TypeError in scheduler when comparing mixed request_id types (#21816 ) Signed-off-by: chiliu <chiliu@paypal.com> Co-authored-by: chiliu <chiliu@paypal.com>	2025-07-30 08:54:44 -07:00
Yong Hoon Shin	ad510309ee	Override attention metadata for fast prefill in some KV sharing setups (#21590 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-30 08:54:15 -07:00
Cyrus Leung	366f6b3a4d	[Bugfix] Fix multi-api server not working for text models (#21933 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-30 08:42:05 -07:00
Isotr0py	6e599eebe8	[Bugfix] Fix OOM tests in initialization test (#21921 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-30 07:35:47 -07:00
Harry Mellor	88edf5994c	[Docs] Reduce the size of the built docs (#21920 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-30 07:35:08 -07:00
Po-Han Huang (NVIDIA)	ff08e51940	[NVIDIA] Fix Llama4 Scout FP4 functionality issues (#21499 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-07-30 07:33:40 -07:00
Ruixiang Tan	8f4a1c9a04	[Misc] Improve code readability of KVCacheManager (#21673 ) Signed-off-by: tanruixiang <tanruixiang0104@gmail.com> Signed-off-by: Ruixiang Tan <819464715@qq.com> Signed-off-by: GitHub <noreply@github.com>	2025-07-30 07:20:43 -07:00
Harry Mellor	36ede45989	Reduce time wasted in GitHub Actions using `concurrency` (#21919 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-30 07:18:02 -07:00
Cyrus Leung	0e40b26073	[CI/Build] Only run markdownlint in CI (#21892 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-30 07:17:14 -07:00
Wentao Ye	0271c2ff2f	[Test] Add Benchmark and Unit Test for `per_token_group_quant` (#21860 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-30 07:15:02 -07:00
youkaichao	e91d3c9cda	[misc] skip p2p check by default (#21904 )	2025-07-30 22:05:04 +08:00
Yan Pashkovsky	bf668b5bf5	[Feature] Support multiple api keys in server (#18548 ) Signed-off-by: Yan Pashkovsky <yanp.bugz@gmail.com>	2025-07-30 07:03:23 -07:00
rongfu.leng	da3e0bd6e5	[Bugfix] we should use metavar is not choices (#21902 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-07-30 06:51:58 -07:00
Cyrus Leung	fcfd1eb9c5	[Doc] Remove vLLM prefix and add citation for PagedAttention (#21910 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-30 06:36:34 -07:00
aladerran	d979dd6beb	[Feature][EPLB] Add eplb support for Qwen3 (#20815 ) Signed-off-by: aladerran <aladerran@gmail.com>	2025-07-30 06:27:57 -07:00
Eric Curtin	b876860c62	[Hardware][CPU] Build fix for ARM without BF16 (#21848 ) Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-07-30 06:22:00 -07:00
Patrick von Platen	13986365a9	Add @patrickvonplaten as maintainer of mistral's related files. (#21928 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-07-30 20:42:51 +08:00
Hongsheng Liu	5c8fe389d6	[Docs] Fix the example code of streaming chat completions in reasoning (#21825 ) Signed-off-by: wangzi <3220100013@zju.edu.cn> Co-authored-by: wangzi <3220100013@zju.edu.cn> Co-authored-by: Zi Wang <66560864+BruceW-07@users.noreply.github.com>	2025-07-30 12:11:58 +00:00
Cyrus Leung	5bbaf492a6	[Doc] Update partial support (#21916 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-30 01:32:39 -07:00
Peter Pan	533db0935d	[benchmark] add max-concurrency in result table (#21095 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2025-07-30 01:15:43 -07:00
Jee Jee Li	fc91da5499	[Model] Remove DSV2 unused code (#21903 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-30 00:55:03 -07:00
Varun Vinayak Shenoy	547795232d	[Tests] Fixing bug inside MultiModalProfiler. (#21842 ) Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>	2025-07-30 00:44:15 -07:00
Kebe	30ef30ed5a	[CI] rollback lint-and-deploy pipeline using amd machine (#21912 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-07-30 00:37:59 -07:00
Jee Jee Li	02f82fe438	[Doc] Update Intern-S1 info (#21908 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-29 23:58:57 -07:00
Cyrus Leung	2ca5f82c2a	[Misc] Remove redundant config definitions (#21891 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-29 23:54:18 -07:00
Louie Tsai	6f8d261882	Update vLLM Benchmark Suite for Xeon based on 0.9.2 release (#21486 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-07-30 05:57:03 +00:00
Ricardo Decal	4cd7fe6cea	[Docs] Expand introduction to Ray in Multi-node deployment section (#21584 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-29 22:07:28 -07:00
Cyrus Leung	16f3250527	[CI/Build] Fix pre-commit failure in docs (#21897 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-29 21:53:08 -07:00
Tao He	e3bc17ceea	Add @sighingnow as maintainer of qwen's related files. (#21895 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-07-29 21:30:44 -07:00
Kunshang Ji	05cbbe20c5	[XPU] use `ZE_AFFINITY_MASK` for device select on xpu (#21815 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-07-30 03:56:14 +00:00
wang.yuqi	65f311ce59	[Frontend] Add LLM.reward specific to reward models (#21720 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-29 20:56:03 -07:00
Wentao Ye	1b0a155534	[Perf] Using `__nv_fp8_e4m3` instead of `c10::e4m3` for `per_token_group_quant` (#21867 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-29 21:50:46 -06:00
Cyrus Leung	44bc46da60	[Bugfix] Actually disable processing cache when API server is scaled out (#21839 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-29 20:36:04 -07:00
MingzhenHan	b7b23da4d2	[Bugfix] Fix comment typo of get_num_common_prefix_blocks() (#21827 ) Signed-off-by: MingzhenHan <hanmingzhen2002@outlook.com>	2025-07-29 20:35:33 -07:00
Areeb Syed	fdde18229e	[Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization (#21808 ) Signed-off-by: sydarb <areebsyed237@gmail.com>	2025-07-30 11:35:21 +08:00
Csrayz	b917da442b	Expose PyTorch profiler configuration to environment variables (#21803 ) Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>	2025-07-29 19:46:31 -07:00
Michael Goin	fb58e3a651	[Docs] Update docker.md with HF_TOKEN, new model, and podman fix (#21856 )	2025-07-29 19:45:41 -07:00
Chen Zhang	76080cff79	[DOC] Fix path of v1 related figures (#21868 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-29 19:45:18 -07:00
Harry Mellor	ba5c5e5404	[Docs] Switch to better markdown linting pre-commit hook (#21851 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-29 19:45:08 -07:00
Chen Zhang	555e7225bc	[v1][attention] Support Hybrid Allocator + FlashInfer (#21412 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-07-30 01:45:29 +00:00
milesial	0e36abf993	[Bugfix] Correct max tokens for non-contiguous embeds (#21798 ) Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com> Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>	2025-07-30 01:16:25 +00:00
Simon Mo	452b2a3180	[ci] mark blackwell test optional for now (#21878 )	2025-07-29 18:03:27 -07:00
Simon Mo	0d0cc9e150	[ci] add b200 test placeholder (#21866 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-07-29 17:11:50 -07:00
Yong Hoon Shin	9266d98048	[BugFix] Fix interleaved sliding window not set for Gemma3n (#21863 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-29 16:34:19 -07:00
Gregory Shtrasberg	176bbce1db	Revert "[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure (#21647 )" (#21850 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-07-29 21:56:29 +00:00
Doug Smith	a1873db23d	docker: docker-aware precompiled wheel support (#21127 ) Signed-off-by: dougbtv <dosmith@redhat.com>	2025-07-29 14:45:19 -07:00
Michael Goin	a33ea28b1b	Add `flashinfer_python` to CUDA wheel requirements (#21389 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-29 12:51:58 -07:00

8299 Commits