xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-02 07:57:08 +08:00

Author	SHA1	Message	Date
Nick Hill	b07bf83c7d	[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-26 06:00:07 +00:00
Zijing Liu	53e8cf53a4	[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661 ) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:05:40 -07:00
Charlie Fu	54271bb766	[ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. (#17011 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-04-25 22:05:10 -07:00
Shu Wang	9e96f56efb	Allocate kv_cache with stride order (#16605 ) Signed-off-by: shuw <shuw@nvidia.com>	2025-04-25 22:03:31 -07:00
Woosuk Kwon	b278911229	[Minor][Models] Fix Return Types of Llama & Eagle (#17220 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 21:54:47 -07:00
yarongmu-google	7bd0c7745c	[Doc] Minor fix for the vLLM TPU setup page (#17206 ) Signed-off-by: Yarong Mu <ymu@google.com>	2025-04-26 04:39:56 +00:00
Woosuk Kwon	1cf0719ebd	[Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 21:08:15 -07:00
Reid	537d5ee025	[doc] add Anything LLM integration (#17216 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-25 21:03:23 -07:00
Lu Fang	c8e5be35f7	[MISC][AMD] Add unused annotation to rocm kernel file (#17097 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-04-25 20:33:35 -07:00
James Wu	a6e72e1e4f	[Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env (#17142 ) Signed-off-by: James Wu <jjwu@meta.com>	2025-04-26 11:28:20 +08:00
Yihua Cheng	5e83a7277f	[v1] [P/D] Adding LMCache KV connector for v1 (#16625 )	2025-04-26 03:03:38 +00:00
rasmith	68af5f6c5c	[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary (#17215 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-04-25 19:55:05 -07:00
Chen Zhang	8de2901fea	[Bugfix] gemma[2,3] interleaved attention when sliding window is disabled (#17180 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-25 19:53:51 -07:00
Rui Qiao	c53e0730cb	[Misc] Refine ray_serve_deepseek example (#17204 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-25 16:06:59 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Nick Hill	70116459c3	[BugFix][Frontend] Fix `LLM.chat()` tokenization (#16081 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:20:05 +00:00
Christian Heimes	65e262b93b	Fix Python packaging edge cases (#17159 ) Signed-off-by: Christian Heimes <christian@python.org>	2025-04-26 06:15:07 +08:00
Cyrus Leung	43faa0461a	[Bugfix] Fix hybrid model tests (#17182 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 15:14:37 -07:00
Daniel Li	48cb2109b6	[V1] Move usage stats to worker and start logging TPU hardware (#16211 )	2025-04-25 14:06:01 -06:00
Russell Bryant	a5450f11c9	[Security] Use safe serialization and fix zmq setup for mooncake pipe (#17192 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-04-25 16:53:23 +00:00
Cyrus Leung	9d98ab5ec6	[Misc] Inline Molmo requirements (#17190 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 16:41:44 +00:00
Reid	df5c879527	[doc] update wrong hf model links (#17184 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-25 16:40:54 +00:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Harry Mellor	0bd7f8fca5	Bump Transformers to 4.51.3 (#17116 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:34:34 -07:00
Jasmond L	d5615af9ae	[Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769 ) Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-25 07:26:30 -07:00
Cyrus Leung	19dcc02a72	[Bugfix] Fix mistral model tests (#17181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 06:03:34 -07:00
Alex Brooks	7feae92c1f	[Doc] Move todo out of beam search docstring (#17183 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-25 04:44:58 -07:00
Michael Yao	f851b84266	[Doc] Add two links to disagg_prefill.md (#17168 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-25 10:23:57 +00:00
Lu Fang	fc966e9cc6	Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158 )	2025-04-25 17:10:32 +08:00
Michael Yao	ef19e67d2c	[Doc] Add headings to improve gptqmodel.md (#17164 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-25 01:13:13 -07:00
rasmith	a41351f363	[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-04-25 00:45:02 -07:00
Sangyeon Cho	6aae216b4e	[Bugfix] remove fallback in guided_json (int range, patterns) (#16725 ) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>	2025-04-25 06:54:43 +00:00
yexin(叶鑫)	b22980a1dc	[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457 ) Signed-off-by: cynthieye <yexin93@qq.com> Co-authored-by: MagnetoWang <magnetowang@outlook.com>	2025-04-25 14:52:28 +08:00
Lucas Wilkinson	881f735827	[Misc] Benchmark Serving Script Support Appending Results (#17028 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-24 22:53:55 -07:00
Mengqing Cao	2f54045508	[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-04-24 22:51:02 -07:00
Lifu Huang	5aa6efb9a5	[Misc] Clean up redundant code in uniproc_executor.py (#16762 ) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-04-24 22:49:30 -07:00
Harry Mellor	6ca0234478	Move missed `SchedulerConfig` args into scheduler config group in `EngineArgs` (#17131 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 22:48:53 -07:00
Michael Goin	649818995f	[Docs] Fix True->true in supported_models.md (#17141 )	2025-04-25 04:20:04 +00:00
Varun Sundar Rabindranath	7a0a9da72b	[Doc] V1 : Update LoRA status (#17133 ) Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>	2025-04-24 20:17:22 -07:00
Zaida Zhou	69bff9bc89	fix float16 support for kimi-vl (#17156 ) Co-authored-by: zhouzaida <zhouzaida@msh.team>	2025-04-24 20:16:32 -07:00
Lucas Wilkinson	41ca7eb491	[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-24 20:12:21 -07:00
vllmellm	eef364723c	[FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-04-25 11:06:50 +08:00
jglaser	0d6e187e88	Use custom address for listening socket (#15988 ) Signed-off-by: Jens Glaser <glaserj@ornl.gov>	2025-04-25 01:57:16 +00:00
Michael Goin	9420a1fc30	Better error message for missing mistral params.json (#17132 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 23:43:08 +00:00
Rui Qiao	583e900996	[Misc] Add example to run DeepSeek with Ray Serve LLM (#17134 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 22:25:21 +00:00
Maximilien de Bayser	05e1fbfc52	Add chat template for Llama 4 models (#16428 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-24 20:19:36 +00:00
Yinghai Lu	fe92176321	Add collective_rpc to llm engine (#16999 ) Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-04-24 20:16:52 +00:00
Russell Bryant	6d0df0ebeb	[Docs] Generate correct github links for decorated functions (#17125 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-24 10:39:43 -07:00
Harry Mellor	0fa939e2d1	Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:29:34 -07:00
Harry Mellor	0422ce109f	Add `:markdownhelp:` to `EngineArgs` docs so markdown docstrings render properly (#17124 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:28:45 -07:00

6073 Commits