xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-26 08:09:37 +08:00

Author	SHA1	Message	Date
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Woosuk Kwon	513f074766	[CI/test] Fix Eagle Correctness Test (#17209) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 23:40:36 -07:00
Nick Hill	b07bf83c7d	[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-26 06:00:07 +00:00
Zijing Liu	53e8cf53a4	[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:05:40 -07:00
Charlie Fu	54271bb766	[ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. (#17011) Signed-off-by: charlifu <charlifu@amd.com>	2025-04-25 22:05:10 -07:00
Shu Wang	9e96f56efb	Allocate kv_cache with stride order (#16605) Signed-off-by: shuw <shuw@nvidia.com>	2025-04-25 22:03:31 -07:00
Woosuk Kwon	b278911229	[Minor][Models] Fix Return Types of Llama & Eagle (#17220) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 21:54:47 -07:00
Woosuk Kwon	1cf0719ebd	[Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 21:08:15 -07:00
James Wu	a6e72e1e4f	[Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env (#17142) Signed-off-by: James Wu <jjwu@meta.com>	2025-04-26 11:28:20 +08:00
Yihua Cheng	5e83a7277f	[v1] [P/D] Adding LMCache KV connector for v1 (#16625)	2025-04-26 03:03:38 +00:00
rasmith	68af5f6c5c	[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary (#17215) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-04-25 19:55:05 -07:00
Chen Zhang	8de2901fea	[Bugfix] gemma[2,3] interleaved attention when sliding window is disabled (#17180) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-25 19:53:51 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Nick Hill	70116459c3	[BugFix][Frontend] Fix `LLM.chat()` tokenization (#16081) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:20:05 +00:00
Christian Heimes	65e262b93b	Fix Python packaging edge cases (#17159) Signed-off-by: Christian Heimes <christian@python.org>	2025-04-26 06:15:07 +08:00
Daniel Li	48cb2109b6	[V1] Move usage stats to worker and start logging TPU hardware (#16211)	2025-04-25 14:06:01 -06:00
Russell Bryant	a5450f11c9	[Security] Use safe serialization and fix zmq setup for mooncake pipe (#17192) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-04-25 16:53:23 +00:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Jasmond L	d5615af9ae	[Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769) Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-25 07:26:30 -07:00
Cyrus Leung	19dcc02a72	[Bugfix] Fix mistral model tests (#17181) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 06:03:34 -07:00
Alex Brooks	7feae92c1f	[Doc] Move todo out of beam search docstring (#17183) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-25 04:44:58 -07:00
Lu Fang	fc966e9cc6	Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158)	2025-04-25 17:10:32 +08:00
rasmith	a41351f363	[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-04-25 00:45:02 -07:00
Sangyeon Cho	6aae216b4e	[Bugfix] remove fallback in guided_json (int range, patterns) (#16725) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>	2025-04-25 06:54:43 +00:00
yexin(叶鑫)	b22980a1dc	[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457) Signed-off-by: cynthieye <yexin93@qq.com> Co-authored-by: MagnetoWang <magnetowang@outlook.com>	2025-04-25 14:52:28 +08:00
Mengqing Cao	2f54045508	[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-04-24 22:51:02 -07:00
Lifu Huang	5aa6efb9a5	[Misc] Clean up redundant code in uniproc_executor.py (#16762) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-04-24 22:49:30 -07:00
Harry Mellor	6ca0234478	Move missed `SchedulerConfig` args into scheduler config group in `EngineArgs` (#17131) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 22:48:53 -07:00
Zaida Zhou	69bff9bc89	fix float16 support for kimi-vl (#17156) Co-authored-by: zhouzaida <zhouzaida@msh.team>	2025-04-24 20:16:32 -07:00
vllmellm	eef364723c	[FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-04-25 11:06:50 +08:00
jglaser	0d6e187e88	Use custom address for listening socket (#15988) Signed-off-by: Jens Glaser <glaserj@ornl.gov>	2025-04-25 01:57:16 +00:00
Michael Goin	9420a1fc30	Better error message for missing mistral params.json (#17132) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 23:43:08 +00:00
Maximilien de Bayser	05e1fbfc52	Add chat template for Llama 4 models (#16428) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-24 20:19:36 +00:00
Yinghai Lu	fe92176321	Add collective_rpc to llm engine (#16999) Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-04-24 20:16:52 +00:00
Harry Mellor	0fa939e2d1	Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:29:34 -07:00
Mark McLoughlin	340d7b1b21	[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-24 08:57:40 -07:00
wang.yuqi	67309a1cb5	[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970)	2025-04-24 07:06:28 -07:00
Shanshan Shen	b724afe343	[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-24 06:15:03 -07:00
Harry Mellor	21f4f1c9a4	Improve static type checking in `LoRAModelRunnerMixin` (#17104) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 06:14:47 -07:00
Isotr0py	b0c1f6202d	[Misc] Remove OLMo2 config copy (#17066) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-24 06:14:32 -07:00
Rui Qiao	c0dfd97519	[V1][PP] Optimization: continue scheduling prefill chunks (#17080) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 05:27:08 -07:00
Harry Mellor	0a05ed57e6	Simplify `TokenizerGroup` (#16790) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:43:56 -07:00
Woosuk Kwon	b411418ff0	[Chore] Remove Sampler from Model Code (#17084) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-24 02:49:33 -07:00
张宇	6167c0e5d2	[Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… (#16472) Signed-off-by: 开哲 <kaizhe.zy@alibaba-inc.com> Co-authored-by: 开哲 <kaizhe.zy@alibaba-inc.com>	2025-04-24 11:25:37 +08:00
Areeb Syed	ed2e464653	Addendum Fix to support FIPS enabled machines with MD5 hashing (#17043) Signed-off-by: sydarb <areebsyed237@gmail.com>	2025-04-23 19:55:00 -07:00
Harry Mellor	2c8ed8ee48	More informative error when using Transformers backend (#16988) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-23 19:54:03 -07:00
Michael Goin	ed50f46641	[Bugfix] Enable V1 usage stats (#16986) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-23 19:54:00 -07:00
Woosuk Kwon	46e678bcff	[Minor] Use larger batch sizes for A100/B100/B200/MI300x (#17073) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-23 19:18:59 -07:00
Chen Xia	6b2427f995	[Quantization]add prefix for commandA quantized model (#17017)	2025-04-23 17:32:40 -07:00
Woosuk Kwon	41fb013d29	[V1][Spec Decode] Always use argmax for sampling draft tokens (#16899) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-23 14:57:43 -07:00


4114 Commits