xinyun/vllm - vllm - 丝路新云 Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-07-04 23:17:15 +08:00

Author	SHA1	Message	Date
Lucas Wilkinson	cc5befbced	[BugFix] Fix cascade attention - RuntimeError: scheduler_metadata must have shape (metadata_size) (#17283 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-28 13:55:50 -07:00
Lucas Wilkinson	d8bccde686	[BugFix] Fix vllm_flash_attn install issues (#17267 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-04-27 17:27:56 -07:00
Lily Liu	20e489eaa1	[V1][Spec Decode] Make eagle compatible with prefix caching. (#17137 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-27 09:29:43 -07:00
Cyrus Leung	4213475ec7	[Metrics] Fix minor inconsistencies in bucket progression (#17262 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-27 16:19:39 +00:00
cascade	690fe019f0	[Feature] support sequence parallelism using compilation pass (#16155 ) Signed-off-by: cascade812 <cascade812@outlook.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-04-27 06:29:35 -07:00
Flex Wang	18445edd0f	[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (#17033 ) Signed-off-by: sfc-gh-zhwang <flex.wang@snowflake.com>	2025-04-27 12:30:53 +00:00
Chen Zhang	838cedade7	[Bugfix] Get a specific type of layer from forward context (#17222 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-27 00:58:05 -07:00
Ning Xie	fd11a325b8	[MISC] rename interval to max_recent_requests (#14285 )	2025-04-26 16:59:18 +00:00
Ning Xie	dc2ceca5c5	[BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set (#17088 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-04-26 14:34:24 +00:00
Russell Bryant	f8acd01ff7	[V1] Add `structural_tag` support using xgrammar (#17085 )	2025-04-26 14:06:37 +00:00
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Nick Hill	b07bf83c7d	[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-26 06:00:07 +00:00
Zijing Liu	53e8cf53a4	[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661 ) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:05:40 -07:00
Woosuk Kwon	1cf0719ebd	[Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 21:08:15 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Daniel Li	48cb2109b6	[V1] Move usage stats to worker and start logging TPU hardware (#16211 )	2025-04-25 14:06:01 -06:00
Lu Fang	fc966e9cc6	Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158 )	2025-04-25 17:10:32 +08:00
Sangyeon Cho	6aae216b4e	[Bugfix] remove fallback in guided_json (int range, patterns) (#16725 ) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>	2025-04-25 06:54:43 +00:00
Yinghai Lu	fe92176321	Add collective_rpc to llm engine (#16999 ) Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-04-24 20:16:52 +00:00
Mark McLoughlin	340d7b1b21	[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-24 08:57:40 -07:00
Shanshan Shen	b724afe343	[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-24 06:15:03 -07:00
Harry Mellor	21f4f1c9a4	Improve static type checking in `LoRAModelRunnerMixin` (#17104 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 06:14:47 -07:00
Rui Qiao	c0dfd97519	[V1][PP] Optimization: continue scheduling prefill chunks (#17080 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 05:27:08 -07:00
Harry Mellor	0a05ed57e6	Simplify `TokenizerGroup` (#16790 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:43:56 -07:00
Woosuk Kwon	b411418ff0	[Chore] Remove Sampler from Model Code (#17084 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-24 02:49:33 -07:00
Michael Goin	ed50f46641	[Bugfix] Enable V1 usage stats (#16986 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-23 19:54:00 -07:00
Woosuk Kwon	41fb013d29	[V1][Spec Decode] Always use argmax for sampling draft tokens (#16899 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-23 14:57:43 -07:00
Yong Hoon Shin	32d4b669d0	[BugFix][V1] Fix int32 token index overflow when preparing input ids (#16806 )	2025-04-23 12:12:35 -07:00
Travis Johnson	3cde34a4a4	[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-23 18:34:41 +00:00
Harry Mellor	bdb3660312	Use `@property` and private field for `data_parallel_rank_local` (#17053 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-23 08:50:08 -07:00
Harry Mellor	53c0fa1e25	Ensure that `pid` passed to `kill_process_tree` is `int` for `mypy` (#17051 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-23 07:32:26 -07:00
Lucas Wilkinson	d0da99fb70	[BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) (#16998 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-22 21:49:24 -07:00
Nick Hill	b2f195c429	[V1] Avoid socket errors during shutdown when requests are in in-flight (#16807 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-23 12:36:29 +08:00
Nick Hill	1e013fa388	[V1][DP] More robust DP/EP dummy request coordination (#16277 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 19:12:15 -07:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
Nick Hill	5175b884f7	[BugFix] Remove default multiproc executor `collective_rpc` timeout (#17000 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 23:27:14 +00:00
Nick Hill	e4d6144232	[BugFix] Fix incremental detokenization perf issue (#16963 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 08:16:19 +00:00
Woosuk Kwon	c4ab9f3e71	[V1] Remove pre-allocation for KV cache (#16941 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-22 00:52:18 -07:00
Chauncey	acba33a0f1	[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-04-22 06:02:20 +00:00
SnowCharm	a114bf20a3	[Perf] Optimize `_update_states` for GPU model runner (#16910 ) Signed-off-by: snowcharm <snowcharmqq@gmail.com>	2025-04-22 14:01:54 +08:00
Jeffrey Li	0e4254492f	[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863 ) Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>	2025-04-22 11:40:19 +08:00
Woosuk Kwon	1311913f55	[BugFix][Spec Decode] No in-place update to draft probs (#16952 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-21 19:54:19 -07:00
Nicolò Lucchesi	fa3bba2a53	[TPU][V1] Enable Top-P (#16843 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-22 00:46:07 +00:00
Michael Goin	986537f1c3	[V1] V1 FlashInfer Attention (#16684 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Aurick Qiao <qiao@aurick.net>	2025-04-22 00:38:41 +00:00
Nicolò Lucchesi	210207525e	[TPU][V1] Capture multimodal encoder during model compilation (#15051 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com>	2025-04-21 18:36:59 -06:00
Chengji Yao	471fe65630	[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-21 15:43:13 -06:00
Woosuk Kwon	3a0fba5cf4	[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-21 12:38:50 -07:00
Han Zhang	d41faaf9df	Restore buffers when wake up from level 2 sleep (#16564 ) (#16889 ) Signed-off-by: Han <zh950713@gmail.com>	2025-04-21 20:18:28 +08:00
Staszek Paśko	87aaadef73	Serialize tensors using int8 views (#16866 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-19 10:28:34 -07:00
vie-serendipity	d9737ca1c6	[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460 ) Signed-off-by: vie-serendipity <2733147505@qq.com>	2025-04-19 02:25:19 -07:00

538 Commits