xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-14 00:17:26 +08:00

Author	SHA1	Message	Date
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
Harry Mellor	b18201fe06	Allow users to pass arbitrary JSON keys from CLI (#18208 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-15 21:05:34 -07:00
Sky Lee	f4937a51c1	[Model] vLLM v1 supports Medusa (#17956 ) Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com> Signed-off-by: skylee-01 <497627264@qq.com> Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com>	2025-05-15 21:05:31 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Cyrus Leung	61e0a506a3	[Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-12 22:40:19 -07:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844 )	2025-05-12 19:52:47 -07:00
Reid	009b3d5382	[Misc] not show --model in vllm serve --help (#16691 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-11 08:47:58 +00:00
Gregory Shtrasberg	06c0922a69	[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-11 15:58:45 +08:00
Kuntai Du	9112155283	[Perf] Use small max_num_batched_tokens for A100 (#17885 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-05-11 07:53:23 +00:00
Harry Mellor	4b2ed7926a	Improve configs - the rest! (#17562 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 15:18:44 -07:00
vllmellm	3c9396a64f	[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>	2025-05-09 10:42:05 +08:00
Vadim Markovtsev	b2da14a05a	Improve exception reporting in MP engine (#17800 ) Signed-off-by: Vadim Markovtsev <vadim@poolside.ai>	2025-05-08 05:32:39 +00:00
Harry Mellor	646a31e51e	Fix and simplify `deprecated=True` CLI `kwarg` (#17781 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-07 16:51:06 +01:00
Satyajith Chilappagari	043e4c4955	Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Aaron Dou <yzdou@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Chongming Ni <chongmni@amazon.com> Co-authored-by: Amulya Ballakur <amulyaab@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Lin Lin Pan <tailinpa@amazon.com> Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>	2025-05-07 00:07:30 -07:00
Jee Jee Li	ba7703e659	[Misc] Remove qlora_adapter_name_or_path (#17699 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-06 23:10:37 -07:00
Gregory Shtrasberg	de906b95f9	[Bugfix] Fix for the condition to accept empty encoder inputs for mllama (#17732 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-06 19:59:06 +00:00
Aaron Pham	175bda67a1	[Feat] Add deprecated=True to CLI args (#17426 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-06 08:11:27 -07:00
Li, Jiang	a6fed02068	[V1][PP] Support PP for MultiprocExecutor (#14219 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-05-06 07:58:05 -07:00
Michael Goin	d419aa5dc4	[V1] Enable TPU V1 backend by default (#17673 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-06 06:49:49 -07:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Cyrus Leung	46fae69cf0	[Misc] V0 fallback for `--enable-prompt-embeds` (#17615 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-03 22:59:24 +00:00
Cyrus Leung	887d7af882	[Core] Gate `prompt_embeds` behind a feature flag (#17607 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 00:19:20 +08:00
Chenyaaang	87baebebd8	[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-02 21:42:44 -07:00
Cyrus Leung	cb234955df	[Misc] Clean up input processing (#17582 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:11:53 -07:00
Harry Mellor	785d75a03b	Automatically tell users that dict args must be valid JSON in CLI (#17577 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-02 05:24:55 -07:00
Andrew Sansom	cc2a77d7f1	[Core] [Bugfix] Add Input Embeddings (#15428 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Nan2018 <nan@protopia.ai> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 01:06:39 -07:00
Jerry Zhang	109e15a335	Add `pt_load_map_location` to allow loading to cuda (#16869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-01 23:23:42 -07:00
Chen Xia	61c299f81f	[Misc]add configurable cuda graph size (#17201 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 11:04:50 -07:00
Harry Mellor	6768ff4a22	Move the last arguments in `arg_utils.py` to be in their final groups (#17531 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 10:31:44 -07:00
Chauncey	98060b001d	[Feature][Frontend]: Deprecate --enable-reasoning (#17452 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-01 06:46:16 -07:00
Harry Mellor	a257d9bccc	Improve configs - `ObservabilityConfig` (#17453 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 03:52:05 -07:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Harry Mellor	13698db634	Improve configs - `ModelConfig` (#17130 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-30 10:38:22 +08:00
Gabriel Marinho	1c2bc7ead0	Truncation control for embedding models (#14776 ) Signed-off-by: Gabriel Marinho <gmarinho@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-30 09:24:57 +08:00
Harry Mellor	a6977dbd15	Simplify (and fix) passing of guided decoding backend options (#17008 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:02:23 +00:00
Harry Mellor	2ef5d106bb	Improve literal dataclass field conversion to argparse argument (#17391 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 16:25:08 +00:00
Hyogeun Oh (오효근)	193e78e35d	[Fix] Documentation spacing in compilation config help text (#17342 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-04-29 00:16:17 -07:00
Cyrus Leung	ebb3930d28	[Misc] Move config fields to MultiModalConfig (#17343 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 06:37:21 +00:00
Harry Mellor	f94886946e	Improve conversion from dataclass configs to argparse arguments (#17303 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-28 16:22:12 +00:00
Cyrus Leung	aec9674dbe	[Core] Remove legacy input mapper/processor from V0 (#15686 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-28 15:38:48 +08:00
Lucas Wilkinson	d8bccde686	[BugFix] Fix vllm_flash_attn install issues (#17267 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-04-27 17:27:56 -07:00
Cyrus Leung	4213475ec7	[Metrics] Fix minor inconsistencies in bucket progression (#17262 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-27 16:19:39 +00:00
Flex Wang	18445edd0f	[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (#17033 ) Signed-off-by: sfc-gh-zhwang <flex.wang@snowflake.com>	2025-04-27 12:30:53 +00:00
changjun.lee	10fd1d7380	[Bugfix] fix error due to an uninitialized tokenizer when using `skip_tokenizer_init` with `num_scheduler_steps` (#9276 ) Signed-off-by: changjun.lee <pord7457@gmail.com>	2025-04-26 11:51:17 -04:00
rasmith	68af5f6c5c	[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary (#17215 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-04-25 19:55:05 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
rasmith	a41351f363	[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-04-25 00:45:02 -07:00
Harry Mellor	6ca0234478	Move missed `SchedulerConfig` args into scheduler config group in `EngineArgs` (#17131 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 22:48:53 -07:00
Yinghai Lu	fe92176321	Add collective_rpc to llm engine (#16999 ) Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-04-24 20:16:52 +00:00
Harry Mellor	0fa939e2d1	Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:29:34 -07:00


687 Commits