xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-06-22 10:47:19 +08:00)

Author	SHA1	Message	Date
Michael Goin	ed50f46641	[Bugfix] Enable V1 usage stats (#16986) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-23 19:54:00 -07:00
Woosuk Kwon	46e678bcff	[Minor] Use larger batch sizes for A100/B100/B200/MI300x (#17073) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-23 19:18:59 -07:00
Chen Xia	6b2427f995	[Quantization]add prefix for commandA quantized model (#17017)	2025-04-23 17:32:40 -07:00
Sangyeon Cho	b07d741661	[CI/Build] workaround for CI build failure (#17070) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-04-23 16:14:18 -07:00
Woosuk Kwon	41fb013d29	[V1][Spec Decode] Always use argmax for sampling draft tokens (#16899) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-23 14:57:43 -07:00
Yong Hoon Shin	32d4b669d0	[BugFix][V1] Fix int32 token index overflow when preparing input ids (#16806)	2025-04-23 12:12:35 -07:00
Travis Johnson	3cde34a4a4	[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-23 18:34:41 +00:00
Harry Mellor	bdb3660312	Use `@property` and private field for `data_parallel_rank_local` (#17053) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-23 08:50:08 -07:00
Harry Mellor	f3a21e9c68	`CacheConfig.block_size` should always be `int` when used (#17052) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-23 08:50:05 -07:00
Harry Mellor	8e630d680e	Improve Transformers backend model loading QoL (#17039) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-23 07:33:51 -07:00
Russell Bryant	af869f6dff	[CI] Update structured-output label automation (#17055) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-23 07:33:14 -07:00
Harry Mellor	53c0fa1e25	Ensure that `pid` passed to `kill_process_tree` is `int` for `mypy` (#17051) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-23 07:32:26 -07:00
Michael Yao	f7912cba3d	[Doc] Add top anchor and a note to quantization/bitblas.md (#17042) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-23 07:32:16 -07:00
Michael Goin	6317a5174a	Categorize `tests/kernels/` based on kernel type (#16799) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-23 09:21:07 -04:00
Michael Goin	aa72d9a4ea	Mistral-format support for compressed-tensors (#16803) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-23 08:46:23 -04:00
Russell Bryant	ce17db8085	[CI] Run v1/test_serial_utils.py in CI (#16996) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-23 01:13:34 -07:00
Chauncey	8c87a9ad46	[Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers (#16964) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-23 07:24:09 +00:00
huafeng	ec69124eb4	[Misc] Improve readability of get_open_port function. (#17024) Signed-off-by: gitover22 <qidizou88@gmail.com>	2025-04-23 06:16:53 +00:00
Lucas Wilkinson	d0da99fb70	[BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) (#16998) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-22 21:49:24 -07:00
Nick Hill	b2f195c429	[V1] Avoid socket errors during shutdown when requests are in in-flight (#16807) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-23 12:36:29 +08:00
vllmellm	047797ef90	[Bugfix] Triton FA function takes no keyword arguments (#16902) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-04-22 21:35:24 -07:00
Reid	eb8ef4224d	[doc] add download path tips (#17013) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-23 04:06:30 +00:00
Chendi.Xue	56a735261c	[INTEL-HPU][v0] Port delayed sampling to upstream (#16949) Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com> Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>	2025-04-22 20:14:11 -07:00
youkaichao	e1cf90e099	[misc] tune some env vars for GB200 (#16992) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-04-23 10:59:48 +08:00
Chauncey	6bc1e30ef9	Revert "[Misc] Add S3 environment variables for better support of MinIO." (#17021)	2025-04-22 19:22:29 -07:00
vllmellm	7e081ba7ca	[BugFix] Revert ROCm Custom Paged Attention Env Flag Check (#17022) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-04-22 19:17:48 -07:00
Nick Hill	1e013fa388	[V1][DP] More robust DP/EP dummy request coordination (#16277) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 19:12:15 -07:00
Aleksandr Malyshev	bc7c4d206b	[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com> Signed-off-by: maleksan85 <maleksan@amd.com> Signed-off-by: <> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>	2025-04-22 19:11:56 -07:00
Yang Wang	f67e9e9f22	add Dockerfile build vllm against torch nightly (#16936) Signed-off-by: Yang Wang <elainewy@meta.com>	2025-04-22 19:08:27 -07:00
Guillaume Calmettes	36fe78769f	[Bugfix] validate urls object for multimodal content parts (#16990) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-04-23 09:43:06 +08:00
Chenyaaang	83d933718c	[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-22 18:05:23 -06:00
Nick Hill	5175b884f7	[BugFix] Remove default multiproc executor `collective_rpc` timeout (#17000) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 23:27:14 +00:00
Alexei-V-Ivanov-AMD	5536b30a4c	Fencing Kernels Tests for enabling on AMD (#16929) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-04-22 09:32:40 -07:00
Richard Zou	7f58fb9718	Add assertion for no objects while hashing hf_config (#16930) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-22 09:32:22 -07:00
vllmellm	30bc3e0f66	[FEAT][ROCm]: Support AITER MLA (#15893) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com>	2025-04-22 09:31:13 -07:00
Reid	f34410715f	[frontend] enhance tool_calls type check (#16882) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-22 15:40:24 +00:00
Chauncey	68d4c33202	[Misc] Add S3 environment variables for better support of MinIO. (#16977) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-22 14:27:36 +00:00
Zhengyuan Su (苏政渊)	f961d7f6ef	[BugFix] Pass in correct VLLM config in FlashInfer backend (#13207) (#16973) Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn> Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn>	2025-04-22 06:44:10 -07:00
Harry Mellor	d059110498	Improve configs - `SpeculativeConfig` (#16971) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-22 12:55:36 +00:00
Yang Fan	571e8dd65e	[Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni (#16974) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>	2025-04-22 12:23:17 +00:00
Reid	4b91c927f6	[Misc] refactor example series (#16972) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-22 11:44:21 +00:00
vllmellm	0e237f0035	[FEAT][ROCm] Integrate Paged Attention Kernel from AITER (#15001) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-04-22 02:46:28 -07:00
Cyrus Leung	8f7bace7c3	[Doc] Improve documentation for multimodal CLI args (#16960) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-22 08:35:35 +00:00
Nick Hill	e4d6144232	[BugFix] Fix incremental detokenization perf issue (#16963) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-22 08:16:19 +00:00
Lei Wang	8d32dc603d	[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036) Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com> Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>	2025-04-22 09:01:36 +01:00
Woosuk Kwon	c4ab9f3e71	[V1] Remove pre-allocation for KV cache (#16941) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-22 00:52:18 -07:00
Flora Feng	2689d5c027	[Model] Use autoweightloader for mamba (#16950) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-04-22 07:48:15 +00:00
Chauncey	acba33a0f1	[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-04-22 06:02:20 +00:00
SnowCharm	a114bf20a3	[Perf] Optimize `_update_states` for GPU model runner (#16910) Signed-off-by: snowcharm <snowcharmqq@gmail.com>	2025-04-22 14:01:54 +08:00
Michael Yao	3097ce3a32	[Doc] Update ai_accelerator/hpu-gaudi.inc.md (#16956) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-22 05:33:27 +00:00

6001 Commits