xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, last synced 2026-07-22 14:57:17 +08:00

Author	SHA1	Message	Date
lkchen	6685890d11	[Fix] Move "model_config" as keyword args in chat_utils.py (#18098) Signed-off-by: Linkun <github@lkchen.net>	2025-05-13 23:27:26 -07:00
Ecthlion_zyy	33011318c2	Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117)	2025-05-13 23:19:14 -07:00
qli88	4f8b373225	[BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912) Signed-off-by: Qiang Li <qiang.li2@amd.com>	2025-05-13 23:05:20 -07:00
Charlie Fu	7b2f28deba	[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-13 22:13:56 -07:00
vllmellm	2d912fb66f	[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 22:03:47 -07:00
Michael Goin	12e6c0b41c	[Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig (#18086)	2025-05-13 20:36:17 -07:00
Michael Goin	9a2a6357de	[Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models (#18026) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 19:48:33 -07:00
youkaichao	6266c57bae	[core][distributed] add ep group and all2all interface (#18077) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-05-14 10:46:49 +08:00
Jon Gill	754b699cbe	[Bug]: Fix S3 model/tokenizer path resolution (#18083) Signed-off-by: Jon Gill <jon@yurts.ai>	2025-05-13 19:34:17 -07:00
Roger Wang	6e27c6d86b	[Misc] Remove unused numpy tensor (#18084) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-05-13 19:33:40 -07:00
Nick Hill	d5af47a149	[P/D] Add some more debug logs to `NixlConnector` (#18102) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 19:33:03 -07:00
Pavani Majety	65f0f74b66	[Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile (#18101) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-05-13 19:33:00 -07:00
Luka Govedič	176a95c670	[Fix] Support CUDAGraph capture for encoder-decoder on ROCm (#18104) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-05-13 19:31:42 -07:00
Chen Zhang	f2ae883b67	[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-13 19:09:39 -07:00
vllmellm	40de1ef455	[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 19:08:20 -07:00
Russell Bryant	0189a65a2e	[Docs] Expand security doc with firewall info (#18081) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-13 19:36:00 +00:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Harry Mellor	0b217da646	Update deprecated type hinting in `vllm/adapter_commons` (#18073) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 08:32:51 -07:00
Harry Mellor	19324d660c	Update deprecated type hinting in `vllm/compilation` (#18072) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 08:32:48 -07:00
Harry Mellor	fc407a1425	Give auto-merge label workflow permission to add labels to issues (#18078) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 07:53:13 -07:00
Harry Mellor	009d9e7590	Convert `benchmarks` to `ruff format` (#18068) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 13:43:29 +00:00
Cyrus Leung	b922c2ebd2	[Bugfix] Fix entrypoints metrics tests (#18063) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-13 06:42:43 -07:00
Russell Bryant	00b14e0f16	[CI] set token permissions for pre-commit CI job (#17729) Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-05-13 13:38:30 +00:00
Russell Bryant	54e467e6f8	[CI] Add token permissions for add-ready-label CI job (#17730) Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-05-13 13:38:13 +00:00
Russell Bryant	79a1d25bbd	[CI] Add workflow permissions for helm CI job (#17727) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-05-13 12:49:07 +00:00
Russell Bryant	9944011b30	[CI] Set token permissions for reminder comment CI job (#17728) Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-05-13 12:46:58 +00:00
Harry Mellor	8c946cecca	Update deprecated type hinting in `vllm/transformers_utils` (#18058) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:34:37 -07:00
Harry Mellor	ff334ca1cd	Update deprecated type hinting in `vllm/profiler` (#18057) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:34:34 -07:00
Harry Mellor	6223dd8114	Update deprecated type hinting in `model_executor/layers` (#18056) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:17:23 -07:00
Reid	906f0598fc	[doc] add download/list/delete HF model CLI usage (#17940) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-13 11:15:51 +00:00
Aaron Pham	cb528d0585	[Fix] check to make sure processor has chat templates (#18047) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-13 03:04:10 -07:00
Harry Mellor	98fcba1575	Convert `.buildkite` to `ruff format` (#17656) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 09:28:31 +00:00
Russell Bryant	23b3134eb5	[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-13 01:47:29 -07:00
Michael Goin	ea6ae8cb45	[Bugfix] Fix marlin moe fallback logic for llama4 (#18042) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 07:53:28 +00:00
Woosuk Kwon	2ff297dce9	[BugFix] Set default random seed to 0 for V1 (#17929) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 07:52:19 +00:00
Jin Huang	8dd0671bac	[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP (#17916) Signed-off-by: Jin Huang <jinhun@amazon.com> Co-authored-by: Jin Huang <jinhun@amazon.com>	2025-05-13 15:10:07 +08:00
Chen Zhang	f0d610a8ae	[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 06:50:38 +00:00
Driss Guessous	e57e4d6e9e	Fix Broken macro for cutlass moe (#18049) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-05-12 23:31:06 -07:00
Nick Hill	ee5be834e7	[BugFix] Fix 4-GPU RLHF tests (#18007) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-12 23:03:55 -07:00
Calvin Chen	48545728d8	cleanup invalid prints (#18050) Signed-off-by: calvin chen <120380290@qq.com>	2025-05-12 23:01:57 -07:00
Chauncey	dc1a821768	[Feature][V1] Support `tool_choice: required` when using Xgrammar as the `StructuredOutputBackend`. (#17845) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-12 23:01:31 -07:00
Cyrus Leung	61e0a506a3	[Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-12 22:40:19 -07:00
Michael Goin	1df491c522	[Bugfix] Fixes for new marlin moe usage (#18017) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 03:50:04 +00:00
Arjun Kathuria	d8487ef557	[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779) Signed-off-by: Arjun Kathuria <arjun.kathuria8@gmail.com>	2025-05-12 20:36:33 -07:00
Jee Jee Li	c06af9a959	[Misc] Slight spelling modification (#18039) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-12 20:36:27 -07:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)	2025-05-12 19:52:47 -07:00
hissu-hyvarinen	f6518b2b48	[ROCm] Skip tests for quantizations incompatible with ROCm (#17905) Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>	2025-05-12 18:39:28 -06:00
Harry Mellor	d67085c2c8	Remove noisy warnings from `SchedulerConfig` (#17995) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 00:33:45 +00:00
Michael Goin	307939f299	Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 (#18000) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Dipika <dipikasikka1@gmail.com>	2025-05-12 18:07:34 -06:00
Harry Mellor	9d7ea9dbbf	Update some more deprecated type hinting (#17998) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-12 23:49:33 +00:00

6481 Commits