xinyun/vllm - vllm - 丝路新云-代码仓

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-05-18 08:46:58 +08:00

Author	SHA1	Message	Date
bnellnm	f9c069c85e	Modularize fused experts and integrate PPLX kernels (#15956)	2025-05-14 13:11:54 -07:00
Nick Hill	59dd311cf5	[KVConnector] Keep KVTransferParams as a dict (#18033)	2025-05-14 08:05:57 -07:00
Cyrus Leung	d066e52013	[Bugfix] Fix chat utils tests (#18139) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 05:38:21 -07:00
Cyrus Leung	d62a076e84	[Model] GritLM supports other attention backends (#18109) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 03:33:19 -07:00
Jee Jee Li	259127f8b8	[Bugfix] Fix LoRA test (#18123) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-14 10:25:47 +00:00
TJian	612c2edb4f	[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-14 03:03:11 -07:00
rongfu.leng	82e7f9bb03	[Misc] replace does not exist model (#18119) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-05-14 02:13:47 -07:00
Cyrus Leung	8f5dc41481	[Bugfix] Fix entrypoints audio test failure (#18111) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 09:08:07 +00:00
wang.yuqi	63ad622233	[New Model]: support GTE NewModel (#17986)	2025-05-14 01:31:31 -07:00
lkchen	6685890d11	[Fix] Move "model_config" as keyword args in chat_utils.py (#18098) Signed-off-by: Linkun <github@lkchen.net>	2025-05-13 23:27:26 -07:00
Charlie Fu	7b2f28deba	[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-13 22:13:56 -07:00
vllmellm	2d912fb66f	[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 22:03:47 -07:00
Chen Zhang	f2ae883b67	[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-13 19:09:39 -07:00
vllmellm	40de1ef455	[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 19:08:20 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Aaron Pham	cb528d0585	[Fix] check to make sure processor has chat templates (#18047) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-13 03:04:10 -07:00
Michael Goin	ea6ae8cb45	[Bugfix] Fix marlin moe fallback logic for llama4 (#18042) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 07:53:28 +00:00
Chen Zhang	f0d610a8ae	[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 06:50:38 +00:00
Chauncey	dc1a821768	[Feature][V1] Support `tool_choice: required` when using Xgrammar as the `StructuredOutputBackend`. (#17845) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-12 23:01:31 -07:00
hissu-hyvarinen	f6518b2b48	[ROCm] Skip tests for quantizations incompatible with ROCm (#17905) Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>	2025-05-12 18:39:28 -06:00
bwshen-mi	acee8f48aa	[Model] Support MiMo-7B inference with MTP (#17433) Signed-off-by: wp-alpha <wangpeng66@xiaomi.com> Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com>	2025-05-12 23:25:33 +00:00
wwl2755	dc9905368d	[V1][Spec Decode] Eagle unit tests (#17350) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-05-12 23:01:17 +00:00
Russell Bryant	ebab1ac37c	[CI] Make JSON output tests less likely to fail (#17859) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 22:31:54 +00:00
Jonathan Berkhahn	98ea35601c	[Lora][Frontend]Add default local directory LoRA resolver plugin. (#16855) Signed-off-by: jberkhahn <jaberkha@us.ibm.com>	2025-05-12 10:39:10 -07:00
Robert Shaw	d19110204c	[P/D] NIXL Integration (#17751) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-12 09:46:16 -07:00
Maximilien de Bayser	05a4324f8e	Initialize the delta tool call fields explicitly (#17340) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: igmainc <igmainc@icloud.com>	2025-05-12 13:28:58 +00:00
Cheng Kuan Yong Jason	08bf784078	[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623) Signed-off-by: Jason Cheng <jasoncky96@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-12 09:06:10 +08:00
Isotr0py	021c16c7ca	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-11 17:56:30 -07:00
TJian	a810b5b088	[BugFix] [ROCm]: Bugfix and handle addition case of input for `rocm_aiter_rms_norm` (#17857) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-11 04:17:11 -07:00
wang.yuqi	e4b8713380	[New Model]: nomic-embed-text-v2-moe (#17785)	2025-05-11 00:59:43 -07:00
Dipika Sikka	cd3edfc908	[Misc] Add compressed-tensors NVFP4A16 emulation support (#17914) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-05-11 15:58:38 +08:00
Frieda Huang	9cea90eab4	[Frontend] Add /classify endpoint (#17032) Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>	2025-05-11 07:57:07 +00:00
Ben Browning	8132365b74	[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-11 00:53:58 -07:00
Jinzhen Lin	d74e5f37bc	[Kernel] fp4 marlin kernel (#17687) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-10 19:58:49 -07:00
Chen Zhang	ca66a1674c	[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:14:12 -07:00
Chen Zhang	950751a987	[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-10 16:12:04 -07:00
Ximo Guanter	fc4441a4ee	Add missing content type headers to /ping and /health (#17036) (#17786) Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-10 07:13:32 +01:00
tracelogfb	246e3e0a36	fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn (#17873) Co-authored-by: Stephen Chen <tracelog@meta.com>	2025-05-10 10:46:54 +08:00
Pavani Majety	0c0fdae84f	[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)	2025-05-09 16:24:41 -07:00
Harry Mellor	4b2ed7926a	Improve configs - the rest! (#17562) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 15:18:44 -07:00
Cyrus Leung	6e5595ca39	[CI/Build] Automatically retry flaky tests (#17856) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-09 09:55:17 -06:00
Chen Zhang	200da9a517	[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-09 15:25:34 +00:00
Harry Mellor	c6798baa9c	Change `top_k` to be disabled with `0` (still accept `-1` for now) (#17773) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 10:01:49 +00:00
Ning Xie	d310e6de98	[BUGFIX]: return fast when request requires prompt logprobs (#17251)	2025-05-08 21:25:41 -07:00
vllmellm	3c9396a64f	[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>	2025-05-09 10:42:05 +08:00
Shu Wang	376786fac1	Add cutlass support for blackwell fp8 blockwise gemm (#14383) Signed-off-by: Shu Wang <shuw@nvidia.com>	2025-05-08 15:09:55 -07:00
Russell Bryant	ec54d73c31	[CI] Fix test_collective_rpc (#17858) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-08 16:47:12 +00:00
fxmarty-amd	bb239a730f	[Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612) Signed-off-by: Felix Marty <felmarty@amd.com> Signed-off-by: kewang2 <kewang2@amd.com> Co-authored-by: kewang2 <kewang2@amd.com>	2025-05-08 02:53:53 -07:00
Jevin Jiang	a463555dee	[TPU] Fix the test_sampler (#17820)	2025-05-08 05:51:33 -04:00
Cyrus Leung	96722aa81d	[Frontend] Chat template fallbacks for multimodal models (#17805) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 23:05:54 -07:00

3537 Commits