xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-27 17:57:20 +08:00

Author	SHA1	Message	Date
Rui Qiao	217937221b	Elastic Expert Parallel Initial Support (#20775) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-07-18 17:46:09 -07:00
hax0r31337	5782581acf	[Bugfix] Voxtral on Blackwell GPUs (RTX 50 series) (#21077) Signed-off-by: hax0r31337 <liulihaocaiqwq@gmail.com>	2025-07-18 18:40:18 -04:00
JialinOuyang-Meta	0f199f197b	[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005) Signed-off-by: Jialin Ouyang <jialino@meta.com>	2025-07-18 12:34:40 -07:00
Richard Zou	b2eb2b5ad7	[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 (#19346) Signed-off-by: rzou <zou3519@gmail.com>	2025-07-18 14:10:21 -04:00
Richard Zou	21274ab476	[CI] Update CODEOWNERS for vllm/compilation (#21185) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-07-18 06:51:12 -07:00
Thomas Parnell	ed8cbfedf8	Let GraniteMoeAttention use YaRN (#21174) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-18 05:52:52 -07:00
Cyrus Leung	45badd05d0	[Core] Set pooling params based on task and model (#21128) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-18 05:41:17 -07:00
ElizaWszola	4adc66f64d	[Bugfix] Allocate less memory in non-batched CUTLASS MoE (#21121) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-18 18:55:52 +08:00
Cyrus Leung	55ad648715	[Doc] Fix typo in model name (#21178) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-18 03:55:10 -07:00
wang.yuqi	5895afd780	[Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. (#20750) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-18 09:10:47 +00:00
wang.yuqi	ca4eb82bcb	[Model] Re-add the implicit conversion feature for as_seq_cls_model (#21103) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-18 07:15:07 +00:00
Roger Wang	ba2dfbb0c2	[Misc] Make MM embedding merge interface explicit in model runner (#21147) Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-18 07:13:57 +00:00
Jialin Ouyang	1bf65138f6	[benchmark] Sending request strictly follows the random intervals (#21108) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-07-18 06:22:08 +00:00
Woosuk Kwon	54cf1cae62	[Misc] Do not print async output warning for v1 (#21151) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-17 21:57:02 -07:00
shixianc	5780121c95	[Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm (#20911) Signed-off-by: Shixian Cui <shixian@amazon.com> Co-authored-by: Shixian Cui <shixian@amazon.com>	2025-07-18 04:34:43 +00:00
Shu Wang	c7d8724e78	[Core] FlashInfer CUTLASS fused MoE backend (NVFP4) (#20037) Signed-off-by: shuw <shuw@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-17 21:32:45 -07:00
22quinn	b38baabcf9	[Doc] Add inplace weights loading example (#19640) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-07-17 21:12:23 -07:00
Lucas Wilkinson	89cab4d01f	[Attention] Make local attention backend agnostic (#21093)	2025-07-18 00:10:42 -04:00
Lucia Fang	b9a21e9173	[Docs] Update supported models documentation with missing models (#20844) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-07-17 20:12:13 -07:00
Ricardo Decal	c4e3b12524	[Docs] Add minimal demo of Ray Data API usage (#21080) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-17 20:09:19 -07:00
elvischenv	8dfb45ca33	[Bugfix] Fix the tensor non-contiguous issue for Flashinfer TRT-LLM backend attention kernel (#21133)	2025-07-18 00:35:58 +00:00
Wentao Ye	8a8fc94639	[Log] Debugging Log with more Information (#20770) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-18 00:19:46 +00:00
Woosuk Kwon	4de7146351	[V0 deprecation] Remove V0 HPU backend (#21131) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-17 16:37:36 -07:00
Eric Curtin	ac9fb732a5	On environments where numa cannot be detected we get 0 (#21115) Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-07-17 18:52:17 +00:00
Jee Jee Li	a3a6c695f4	[Misc] Qwen MoE model supports LoRA (#20932) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-17 18:32:52 +00:00
Cyrus Leung	90bd2ab6e3	[Model] Update pooling model interface (#21058) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-17 16:05:40 +00:00
ElizaWszola	9fb2d22032	[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-17 09:56:44 -04:00
Harry Mellor	2d6a38209b	[Docs] Move code block out of admonition now that it's short (#21118) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-17 06:12:29 -07:00
wangxiyuan	89e3c4e9b4	[Misc] Avoid unnecessary import (#21106) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-17 12:57:41 +00:00
Harry Mellor	fe8a2c544a	[Docs] Improve docstring formatting for `FusedMoEParallelConfig.make` (#21117) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-17 04:13:00 -07:00
kYLe	4ef00b5cac	[VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349) Signed-off-by: Kyle Huang <kylhuang@nvidia.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-17 03:07:55 -07:00
Asher	5a7fb3ab9e	[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820) Signed-off-by: Asher Zhang <asherszhang@tencent.com>	2025-07-17 09:10:09 +00:00
Varun Sundar Rabindranath	11dfdf21bf	[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-17 08:10:37 +00:00
Chauncey	fdc5b43d20	[Bugfix]: Fix final_res_batch list index out of range error (#21055) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-17 00:29:09 -07:00
Jee Jee Li	c5b8b5953a	[Misc] Fix PhiMoE expert mapping (#21085) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-17 05:47:49 +00:00
David Ben-David	4fcef49ec4	[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-07-17 13:29:45 +08:00
Zhonghua Deng	8a4e5c5f3c	[V1][P/D]Enhance Performance and code readability for P2pNcclConnector (#20906) Signed-off-by: Abatom <abzhonghua@gmail.com>	2025-07-16 22:13:00 -07:00
Lucas Wilkinson	76b494444f	[Attention] Refactor attention metadata builder interface (#20466) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-17 04:44:25 +00:00
Michael Goin	28a6d5423d	[Bugfix] Fix Machete zero point issue for GPTQ models on SM90 (#21066) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-16 19:54:45 -07:00
XiongfeiWei	58760e12b1	[TPU] Start using python 3.12 (#21000) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-07-16 19:37:44 -07:00
Michael Goin	a50d918225	[Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile (#21013) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-16 19:37:13 -07:00
Kevin_Xiong	c9ba8104ed	[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group (#21024) Signed-off-by: KevinXiong-C <kevin_xiong1997@outlook.com>	2025-07-16 19:36:36 -07:00
Michael Goin	4e7dfbe7b4	Update PyTorch to `torch==2.7.1` for CUDA (#21011) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-17 02:30:44 +00:00
QiliangCui	72ad273582	Remove torch_xla.tpu.version() from pallas.py. (#21065) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-17 00:25:26 +00:00
Nir David	01513a334a	Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) (#12010) Signed-off-by: Nir David <ndavid@habana.ai> Signed-off-by: Uri Livne <ulivne@habana.ai> Co-authored-by: Uri Livne <ulivne@habana.ai>	2025-07-16 15:33:41 -04:00
Cyrus Leung	ac2bf41e53	[Model] Remove model sampler (#21059) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-16 19:03:37 +00:00
Harry Mellor	a931b4cdcf	Remove Qwen Omni workaround that's no longer necessary (#21057) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-16 16:25:23 +00:00
Avshalom Manevich	a0f8a79646	[fix] fix qwen image_embeds input (#21049) Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>	2025-07-16 15:17:20 +00:00
Mac Misiura	18bdcf4113	feat - add a new endpoint `get_tokenizer_info` to provide tokenizer/chat-template information (#20575) Signed-off-by: m-misiura <mmisiura@redhat.com>	2025-07-16 21:52:14 +08:00
Cyrus Leung	1c3198b6c4	[Model] Consolidate pooler implementations (#20927) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-16 13:39:13 +00:00

7804 Commits