xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-27 08:01:19 +08:00

Author	SHA1	Message	Date
ElizaWszola	9fb2d22032	[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-17 09:56:44 -04:00
Harry Mellor	2d6a38209b	[Docs] Move code block out of admonition now that it's short (#21118 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-17 06:12:29 -07:00
wangxiyuan	89e3c4e9b4	[Misc] Avoid unnecessary import (#21106 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-17 12:57:41 +00:00
Harry Mellor	fe8a2c544a	[Docs] Improve docstring formatting for `FusedMoEParallelConfig.make` (#21117 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-17 04:13:00 -07:00
kYLe	4ef00b5cac	[VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349 ) Signed-off-by: Kyle Huang <kylhuang@nvidia.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-17 03:07:55 -07:00
Asher	5a7fb3ab9e	[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 ) Signed-off-by: Asher Zhang <asherszhang@tencent.com>	2025-07-17 09:10:09 +00:00
Varun Sundar Rabindranath	11dfdf21bf	[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-17 08:10:37 +00:00
Chauncey	fdc5b43d20	[Bugfix]: Fix final_res_batch list index out of range error (#21055 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-17 00:29:09 -07:00
Jee Jee Li	c5b8b5953a	[Misc] Fix PhiMoE expert mapping (#21085 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-17 05:47:49 +00:00
David Ben-David	4fcef49ec4	[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-07-17 13:29:45 +08:00
Zhonghua Deng	8a4e5c5f3c	[V1][P/D]Enhance Performance and code readability for P2pNcclConnector (#20906 ) Signed-off-by: Abatom <abzhonghua@gmail.com>	2025-07-16 22:13:00 -07:00
Lucas Wilkinson	76b494444f	[Attention] Refactor attention metadata builder interface (#20466 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-17 04:44:25 +00:00
Michael Goin	28a6d5423d	[Bugfix] Fix Machete zero point issue for GPTQ models on SM90 (#21066 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-16 19:54:45 -07:00
XiongfeiWei	58760e12b1	[TPU] Start using python 3.12 (#21000 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-07-16 19:37:44 -07:00
Michael Goin	a50d918225	[Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile (#21013 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-16 19:37:13 -07:00
Kevin_Xiong	c9ba8104ed	[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group (#21024 ) Signed-off-by: KevinXiong-C <kevin_xiong1997@outlook.com>	2025-07-16 19:36:36 -07:00
Michael Goin	4e7dfbe7b4	Update PyTorch to `torch==2.7.1` for CUDA (#21011 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-17 02:30:44 +00:00
QiliangCui	72ad273582	Remove torch_xla.tpu.version() from pallas.py. (#21065 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-17 00:25:26 +00:00
Nir David	01513a334a	Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) (#12010 ) Signed-off-by: Nir David <ndavid@habana.ai> Signed-off-by: Uri Livne <ulivne@habana.ai> Co-authored-by: Uri Livne <ulivne@habana.ai>	2025-07-16 15:33:41 -04:00
Cyrus Leung	ac2bf41e53	[Model] Remove model sampler (#21059 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-16 19:03:37 +00:00
Harry Mellor	a931b4cdcf	Remove Qwen Omni workaround that's no longer necessary (#21057 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-16 16:25:23 +00:00
Avshalom Manevich	a0f8a79646	[fix] fix qwen image_embeds input (#21049 ) Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>	2025-07-16 15:17:20 +00:00
Mac Misiura	18bdcf4113	feat - add a new endpoint `get_tokenizer_info` to provide tokenizer/chat-template information (#20575 ) Signed-off-by: m-misiura <mmisiura@redhat.com>	2025-07-16 21:52:14 +08:00
Cyrus Leung	1c3198b6c4	[Model] Consolidate pooler implementations (#20927 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-16 13:39:13 +00:00
Michael Yao	260127ea54	[Docs] Add intro and fix 1-2-3 list in frameworks/open-webui.md (#19199 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-07-16 06:11:38 -07:00
Seiji Eicher	d0dc4cfca4	Fix inadvertently silenced PP tests for `mp`, add DeepSeek V2/V3 model family to PP tests (#20831 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-07-16 00:14:49 -07:00
Lucas Wilkinson	d31a647124	[BugFix] Fix import error on non-blackwell machines (#21020 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-15 22:27:29 -07:00
Chengji Yao	85431bd9ad	[TPU] fix kv_cache_update kernel block size choosing logic (#21007 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-16 04:39:48 +00:00
zhiweiz	c11013db8b	[Meta] Llama4 EAGLE Support (#20591 ) Signed-off-by: qizixi <qizixi@meta.com> Co-authored-by: qizixi <qizixi@meta.com>	2025-07-15 21:14:15 -07:00
Peter Pan	1eb2b9c102	[CI] update typos config for CI pre-commit and fix some spells (#20919 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2025-07-15 21:12:40 -07:00
Maximilien de Bayser	6ebf313790	Avoid direct comparison of floating point numbers (#21002 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-07-15 21:12:14 -07:00
Patrick von Platen	cfbcb9ed87	[Voxtral] Add more tests (#21010 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-15 21:11:49 -07:00
Wentao Ye	76ddeff293	[Doc] Remove duplicate docstring (#21012 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-15 20:09:13 -07:00
Michael Goin	f46098335b	[Bugfix] Fix Mistral3 support on SM100/SM120 (#20998 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-15 20:08:41 -07:00
Chendi.Xue	e9534c7202	[CI][HPU] update for v0 deprecate by switching to VLLM_TARGET_DEVICE=empty (#21006 ) Signed-off-by: Chendi.Xue <chendi.xue@intel.com>	2025-07-15 20:07:05 -07:00
Doug Smith	7976446015	Add Dockerfile argument for VLLM_USE_PRECOMPILED environment (#20943 ) Signed-off-by: dougbtv <dosmith@redhat.com>	2025-07-15 19:53:57 -07:00
Ming Yang	fcb9f879c1	[Bugfix] Correct per_act_token in CompressedTensorsW8A8Fp8MoECutlassM… (#20937 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-07-15 19:53:42 -07:00
Ricardo Decal	3ed94f9d0a	[Docs] Enhance Anyscale documentation, add quickstart links for vLLM (#21018 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-15 19:46:56 -07:00
Reid	fa839565f2	[Misc] Refactor: Improve argument handling for `conda` command (#20481 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-15 19:43:19 -07:00
Brayden Zhong	75a99b98bf	[Chore] Remove outdated transformers check (#20989 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-07-15 19:42:40 -07:00
Chauncey	b5c3b68359	[Misc] bump xgrammar version to v0.1.21 (#20992 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-15 19:42:16 -07:00
Thomas Parnell	6cbc4d4bea	[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture (#20923 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-15 19:19:10 -07:00
Michael Goin	153c6f1e61	[Frontend] Remove print left in FrontendArgs.add_cli_args (#21004 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-15 19:18:41 -07:00
Chauncey	34cda778a0	[Frontend] OpenAI Responses API supports input image (#20975 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-15 18:59:36 -06:00
Elfie Guo	30800b01c2	[Nvidia] Integrate SM100 cudnn prefill API to MLA prefill (#20411 ) Signed-off-by: Elfie Guo <elfieg@nvidia.com> Co-authored-by: Elfie Guo <eflieg@nvidia.com>	2025-07-15 17:56:45 -07:00
Chen LI	10be209493	[Bug Fix] get_distributed_init_method should get the ip from get_ip i… (#20889 ) Signed-off-by: Chen Li <lcpingping@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-07-15 21:23:52 +00:00
Marko Rosenmueller	19c863068b	[Frontend] Support cache_salt in /v1/completions and /v1/responses (#20981 ) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-07-15 21:01:04 +00:00
Tuan, Hoang-Trong	f29fd8a7f8	[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 (#20838 ) Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com> Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>	2025-07-15 16:08:26 -04:00
Gregory Shtrasberg	ed10f3cea1	[ROCm] warpSize is being made non constexpr in ROCm 7.0 (#20330 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-07-15 14:01:44 -04:00
Harry Mellor	b637e9dcb8	Add full serve CLI reference back to docs (#20978 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 17:42:30 +00:00

7928 Commits