xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-11 18:57:17 +08:00

Author	SHA1	Message	Date
Hongxia Yang	ed3a1d2106	[ROCm] fix num_stages for default moe config to avoid triton OutOfResource error (#17744) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2025-05-07 00:39:48 +00:00
Harry Mellor	022afbeb4e	Fix doc build performance (#17748) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-07 00:36:41 +00:00
Thomas Parnell	2f925e5777	[Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (#16828) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-06 18:21:48 -04:00
Gregory Shtrasberg	de906b95f9	[Bugfix] Fix for the condition to accept empty encoder inputs for mllama (#17732) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-06 19:59:06 +00:00
d.transposed	d456aea71f	[Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` (#16839) Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Signed-off-by: dtransposed <> Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>	2025-05-06 15:38:45 -04:00
Jevin Jiang	621ca2c0ab	[TPU] Increase block size and reset block shapes (#16458)	2025-05-06 13:55:04 -04:00
Harry Mellor	6115b11582	Make right sidebar more readable in "Supported Models" (#17723) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-06 16:48:26 +00:00
Cyrus Leung	5b8c390747	[Bugfix] Fix modality limits in vision language example (#17721) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-06 16:12:28 +00:00
Reid	7525d5f3d5	[doc] Add RAG Integration example (#17692) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-06 16:10:23 +00:00
Chen Zhang	aabcd2cae3	[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (#17479) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-06 08:50:34 -07:00
Michael Yao	0d115460a7	[Docs] Use gh-file to add links to tool_calling.md (#17709) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-05-06 15:27:19 +00:00
Aaron Pham	175bda67a1	[Feat] Add deprecated=True to CLI args (#17426) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-06 08:11:27 -07:00
Chen Zhang	cba31c47c4	[v1] AttentionMetadata for each layer (#17394) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-06 07:58:37 -07:00
Li, Jiang	a6fed02068	[V1][PP] Support PP for MultiprocExecutor (#14219) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-05-06 07:58:05 -07:00
Michael Goin	d419aa5dc4	[V1] Enable TPU V1 backend by default (#17673) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-06 06:49:49 -07:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Harry Mellor	05e1f96419	Fix `dockerfilegraph` pre-commit hook (#17698) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-06 08:56:48 +00:00
Lucas Wilkinson	6eae34533a	[Misc] Fix ScalarType float4 naming (#17690) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-06 01:07:15 -07:00
Cyrus Leung	63ced7b43f	[Doc] Update notes for H2O-VL and Gemma3 (#17219) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-06 07:51:02 +00:00
Mikhail Podvitskii	dc47ba32f8	[Bugfix] Fixed prompt length for random dataset (#17408) Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>	2025-05-06 07:00:08 +00:00
Richard Zou	edbf2d609e	[easy] Fix logspam on PiecewiseBackend errors (#17138) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-05 23:46:11 -07:00
Stan Wozniak	999328be0d	[Model] Add GraniteMoeHybrid 4.0 model (#17497) Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-05-06 12:00:31 +08:00
Michael Goin	98834fefaa	Update nm to rht in doc links + refine fp8 doc (#17678) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-06 00:41:14 +00:00
Varun Sundar Rabindranath	90bd2ae172	[Bugfix] LoRA - Retire unused maxnreg LoRA kernel argument (#17677)	2025-05-05 17:34:29 -07:00
Nicolò Lucchesi	5941e0b7ea	[TPU][V1] Add support for top-logprobs (#17072) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-05 14:20:15 -07:00
XiongfeiWei	9765940824	[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-05-05 14:19:58 -07:00
Nick Hill	5ea5c514da	[BugFix] Increase timeout for startup failure test (#17642) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-05 20:53:19 +00:00
Russell Bryant	d3efde8176	[Benchmarks] Remove invalid option under V1 engine (#17651) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-05 16:30:22 -04:00
Thomas J. Fan	aea302be6c	Use git-path commit in hook (#17616) Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>	2025-05-05 17:55:32 +00:00
Isotr0py	cc05b90d86	[Doc] Fix broken cuda installation doc rendering (#17654) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-05 17:52:40 +00:00
Jinzhen Lin	1d0c9d6b2d	[Kernel] some optimizations for dense marlin and moe marlin (#16850) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-05 09:39:30 -07:00
Tyler Michael Smith	f62cad6431	[Build/CI] Upgrade CUTLASS to 3.9.2 (#17641) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-05-04 19:23:17 -07:00
Chauncey	5394ad7387	[Bugfix] fix KeyError on top logprobs are special tokens (#17637) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-04 19:22:35 -07:00
Tyler Michael Smith	68e1ee0072	[Bugfix][Easy] Fix whitespace in shm_broadcast.py logging (#17635) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-05-04 19:20:19 -07:00
Cyrus Leung	2858830c39	[Bugfix] Prioritize dtype in root config before checking text config (#17629) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 12:43:05 +00:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Cyrus Leung	46fae69cf0	[Misc] V0 fallback for `--enable-prompt-embeds` (#17615) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-03 22:59:24 +00:00
Isotr0py	f66f1e0fa3	[Bugfix] Fix broken Qwen2.5-omni tests (#17613) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-03 17:08:14 +00:00
Cyrus Leung	887d7af882	[Core] Gate `prompt_embeds` behind a feature flag (#17607) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 00:19:20 +08:00
Gregory Shtrasberg	a92842454c	[Bugfix][ROCm] Using device_type because on ROCm the API is still torch.cuda (#17601) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-02 22:25:47 -07:00
Tyler Michael Smith	c8386fa61d	[Build/CI] Upgrade CUTLASS to 3.9.1 (#17602) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-05-02 22:25:14 -07:00
Chenyaaang	87baebebd8	[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-02 21:42:44 -07:00
rasmith	e3d0a1d190	[Quantizaton] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm (#17558) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-02 21:41:10 -07:00
22quinn	d47b605eca	Update test requirements to CUDA 12.8 (#17576) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-05-02 21:40:15 -07:00
Liangfu Chen	22c6f6397f	[Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 (#17603) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2025-05-03 02:41:59 +00:00
Kevin H. Luu	3ec97e2cc5	[release] Add command to clean up Docker containers/images in TPU release machine (#17606)	2025-05-02 18:54:34 -07:00
Eric Hartford	9b103a1d76	fix typo in logging (#17605)	2025-05-02 18:04:40 -07:00
Richard Zou	b90b0852e9	[easy] Print number of needed GPUs in skip message (#17594) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-02 15:27:43 -07:00
Xiaodong Wang	9352cdb56d	[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263) Signed-off-by: Lu Fang <lufang@fb.com> Co-authored-by: Lu Fang <lufang@fb.com>	2025-05-02 19:44:19 +00:00
Zhiyu	182f40ea8b	Add NVIDIA TensorRT Model Optimizer in vLLM documentation (#17561)	2025-05-02 11:36:46 -07:00

6557 Commits