xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-04 04:57:54 +08:00

Author	SHA1	Message	Date
c0de128	ac23d0ba18	[Bugfix][Hardware][AMD] Use dynamic WARP_SIZE in sampler vectorized_process. Replace hardcoded WARP_SIZE=32 with the dynamic WARP_SIZE macro from cuda_compat.h to correctly support both Wave64 (MI300X/gfx942) and Wave32 (Strix Halo/gfx1151) architectures. The previous hardcoded value was incorrect for AMD CDNA GPUs, which use 64-wide wavefronts. While the current static_assert (kWarpSize >= 4) passes for both 32 and 64, having inconsistent WARP_SIZE definitions across the codebase is a maintenance issue and a potential latent bug. Changes: (1) add cuda_compat.h include for WARP_SIZE macro; (2) replace local WARP_SIZE constant with kWarpSize from cuda_compat.h; (3) update static_assert and comments to use kWarpSize. Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2025-12-24 09:02:06 -06:00
Daniel Cámpora	eaa82a709a	[Bugfix][DSV32] Fix overflow in topk. (#30754) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-12-16 14:21:17 -08:00
Daniel Cámpora	184076c3fe	[DeepSeek v3.2] Make top-k work for any logit values. (#27568) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-08 06:55:58 -08:00
Lain	09a7e6f617	[Deepseek v3.2] Remove extra logics in indexer (#26465) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Lain <siyuanf@nvidia.com> Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-10-21 23:34:03 +00:00
Daniel Cámpora	80e9452984	[Deepseek v3.2] Optimize top_k_per_row (#26763) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-10-21 08:30:07 +00:00
Daniel Cámpora	e1098ced95	Add topk logits torch op for DS3.2. (#25945) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-10-07 10:07:32 +00:00
Vadim Gimpelson	f73d02aadc	[BUG] Fix #20484. Support empty sequence in cuda penalty kernel (#20491) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>	2025-07-05 19:38:02 -07:00
Vadim Gimpelson	5d6d1adf15	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)	2025-06-03 21:13:01 -07:00

8 Commits
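
Note on commit ac23d0ba18: the message argues that a warp width hardcoded to 32 is wrong on 64-wide wavefront hardware. The sketch below is a minimal, host-compilable illustration of that point, not the code from the commit. The preprocessor symbols tested (`USE_ROCM`, `__GFX9__`) and the helper names `lane_id` and `warps_per_block` are assumptions made for illustration; only the name `kWarpSize` and the `kWarpSize >= 4` check come from the commit message.

```cpp
// Hypothetical stand-in for a shared warp-width definition. The commit says
// the real value comes from cuda_compat.h; the macros tested here are
// assumptions, not a copy of that header.
#if defined(USE_ROCM) && defined(__GFX9__)
constexpr int kWarpSize = 64;  // Wave64, e.g. MI300X/gfx942
#else
constexpr int kWarpSize = 32;  // Wave32 hardware or NVIDIA warps
#endif

// Same lower bound the commit message mentions.
static_assert(kWarpSize >= 4, "vectorized path assumes at least 4 lanes");

// Lane index of a thread inside its warp/wavefront (hypothetical helper).
template <int WarpSize>
constexpr int lane_id(int thread_idx) {
  return thread_idx % WarpSize;
}

// Number of warps/wavefronts needed to cover a block (hypothetical helper).
template <int WarpSize>
constexpr int warps_per_block(int block_threads) {
  return (block_threads + WarpSize - 1) / WarpSize;
}

// With a width of 32, thread 40 is treated as lane 8 of a second warp.
// On a 64-wide wavefront it is lane 40 of the first one, so per-warp
// indexing derived from a hardcoded 32 no longer matches the hardware.
static_assert(lane_id<32>(40) == 8, "32-wide mapping");
static_assert(lane_id<64>(40) == 40, "64-wide mapping");
static_assert(warps_per_block<32>(256) == 8, "256 threads, 32-wide");
static_assert(warps_per_block<64>(256) == 4, "256 threads, 64-wide");
```

Taking the width from one shared definition, rather than redefining it per file, keeps every such computation consistent across targets, which is the maintenance concern the commit message raises.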