xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-08-02 20:17:07 +08:00

Author	SHA1	Message	Date
Cyrus Leung	86ae693f20	[Deprecation][2/N] Replace `--task` with `--runner` and `--convert` (#21470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-27 19:42:40 -07:00
Alexander Matveev	8f605ee309	[Attention] Make CutlassMLA the default backend for SM100 (blackwell) (#21626 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-27 20:13:00 +00:00
Ning Xie	a9b2a1d704	[Misc] Refactor vllm config str (#21666 )	2025-07-27 09:51:44 -07:00
Caleb_Du	57c22e57f9	Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934 ) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-07-27 07:08:00 -07:00
Wentao Ye	bda9d0535f	[Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor (#21631 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-27 05:25:21 -07:00
Isotr0py	3d847a3125	[VLM] Add video support for Intern-S1 (#21671 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-27 11:49:43 +00:00
Benji Beck	5f8c9a425e	Migrate Florence2ImagePixelInputs to TensorSchema (#21663 ) Signed-off-by: Benji Beck <benjibeck@meta.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-27 02:43:02 -07:00
Ning Xie	1cbf951ba2	[Misc] add default value for file pattern arg (#21659 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-27 05:14:51 +00:00
ZiTian.Zhao	a8936e5193	Refactor: Remove numpy dependency from LoggingStatLogger (#20529 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-07-27 04:06:21 +00:00
Ye (Charlotte) Qi	01a395e9e7	[CI/Build][Doc] Clean up more docs that point to old bench scripts (#21667 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-07-27 04:02:12 +00:00
Huy Do	971948b846	Handle non-serializable objects in vllm bench (#21665 )	2025-07-27 03:35:22 +00:00
Isotr0py	eed2f463b2	[VLM] Support HF format Phi-4-MM model (#17121 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-26 20:07:57 -07:00
Benji Beck	20950b29fb	Migrate ChameleonImagePixelInputs to TensorSchema (#21657 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-26 19:34:25 -07:00
Benji Beck	3339cba3ff	Migrate FuyuImagePatchInputs to TensorSchema (#21662 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-26 19:34:14 -07:00
Benji Beck	0b8caf9095	Migrate DeepseekVL2ImageInputs to TensorSchema (#21658 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-26 19:34:11 -07:00
Benji Beck	ccf27cc4d4	Migrate Blip2ImagePixelInputs and Blip2ImageEmbeddingInputs to TensorSchema (#21656 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-27 10:33:52 +08:00
Jinzhen Lin	c657369841	support `torch.compile` for bailing moe (#21664 )	2025-07-26 23:54:32 +00:00
Wenchen Lo	6c66f28fa5	Remove xformers requirement for Mistral-format Pixtral and Mistral3 (#21154 ) Signed-off-by: Wenchen Lo <charles761013@gmail.com>	2025-07-26 17:20:29 -06:00
Kaixi Hou	de509ae8eb	[NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels (#21411 ) Signed-off-by: kaixih <kaixih@nvidia.com>	2025-07-26 07:10:36 -07:00
Ye (Charlotte) Qi	e7c4f9ee86	[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI (#21355 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-07-26 07:10:14 -07:00
Yeju Zhou	9094d11c5d	[Bugfix][Apple Silicon] fix missing symbols when build from source on Mac with Apple Silicon (#21380 ) Signed-off-by: Yeju Zhou <yejuzhou@outlook.com>	2025-07-26 07:09:57 -07:00
Wentao Ye	56e544f24b	[Refactor] Remove `moe_align_block_size_triton` (#21335 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-26 07:08:29 -07:00
WeiQing Chen	97d6c30cc9	[BugFix] Fix shared storage connector load kv only load attention layer (#21428 ) Signed-off-by: David Chen <530634352@qq.com>	2025-07-26 07:07:40 -07:00
Ye (Charlotte) Qi	a40a8506df	[Misc] Improve memory profiling debug message (#21429 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-07-26 07:07:21 -07:00
Wentao Ye	c215f5c877	[Bug] Fix `has_flashinfer_moe` Import Error when it is not installed (#21634 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-26 07:06:14 -07:00
Maximilien de Bayser	1cd6eaba54	Support encoder-only models without KV-Cache (#21270 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-07-26 21:09:52 +08:00
Isotr0py	f27fdfc3ed	[Bugfix] Investigate Qwen2-VL failing test (#21527 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-26 06:09:29 -07:00
Benji Beck	de10ff0b7c	Migrate AyaVisionImagePixelInputs to TensorSchema for shape validation (#21622 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-26 06:08:18 -07:00
Benji Beck	9d197280fa	Migrate AriaImagePixelInputs to TensorSchema for shape validation (#21620 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-26 06:08:15 -07:00
Huy Do	e98def439c	[Take 2] Correctly kill vLLM processes after benchmarks (#21646 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-07-26 06:06:05 -07:00
Reid	05c1126f29	[Misc] remove unused try-except in pooling config check (#21618 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-26 12:20:03 +00:00
Lyu Han	875af38e01	Support Intern-S1 (#21628 ) Signed-off-by: Roger Wang <hey@rogerw.me> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Your Name <you@example.com> Co-authored-by: Roger Wang <hey@rogerw.me> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-26 19:14:04 +08:00
QiliangCui	7728dd77bb	[TPU][Test] Divide TPU v1 Test into 2 parts. (#21431 )	2025-07-26 06:20:30 +00:00
Alexandre JUAN	2f6e6b33fb	[Bugfix] Fix isinstance check for tensor types in _load_prompt_embeds to use dtype comparison (#21612 ) Signed-off-by: Alexandre Juan <a.juan@netheos.net>	2025-07-25 20:11:10 -07:00
Huy Do	a55c95096b	Correctly kill vLLM processes after finishing serving benchmarks (#21641 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-07-25 19:06:21 -07:00
WeiQing Chen	97349fe2bc	[Docs] add offline serving multi-modal video input expamle Qwen2.5-VL (#21530 ) Signed-off-by: David Chen <530634352@qq.com>	2025-07-25 18:37:32 -07:00
Farzad Abdolhosseini	62965de5fe	[Model] Ultravox: Support Llama 4 and Gemma 3 backends (#17818 ) Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai> Signed-off-by: Patrick Li <patrick8289@gmail.com> Co-authored-by: Patrick Li <patrick8289@gmail.com>	2025-07-25 18:12:31 -07:00
Alex Kogan	7ae75fa6d0	[Feature] Add support for MoE models in the calibration-free RTN-based quantization (#20766 ) Signed-off-by: Alex Kogan <alex.kogan@oracle.com>	2025-07-25 18:09:34 -07:00
Chengji Yao	f1b286b2fb	[TPU] Update ptxla nightly version to 20250724 (#21555 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-25 17:09:00 -07:00
Rui Qiao	c7742d6113	[Bugfix] Always set RAY_ADDRESS for Ray actor before spawn (#21540 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-07-25 17:08:30 -07:00
Rui Qiao	cea96a0156	[Bugfix] Fix sync_and_slice_intermediate_tensors (#21537 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-07-25 17:07:58 -07:00
Yong Hoon Shin	2eddd437ba	Add interleaved RoPE test for Llama4 (Maverick) (#21478 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-25 17:07:26 -07:00
Wentao Ye	75d29cf4e1	[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-25 17:07:07 -07:00
Daniel Han	41d3082c41	Add Unsloth to RLHF.md (#21636 )	2025-07-25 17:06:48 -07:00
QiliangCui	7cfea0df39	[TPU][Test] Rollback PR-21550. (#21619 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-25 13:22:01 -07:00
Wenhua Cheng	5ac3168ee3	[Docs] add auto-round quantization readme (#21600 ) Signed-off-by: Wenhua Cheng <wenhua.cheng@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-25 08:52:42 -07:00
Kebe	396ee94180	[CI] Unifying Dockerfiles for ARM and X86 Builds (#21343 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-07-25 07:33:56 -07:00
mgazz	e189b50f53	Add support for Prithvi in Online serving mode (#21518 ) Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-25 07:01:27 -07:00
czhu-cohere	136d750f5f	[Kernel] Improve machete memory bound perf (#21556 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-07-25 06:53:21 -07:00
who who who	b3caeb82e7	[ROCm][AITER] Enable fp8 kv cache on rocm aiter backend. (#20295 ) Signed-off-by: fsx950223 <fsx950223@outlook.com> Signed-off-by: amd-ruitang3 <Rui.Tang2@amd.com> Co-authored-by: amd-ruitang3 <Rui.Tang2@amd.com>	2025-07-25 06:50:21 -07:00

8031 Commits