xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-10 06:45:01 +08:00

Author	SHA1	Message	Date
RichardoMu	40b6c9122b	[V1] feat:add engine v1 tracing (#20372 ) Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com> Signed-off-by: Ye Zhang <zhysishu@gmail.com> Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com> Co-authored-by: Ye Zhang <zhysishu@gmail.com> Co-authored-by: Benjamin Bartels <benjamin@bartels.dev> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: 瑜琮 <ly186375@antfin.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-09-11 17:10:39 -07:00
Michael Goin	c3aea10dc8	[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-11 15:43:14 -07:00
Duncan Moss	074854b24f	[Kernel][B200] `mxfp4` fused cutlass moe (#23696 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-11 17:04:56 -04:00
co63oc	e26fef8397	fix some typos (#24616 ) Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-09-11 10:48:46 -07:00
Harry Mellor	c1eda615ba	Fix model name included in responses (#24663 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-11 10:47:51 -07:00
Isotr0py	bcbe2a4d9e	[VLM] Optimize GLM4.5-V-style video processing to only decode necessary frames (#24161 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-11 09:44:34 -07:00
wang.yuqi	fd1ce98cdd	[CI] Split mteb test from Language Models Test (#24634 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-11 06:37:51 -07:00
Harry Mellor	5f5271f1ee	Move `LoRAConfig` from `config/__init__.py` to `config/lora.py` (#24644 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-11 11:01:38 +00:00
wang.yuqi	a8b0361c92	[CI] Split pooling from entrypoints Test (#24632 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-11 01:53:09 -07:00
Tao He	e93f4cc9e3	Add the support for the qwen3 next model (a hybrid attention model). (#24526 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-11 15:32:09 +08:00
Jerry Zhang	2048c4e379	[torchao] Support quantization configs using module swap (#21982 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-09-10 23:53:24 -07:00
TaehyunKim	9bd831f501	[Model] New model support for Motif-1-Tiny (#23414 ) Signed-off-by: ca1207 <ca1207zzz@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com> Co-authored-by: WyldeCat <skan1543@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 23:29:40 -07:00
Wenlong Wang	6c8deacd72	[Bug] [Spec Decode] Fix model_initialization test and mismatch in aux_hidden_layers (#24613 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-10 21:23:18 -07:00
Hanjie Qiu	dcb28a332b	[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078 ) Signed-off-by: hjjq <hanjieq@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-10 15:31:10 -07:00
Gregory Shtrasberg	9a161307f5	[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-10 13:59:55 -07:00
Russell Bryant	37e8182bfe	[v1] Add Whisper model support (encoder-decoder) (#21088 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com>	2025-09-10 13:53:35 -07:00
Nick Hill	4db4426404	[CI] Fail subprocess tests with root-cause error (#23795 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-10 13:53:21 -07:00
Xingyu Liu	9fb74c27a7	[Core] Support configuration parsing plugin (#24277 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-10 11:32:43 -07:00
Jee Jee Li	bb3eb80d92	[Core] Split LoRA layers (#24574 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 07:47:51 -07:00
pwschuurman	fcc0a3130a	[CI] Fix tensorizer test assertion (#24545 ) Signed-off-by: Peter Schuurman <psch@google.com>	2025-09-10 06:57:36 -07:00
co63oc	3144d90217	fix some typos (#24167 ) Signed-off-by: co63oc <co63oc@users.noreply.github.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-09-10 06:21:23 -07:00
wang.yuqi	bd98842c8a	[CI] Add PPL test for generation models (#24485 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-10 06:16:39 -07:00
Ye (Charlotte) Qi	492196ed0e	[CI/Build] split true unit tests to Entrypoints Unit Tests (#24418 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-10 06:16:07 -07:00
lacora	0b9a612fa3	[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat (#24549 ) Signed-off-by: lacora2017 <yehu@meta.com> Co-authored-by: lacora2017 <yehu@meta.com>	2025-09-10 21:14:55 +08:00
Harry Mellor	f36355abfd	Move `LoadConfig` from `config/__init__.py` to `config/load.py` (#24566 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-10 06:14:18 -07:00
baonudesifeizhai	6cbd41909e	Feature/vit attention unification# 23880 (#23978 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-10 06:10:14 -07:00
danielafrimi	72d30108a0	Support for NemotronH Nano VLM (#23644 ) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>	2025-09-10 06:10:06 -07:00
vllmellm	7c195d43da	[ROCm][Bugfix] Fix Aiter RMSNorm (#23412 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-09-10 21:08:03 +08:00
Flora Feng	77f62613f9	Consolidate rendering parameters into RenderConfig dataclass (#24543 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-10 08:44:47 +00:00
Remy	feaf202e93	[Bugfix] Guard `_may_reorder_batch` for encoder-only models on CPU (#24319 ) (#24348 ) Signed-off-by: Remy <eunhwan.shin@dtonic.io> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-09-10 14:24:42 +08:00
pwschuurman	4377b1ae3b	[Bugfix] Update Run:AI Model Streamer Loading Integration (#23845 ) Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai> Signed-off-by: Peter Schuurman <psch@google.com> Co-authored-by: Omer Dayan (SW-GPU) <omer@run.ai> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-09 21:37:17 -07:00
Chenheli Hua	009d689b0c	[Core] Simplify and unify mm uuid handling & auto-generated mm hash overrides processing. (#24271 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-09-09 21:36:09 -07:00
Wei	0efdb5c3ba	[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading (#24154 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-09-10 04:27:53 +00:00
Wenlong Wang	53b42f4102	[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 (#24392 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-09-09 21:24:23 -07:00
Nick Hill	83dd28aae4	[CI] Adjust threshold for flaky ngram spec decoding test (#24528 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-09 21:07:33 -07:00
Nick Hill	7e7db04310	[CI] Retry flaky fp8 cutlass mla tests (#24536 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-09 20:33:10 -07:00
Jiangyun Zhu	b8a93076d3	[CI] execute all piecewise compilation tests together (#24502 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-09 11:05:25 -07:00
Flora Feng	15cb047e25	Extend renderer with embedding support and integrate completion endpoint (#24405 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-10 01:46:46 +08:00
d.transposed	922d3b401b	[Bugfix] Handle the edge case in detokenizer where processed tokens contain both `stop` str and `eos` token (#23938 ) Signed-off-by: dtransposed <damian.bogunowicz@gmail.com>	2025-09-09 07:30:24 -07:00
wang.yuqi	19332c0479	[Model] Systematic support for fp32 head, pooling models part (#23810 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-09 07:29:50 -07:00
Didier Durand	46876dff32	[Doc]: fixing typos to improve docs (#24480 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-08 23:06:04 -07:00
Ming Yang	1823a00d67	[Misc] Support bench serve long context (#24373 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-09-08 22:53:10 -07:00
Cyrus Leung	948dd3443b	[Bugfix] Fix Apertus HF repo name (#24447 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-08 21:40:29 -07:00
Zebing Lin	82dfb12e52	[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673 ) Signed-off-by: linzebing <linzebing1995@gmail.com>	2025-09-08 21:34:37 -07:00
elvischenv	bba1042c6f	[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-09-08 20:53:07 -07:00
Matthew Bonanni	620db1fc58	[Attention] FlashAttention MLA cudagraph support (#23958 ) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-08 22:05:26 +00:00
Jiangyun Zhu	7be141b2c5	[CI] Enable encoder model compilation test (#24442 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-08 11:48:06 -07:00
Jee Jee Li	8d7f39b48c	[Model] Remove quantized mixtral (#24437 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-08 11:02:14 -07:00
Chenheli Hua	01dfb5e982	[Frontend] User-provided uuids for medias in chat. (RFC #22044 ) (#23449 ) Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: Roger Wang <hey@rogerw.me> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.me> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-08 06:42:20 -07:00
Harry Mellor	03dd652c16	Move `KVEventsConfig` from `config/__init__.py` to `config/kv_events.py` (#24433 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-08 06:41:27 -07:00

3066 Commits