xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-01 19:27:08 +08:00

Author	SHA1	Message	Date
Richard Zou	682e0b6d2f	Log how much time loading a compiled artifact takes (#16848 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-19 16:50:46 +00:00
Cyrus Leung	205d84aaa9	[VLM] Clean up models (#16873 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 12:13:06 +00:00
Roger Wang	5124f5bf51	[Model] Qwen2.5-Omni Cleanup (#16872 )	2025-04-19 09:37:02 +00:00
Isotr0py	83f3c3bd91	[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 02:26:11 -07:00
vie-serendipity	d9737ca1c6	[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460 ) Signed-off-by: vie-serendipity <2733147505@qq.com>	2025-04-19 02:25:19 -07:00
Nicolò Lucchesi	2ef0dc53b8	[Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (#16591 ) Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com>	2025-04-19 07:03:54 +00:00
Divakar Verma	1d4680fad2	[rocm][MI300] llama4 maverick fp8 moe config tp8 (#16847 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-04-19 06:21:43 +00:00
Yang Fan	2c1bd848a6	[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130 ) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiong Wang <wangxiongts@163.com>	2025-04-18 23:14:36 -07:00
wang.yuqi	3d3ab3689f	[New Model]: Snowflake Arctic Embed (Family) (#16649 )	2025-04-18 08:11:57 -07:00
Harry Mellor	686623c5e7	Fix `nullable_kvs` fallback (#16837 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-18 05:58:39 -07:00
Cyrus Leung	aadb656562	[Misc] Clean up Kimi-VL (#16833 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-18 05:15:09 -07:00
Jonghyun Choe	87e067de41	[Model] use AutoWeightsLoader for BigCode, GPT-J (#16823 ) Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>	2025-04-18 10:42:41 +00:00
Lucia Fang	e31045f95c	[Bugfix] fix pp for llama4 (#16746 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-04-18 13:51:30 +08:00
Luka Govedič	aaec845f8e	[ROCm] [Attention] Cleanup ROCm output passing (#16431 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-04-18 05:46:45 +00:00
rongfu.leng	7bdfd29a35	[Misc] add collect_env to cli and docker image (#16759 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-17 22:13:35 -07:00
Harry Mellor	e78587a64c	Improve-mm-and-pooler-and-decoding-configs (#16789 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 22:13:32 -07:00
Lucas Wilkinson	7eb4255628	[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-17 22:13:29 -07:00
Shanshan Shen	30ed81b7ca	[V1][Structured Output] Minor modification to `_validate_structured_output()` (#16748 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-18 13:12:54 +08:00
Cyrus Leung	c16fb5dae8	[Doc] Improve help examples for `--compilation-config` (#16729 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 21:22:34 -07:00
Lucas Wilkinson	183dad7a85	[Attention] Update to lastest FA3 code (#13111 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-17 15:14:07 -07:00
Yihua Cheng	3408e47159	[P/D][V1] KV Connector API V1 (#15960 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-17 13:22:40 -07:00
Nick Hill	0377b8310b	[MLA] Simplification to batch P/D reordering (#16673 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-17 16:12:09 -04:00
Mark McLoughlin	e4755f7fac	[V1][Metrics] Fix http metrics middleware (#15894 )	2025-04-17 19:52:18 +00:00
Sijia(Jackson) Chen	92edf35826	[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints (#16674 )	2025-04-17 11:44:34 -07:00
Nicolò Lucchesi	eb5819b2d9	[V1][TPU] Enable Top K (#15489 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-17 18:18:11 +00:00
Nicolò Lucchesi	5989f4684d	[TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (#16726 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-17 18:09:57 +00:00
rongfu.leng	5125d72f02	[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small (#16548 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-17 17:48:31 +00:00
Ximingwang-09	a018e555fd	[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753 ) Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com> Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-04-18 00:01:30 +08:00
Robin	6211b92273	[Bugfix]Fix index out of range error in api server log (#16787 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-04-17 09:01:07 -07:00
Nick Hill	05fcd1b430	[V1][Perf] Faster incremental detokenization (#15137 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-17 07:45:24 -07:00
Harry Mellor	d27ea94034	Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (#16603 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 11:19:42 +00:00
intervitens	5b1aca2ae3	[Bugfix] Fix GLM4 model (#16618 ) Signed-off-by: intervitens <intervitens@tutanota.com>	2025-04-17 03:35:07 -07:00
Russell Bryant	9dbf7a2dc1	[V1] Remove log noise when idle (#16735 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-16 21:34:08 -07:00
David Heineman	607029e515	[Bugfix] Revert max_prompt_len validation for decoder-only models. (#16741 ) Signed-off-by: David Heineman <david@davidheineman.com>	2025-04-16 21:33:15 -07:00
Divakar Verma	95aca283b4	[rocm][V0] fix selection logic for custom PA in V0 (#16426 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-04-16 19:52:11 -07:00
Robert Shaw	2b05b8ce69	[V1][Frontend] Improve Shutdown And Logs (#11737 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:48:34 -07:00
Bryan Lu	2cbd4d2999	[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-16 19:47:26 -07:00
Staszek Paśko	3092375e27	[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] (#16432 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:28:32 -07:00
Jade Zheng	8a7368e069	[Misc] Remove redundant comment (#16703 ) Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>	2025-04-17 00:44:52 +00:00
Harry Mellor	93e561ec4d	Improve error for structured output backend selection (#16717 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 00:35:35 +00:00
Joe Runde	e1b004839a	[Hardware] Add processor inputs to platform validation (#16680 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-04-16 09:28:42 -07:00
xsank	ee378f3d49	[Model] support modernbert (#16648 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-04-16 05:30:15 -07:00
Shanshan Shen	976711d9db	[V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (#16578 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-16 17:01:36 +08:00
billishyahao	3ac98edcb1	[Feature] add model aware kv ops helper (#16020 ) Signed-off-by: billishyahao <bill.he@amd.com>	2025-04-15 23:00:43 -07:00
Richard Zou	966c742ed2	Disable remote caching when calling compile_fx (#16611 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-15 22:18:28 -07:00
Jee Jee Li	0d7d05f4b6	[Misc] Modify LRUCache touch (#16689 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-16 04:51:38 +00:00
Shinichi Hemmi	3badb0213b	[Model] Add PLaMo2 (#14323 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com> Signed-off-by: shemmi <shemmi@preferred.jp> Co-authored-by: Kento Nozawa <nzw0301@preferred.jp> Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp> Co-authored-by: Calvin Metzger <metzger@preferred.jp>	2025-04-15 19:31:30 -07:00
Angky William	fdcb850f14	[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546 ) Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local> Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>	2025-04-15 22:31:38 +00:00
Dipika Sikka	54a66e5fee	[Misc] Update `compressed-tensors` WNA16 to support zero-points (#14211 )	2025-04-15 07:33:51 -06:00
DefTruth	280d62b8a2	[Kernel] Remove redundant Exp calculations (#16123 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-15 12:58:37 +00:00


4003 Commits