xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-12 07:07:09 +08:00

Author	SHA1	Message	Date
yt0428	05cae69f0f	[model] Add support for openPangu_Ultra_MoE (#27521 ) Signed-off-by: yuantao <2422264527@qq.com> Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-04 08:17:20 -08:00
Vadim Gimpelson	5fd8f02ea9	[PERF] Decouple projections from GDN custom op (#27512 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-04 08:11:41 -08:00
lyrisz	97e3dda84b	[Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM (#27284 ) Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com> Co-authored-by: Faqin Zhong <zhofaqin@amazon.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-04 07:49:25 -08:00
Nick Hill	5a0a6dfd55	[BugFix] Fix incorrect preallocated sampled_token_ids tensor size (#28025 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-04 07:38:16 -08:00
bnellnm	938772af03	[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123 )	2025-11-04 21:59:45 +08:00
tomeras91	e4ee658672	[Model] add optimal triton fused moe configs for NemotronH MoE (#27967 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-11-04 12:59:43 +00:00
tomeras91	77f8001f53	[Model][Bugfix] fix pipeline parallelism support for NemotronH (#27968 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-11-04 12:28:36 +00:00
Zhuohan Li	300a265978	[Core] Enable StatLogger in LLMEngine (#28020 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-04 04:13:35 -08:00
Jerry Zhang	03c4c4aa9d	Support using Int4PreshuffledTensor after loading (#26066 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-11-04 06:00:57 -05:00
yugong333	2ec401bc39	Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-04 18:27:35 +08:00
Varun Sundar Rabindranath	4022a9d279	[BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904 )	2025-11-04 15:56:21 +08:00
Zhewen Li	53f6e81dfd	[CI/Build] Fix OpenAI API correctness on AMD CI (#28022 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-04 07:20:50 +00:00
CSWYF3634076	43a6acfb7d	[Model] fix ernie45 reasoning_parser (#27973 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-11-04 07:16:46 +00:00
Mark McLoughlin	58279c60b5	[KV Connector] Make KVCacheConfig an explicit constructor argument (#27887 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-03 23:00:49 -08:00
Zhewen Li	2f84ae1f27	[CI/Build] Update LM Eval Version in AMD CI (#27944 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-04 06:36:40 +00:00
xiangze-arm	f32cbc9a0c	[CPU]Improve dynamic 4bit moe performance (#27240 ) Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>	2025-11-04 06:33:23 +00:00
Wentao Ye	7e4be74104	[Bug] Batch invariant: Fix flash attn MLA `RuntimeError: scheduler_metadata must have shape (metadata_size)` (#27884 )	2025-11-04 14:05:55 +08:00
Mark McLoughlin	380ba6816d	[Metrics] Enable sleep state metric outside of dev mode (#27867 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-03 20:35:36 -08:00
liuzhenwei	14a125a06d	[NIXL][XPU] Pin NIXL version to 0.7.0 (#27849 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2025-11-04 03:28:35 +00:00
Chauncey	c02fccdbd2	[Refactor] Lazy import tool_parser (#27974 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-04 10:10:10 +08:00
li2haipeng	6ddae74054	[LoRA] Lora shrink swizzle (#27694 ) Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com> Signed-off-by: Haipeng Li <li2haipeng@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-04 09:30:20 +08:00
vllmellm	b13a447546	[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm (#27748 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-03 17:12:19 -08:00
QiliangCui	7956b0c0bc	Remove the tpu docker image nightly build. (#27997 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-11-04 00:35:54 +00:00
Tyler Michael Smith	3758757377	[Bugfix] Fix MoE Routing Simulation (#28002 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-11-03 22:26:49 +00:00
Hank_	ccd3e55e51	[Bugfix][plugin] fla crash on plugin (#27322 )	2025-11-04 05:27:03 +08:00
Matthew Bonanni	01baefe674	Add TP parameter to attention tests (#27683 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-03 13:04:40 -08:00
Ning Xie	786030721e	[Docs] add runai_streamer_sharded to LoadConfig (#27937 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-03 20:35:16 +00:00
Matthew Bonanni	145c00a4d3	[Bugfix] change FlashMLA reorder_batch_threshold (#27777 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-03 15:17:10 -05:00
Lucas Kabela	55011aef24	[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-03 11:12:15 -08:00
Sophie du Couédic	a4398fbb5e	[Feature][Benchmarks] Support `inf` burstiness (#26941 ) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>	2025-11-03 18:33:17 +00:00
Aurick Qiao	2c19d96777	[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>	2025-11-03 09:23:31 -08:00
Lucas Wilkinson	4bc400f47e	[CI/Testing] Add basic single node dual batch overlap test (#27235 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-03 17:00:46 +00:00
ahao-anyscale	cac4c10ef0	[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-11-03 11:13:51 -05:00
pwschuurman	f7d2946e99	[Bugfix] Skip gs:// model paths for speculator detection (#27846 ) Signed-off-by: Peter Schuurman <psch@google.com>	2025-11-03 14:31:03 +00:00
gnovack	294c805f1d	Early exit for MoE LoRA kernels (#27131 ) Signed-off-by: gnovack <gnovack@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-03 20:22:17 +08:00
zhang-prog	40b69e33e7	[Model] Add PaddleOCR-VL Model Support (#27758 ) Signed-off-by: zhangyue <zhangyue66@baidu.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: zhangyue66 <zhangyue66@baidu.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-03 19:04:22 +08:00
Jee Jee Li	32257297dd	[CI/Build] Remove the flaky gpt-oss lora test (#27966 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-03 16:50:06 +08:00
Misha Efimov	ba464e6ae2	Add ORCA endpoint load metrics support (#24905 ) Signed-off-by: Misha Efimov <mef@google.com>	2025-11-03 08:21:31 +00:00
Kunshang Ji	7f4bdadb92	[XPU]Refine Dockerfile.xpu, avoid oneccl dependency issue (#27964 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-03 07:36:59 +00:00
Rémi Delacourt	cec7c28833	[Bugfix] Padded Eagle Specdec with Chunked Prefill (#26263 ) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-03 02:22:46 -05:00
Thomas Parnell	18961c5ea6	[Hybrid] Pass kernel block size to builders (#27753 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-03 05:48:03 +00:00
Sungyoon Jeong	470ad118b6	[Frontend] Align finish_reason when tool is called with OpenAI (#25054 ) Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-03 04:21:18 +00:00
Biswa Panda	1bf43ae35d	[BugFix][LoRA] use adapter_id instead of id field of lora_request (#27728 ) Signed-off-by: Biswa Panda <biswa.panda@gmail.com>	2025-11-03 10:08:08 +08:00
Vensen	0ce743f4e1	Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 (#27420 ) Signed-off-by: vensenmu <vensenmu@gmail.com>	2025-11-02 16:24:01 +00:00
Cyrus Leung	6c317a656e	[Misc] Provide Siglip2 chat template (#27939 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-02 13:42:38 +00:00
Asaf Joseph Gardin	00b31a36a2	[V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-11-02 04:16:23 -08:00
Julien Denize	73444b7b56	Performance fix MistralTokenizer: cache special ids and tokens (#27925 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-11-02 08:48:33 +00:00
Cyrus Leung	853a8eb53b	[Bugfix] Fix Qwen Omni audio inference (#27920 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-02 05:06:05 +00:00
Ben Browning	758ea2e980	[CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma (#27924 ) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-11-02 03:45:02 +00:00
Yue Zhang	685c99ee77	[KV offload] Offloading connector async scheduling support (#27648 ) Signed-off-by: KevinCheung2259 <2651309292@qq.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-01 21:08:56 +00:00

10967 Commits