xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-03 06:37:08 +08:00

Author	SHA1	Message	Date
Lucas Kabela	55011aef24	[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-03 11:12:15 -08:00
Sophie du Couédic	a4398fbb5e	[Feature][Benchmarks] Support `inf` burstiness (#26941 ) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>	2025-11-03 18:33:17 +00:00
Aurick Qiao	2c19d96777	[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>	2025-11-03 09:23:31 -08:00
Lucas Wilkinson	4bc400f47e	[CI/Testing] Add basic single node dual batch overlap test (#27235 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-03 17:00:46 +00:00
ahao-anyscale	cac4c10ef0	[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-11-03 11:13:51 -05:00
pwschuurman	f7d2946e99	[Bugfix] Skip gs:// model paths for speculator detection (#27846 ) Signed-off-by: Peter Schuurman <psch@google.com>	2025-11-03 14:31:03 +00:00
gnovack	294c805f1d	Early exit for MoE LoRA kernels (#27131 ) Signed-off-by: gnovack <gnovack@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-03 20:22:17 +08:00
zhang-prog	40b69e33e7	[Model] Add PaddleOCR-VL Model Support (#27758 ) Signed-off-by: zhangyue <zhangyue66@baidu.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: zhangyue66 <zhangyue66@baidu.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-03 19:04:22 +08:00
Jee Jee Li	32257297dd	[CI/Build] Remove the flaky gpt-oss lora test (#27966 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-03 16:50:06 +08:00
Misha Efimov	ba464e6ae2	Add ORCA endpoint load metrics support (#24905 ) Signed-off-by: Misha Efimov <mef@google.com>	2025-11-03 08:21:31 +00:00
Kunshang Ji	7f4bdadb92	[XPU]Refine Dockerfile.xpu, avoid oneccl dependency issue (#27964 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-03 07:36:59 +00:00
Rémi Delacourt	cec7c28833	[Bugfix] Padded Eagle Specdec with Chunked Prefill (#26263 ) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-03 02:22:46 -05:00
Thomas Parnell	18961c5ea6	[Hybrid] Pass kernel block size to builders (#27753 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-03 05:48:03 +00:00
Sungyoon Jeong	470ad118b6	[Frontend] Align finish_reason when tool is called with OpenAI (#25054 ) Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-03 04:21:18 +00:00
Biswa Panda	1bf43ae35d	[BugFix][LoRA] use adapter_id instead of id field of lora_request (#27728 ) Signed-off-by: Biswa Panda <biswa.panda@gmail.com>	2025-11-03 10:08:08 +08:00
Vensen	0ce743f4e1	Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 (#27420 ) Signed-off-by: vensenmu <vensenmu@gmail.com>	2025-11-02 16:24:01 +00:00
Cyrus Leung	6c317a656e	[Misc] Provide Siglip2 chat template (#27939 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-02 13:42:38 +00:00
Asaf Joseph Gardin	00b31a36a2	[V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-11-02 04:16:23 -08:00
Julien Denize	73444b7b56	Performance fix MistralTokenizer: cache special ids and tokens (#27925 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-11-02 08:48:33 +00:00
Cyrus Leung	853a8eb53b	[Bugfix] Fix Qwen Omni audio inference (#27920 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-02 05:06:05 +00:00
Ben Browning	758ea2e980	[CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma (#27924 ) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-11-02 03:45:02 +00:00
Yue Zhang	685c99ee77	[KV offload] Offloading connector async scheduling support (#27648 ) Signed-off-by: KevinCheung2259 <2651309292@qq.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-01 21:08:56 +00:00
Benjamin Bartels	1e88fb751b	Adds anthropic /v1/messages endpoint to openai api_server (#27882 ) Signed-off-by: bbartels <benjamin@bartels.dev> Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>	2025-11-01 12:45:42 -07:00
Nick Hill	c2ed069b32	[BugFix] Fix mixed penalties batch with async scheduling (#27910 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 10:51:24 -07:00
wenxindongwork	af6e19f50f	[Core][TPU] Support TPU Data Parallalism (#27365 ) Signed-off-by: wenxindongwork <wenxindong@google.com>	2025-11-01 17:14:44 +00:00
Cyrus Leung	99d69af9ec	[Bugfix] Python 3.10 compatibility for `Self` (#27918 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-01 15:28:54 +00:00
Haco	d811b442d3	[Bugfix] DeepSeek V3.2 MTP metadata & CUDA graph issues (#26779 ) Signed-off-by: xiaohajiayou <923390377@qq.com>	2025-11-01 10:52:43 -04:00
wangxiyuan	30a14b034f	[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module (#27798 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-01 10:17:45 +00:00
Harry Mellor	799ce45cc1	[Docs] Mock all imports for docs (#27873 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-01 10:02:23 +00:00
ai-jz	2c0c7c39bd	feat(benchmarks): support HF model names in multi-turn benchmark (#27850 )	2025-11-01 08:04:52 +00:00
Yihua Cheng	e675118849	[Add] cmdline argument parsing for KV cache offloading modules (#27621 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-01 07:17:07 +00:00
TJian	e2347dbf58	[Bugfix] [Model] Missing MRoPE function definition from `KeyeForConditionalGeneration` (#27895 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-01 13:45:23 +08:00
Cyrus Leung	879a06579e	[CI/Build] Bump transformers version (#27528 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-31 22:11:07 -07:00
yugong333	29de3cdee4	Adding SplitK in fused_moe_lora kernel (#27818 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-01 12:55:46 +08:00
Yan Ma	7e2729b57e	[Multimodal][XPU]Enable vision attn backend for xpu platform (#27525 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yejing Lai <yejing.lai@intel.com> Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-01 04:45:02 +00:00
Jee Jee Li	3a5de7d2d6	[Bugfix] Fix KDA output (#27905 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-01 11:54:36 +08:00
Jee Jee Li	bc4486d609	[Kernel] Enable FusedMoEModularKernel support bias (#27754 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-01 02:05:12 +00:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
Chen Zhang	df334868ca	[Hybrid] A simpler algorithm to find kernel_block_size (#26476 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-31 21:30:28 +00:00
Bram Wasti	0e0a638c3b	Batch invariance doc (#27839 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-31 17:22:19 -04:00
Matthew Bonanni	f29aeb5a25	Add FLASHINFER_MLA to test_mla_backends and add B200 CI run (#27663 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-31 11:12:19 -07:00
Vinay R Damodaran	5e8862e9e0	[Feature] Pydantic validation for scheduler.py and structured_outputs.py (#26519 ) Signed-off-by: Vinay Damodaran <vrdn@hey.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-31 18:05:50 +00:00
Nick Hill	9e5bd3076e	[Cleanup] Remove no-longer-used `SpeculativeConfig.enable_chunked_prefill` (#27826 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-31 10:57:45 -07:00
Shu Wang	fc16f1c477	Flashinfer_CUTLASS_MOE fuses quantization for TP (#27223 ) Signed-off-by: Shu Wang. <shuw@nvidia.com>	2025-10-31 17:54:29 +00:00
ZiTian Zhao	bc306fe5e9	fix incorrect type annotation in KimiMLP (#27885 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-10-31 17:38:02 +00:00
Chenguang Zheng	103a468bbf	[bugfix] Missing cached item in beam search (#27874 ) Signed-off-by: fake0fan <645327136@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-31 17:34:27 +00:00
Rob Mulla	70bfbd7b16	Docs update tpu install instructions (#27824 ) Signed-off-by: Rob Mulla <rob.mulla@gmail.com> Signed-off-by: Rob Mulla <RobMulla@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-31 10:29:55 -07:00
GuanLuo	d6517be3cd	[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node (#26338 ) Signed-off-by: Guan Luo <gluo@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-10-31 10:16:00 -07:00
Isotr0py	7e06c40e63	[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V (#27860 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-31 17:04:51 +00:00
Madeesh Kannan	675704ac01	[Bugfix] Allow 64-bit integer values for LoRA IDs to avoid overflow/truncation (#27876 ) Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2025-10-31 16:58:42 +00:00

11139 Commits