xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-28 04:07:11 +08:00

Author	SHA1	Message	Date
amdfaa	a7791eac9d	[CI/Build] Install uv for AMD MI300: Language Models Tests (Hybrid) %N (#28142 ) Signed-off-by: amdfaa <107946068+amdfaa@users.noreply.github.com> Signed-off-by: zhewenli <zhewenli@meta.com> Co-authored-by: zhewenli <zhewenli@meta.com>	2025-11-13 14:34:55 +00:00
Pleaplusone	8da2f28f53	[ROCm][BugFix]Fix `get_cu_count` in rocm_aiter_fa.py (#28618 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-13 14:18:20 +00:00
Akash kaothalkar	86d15bfd8d	[Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version (#28535 ) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>	2025-11-13 13:32:21 +00:00
Fanli Lin	c9fe6abe7c	[Bugfix] Fix FPS value type for Qwen2.5-Omni video processing (#28630 ) Signed-off-by: Lin, Fanli <fanli.lin@intel.com>	2025-11-13 13:06:06 +00:00
zofia	c47b6c85ac	[XPU] add sym params to IPEXConfig (#28611 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>	2025-11-13 11:35:04 +00:00
baonudesifeizhai	c428e8d80b	Fix io processor pooling #28273 (#28484 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-11-13 11:34:14 +00:00
Zijing Liu	5e973209aa	[BugFix] Fix type error when assign a triton kernel tensor to a torch.nn.Parameter (#28603 ) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>	2025-11-13 11:30:04 +00:00
Di Wu	e63fd44560	Fix: Correctly filter special tokens in benchmark_prefix_caching (#28615 ) Signed-off-by: Di Wu <dw2761@nyu.edu>	2025-11-13 10:57:44 +00:00
Yong Hoon Shin	11ac9ddd03	Support all interleaved layer types (#28485 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-13 08:57:20 +00:00
Chauncey	5c9ad138d5	[Frontend] supports interleaved thinking (#28531 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-13 16:14:13 +08:00
Jiangyun Zhu	fa183e9271	[Bugfix] fix kimi-linear crash (#28445 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-13 07:59:58 +00:00
usberkeley	4ab34f6ef1	Add NUMA node validation for CPU thread binding (#28555 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-11-13 07:03:52 +00:00
Huy Do	c33b87e777	Use official xformers-0.0.33 built for PT 2.9 (#28600 ) Signed-off-by: Huy Do <huydhn@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-12 22:48:53 -08:00
tjandy98	4504e8029b	[Bugfix] Prevent crash on empty grammar string (#28210 ) Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com>	2025-11-13 06:42:29 +00:00
Pleaplusone	ca00b1bfc6	[ROCm][BugFix] Remove the usage of `device_info` from aiter (#28383 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-12 21:43:42 -08:00
Radu Salavat	d44fbbab0e	[build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds (#28059 ) Signed-off-by: Radu Salavat <radu.salavat@arm.com>	2025-11-13 05:43:08 +00:00
Lucia Fang	7e082bc14e	Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 (#28574 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-11-12 21:40:45 -08:00
Fanli Lin	dbbe0c756a	[XPU] Support Triton path for LoRA operations on XPU (#28511 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-11-13 05:31:42 +00:00
Pleaplusone	7dca0c90cb	[BugFix][ROCm] Fix `get_cu_count` missing variable error (#28608 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-13 05:18:56 +00:00
Andrew Xia	1a0b157a2e	[Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format (#28231 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-13 04:47:22 +00:00
Andrew Xia	7c38ed0f1c	[Frontend] split append tool output (#28333 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-11-13 04:03:23 +00:00
Jialin Ouyang	a1d3866dda	[n-gen] DO NOT repeatedly return finished child requests (#28591 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-13 03:36:07 +00:00
Harry Mellor	97d1c99302	Rename clashing method names for vLLM model protocol (#27583 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:14:33 -08:00
Harry Mellor	3226283461	[Docs] Add some details about what the MoE block needs for the Transformers backend (#28588 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-13 03:12:14 +00:00
Nick Hill	8832fff972	[BugFix] Fix `mm_encoder_attn_backend` arg type checking (#28599 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-13 03:06:03 +00:00
Michael Goin	a543e678b4	[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support (#28561 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-12 19:40:59 -07:00
wangxiyuan	2dacd57394	[platform] Move get_cu_count to utils (#27005 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-13 08:48:47 +08:00
Gregory Shtrasberg	d75ad04818	[ROCm][Bugfix] Revert removing setuptools version restriction (#28592 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-11-12 16:46:58 -08:00
Michael Goin	52eadcec9e	[Docs] Update meetups.md description (#28583 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-13 00:00:23 +00:00
Harry Mellor	51c599f0ec	Skip models that cannot currently init on Transformers v5 (#28471 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 23:43:57 +00:00
Alexander Matveev	69d0e90313	[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-12 23:37:24 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4ca5cd5740	[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695 ) Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-11-12 15:24:12 -08:00
Michael Goin	10f01d5a3a	[Bugfix] Adjust Marlin CUDA arch selection to 8.0+PTX;9.0+PTX (#28294 )	2025-11-12 15:14:13 -08:00
QiliangCui	3eb0c2673e	[TPU] Support GCS path in VLLM_TORCH_PROFILER_DIR (#28487 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-11-12 22:31:14 +00:00
vllmellm	d8140b9833	[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in `_aiter_ops.py` (#28464 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-12 21:46:57 +00:00
Varun Sundar Rabindranath	74a9a9faad	[Performance][B200] Fix deepgemm prologue (#27897 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-12 13:13:03 -08:00
Wei Wei	478ee511de	[Misc]Fix typo in llm_engine.py (#28584 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-11-12 12:59:43 -08:00
Andy Lo	58ce8d12b7	[BugFix] Priority scheduling and spec tokens preemption (#28558 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-12 20:29:21 +00:00
Yihua Cheng	94a9ebcf31	[KV connector][WIP] KV cache proxy based on LMCache multi-process mode (#27902 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2025-11-12 20:25:43 +00:00
Harry Mellor	a39dd7bb06	[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken in current Transformers (#28559 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:38:13 +00:00
Thomas Parnell	64d57c3be7	[Model] [Config] Correctly identify granite-4.0-micro as non-hybrid model (#28563 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-12 18:17:55 +00:00
PerryZhang01	a1e7fa362a	[EPLB][ROCm]: support EPLB for ROCm backend (#27731 ) Signed-off-by: Perry Zhang <perzhang@amd.com> Co-authored-by: Perry Zhang <perzhang@amd.com>	2025-11-12 18:16:35 +00:00
alberto	bac904565f	Implement ARC KV cache eviction policy for CPU offloader (#27039 ) Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: alberto <aperdomo@redhat.com> Co-authored-by: Or Ozeri <or@ozery.com>	2025-11-12 09:51:39 -08:00
Benjamin Chislett	304419576a	[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-13 01:56:40 +09:00
Harry Mellor	a742134cc5	Remove deprecated fields from `CompilationConfig` (#27593 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 16:10:28 +00:00
Nicolò Lucchesi	728a9eb70e	[Misc] Refactor Attention kv transfer methods into decorator (#27816 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-11-12 16:05:44 +00:00
Canlin Guo	bc5bd45c7d	[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL (#28271 ) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-12 15:56:47 +00:00
Alexander Matveev	f76e85c299	[Performance][Hopper] Avoid M dim padding to 4x for most cases (due to cuda graphs paddings) (#28492 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-12 10:51:43 -05:00
Harry Mellor	54aecd9ed5	Fix pre-commit (and XPU) on `main` (#28556 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 06:13:41 -08:00
wangxiyuan	10138c92a5	[V0 deprecation] Deprecate use_v1 parameter (#28112 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-12 14:03:52 +00:00

11250 Commits