xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-30 20:34:28 +08:00

Author	SHA1	Message	Date
Akshat Tripathi	643622ba46	[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Signed-off-by: xihajun <junfan@krai.ai> Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Signed-off-by: Jorge de Freitas <jorge@krai.ai> Co-authored-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: xihajun <junfan@krai.ai> Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Co-authored-by: Jorge de Freitas <jorge@krai.ai>	2025-05-28 19:59:09 +00:00
Mark McLoughlin	0e98964e94	[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-28 18:54:12 +00:00
Alex Brooks	321331b8ae	[Core] Add Lora Support to Beam Search (#18346 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-05-28 08:58:24 -07:00
Reid	435fa95444	[Frontend] add run batch to CLI (#18804 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-28 07:08:57 -07:00
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
wang.yuqi	de65fc8e1e	[CI] improve embed testing (#18747 )	2025-05-28 00:16:35 -07:00
Rabi Mishra	b78f844a67	[Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-28 05:42:54 +00:00
wang.yuqi	3e9ce609bd	[Bugfix] Fix nomic max_model_len (#18755 )	2025-05-27 20:29:53 -07:00
Satyajith Chilappagari	e0cbad4e30	[Neuron] Support quantization on neuron (#18283 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-27 22:10:33 +00:00
Michael Goin	5873877241	[Bugfix] Mistral tool calling when content is list (#18729 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-27 09:05:37 -07:00
Mark McLoughlin	06a0338015	[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-27 09:37:06 +00:00
Isotr0py	1f1b1bc03b	[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-27 04:40:28 +00:00
Cyrus Leung	82e2339b06	[Doc] Move examples and further reorganize user guide (#18666 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:38:04 -07:00
Ning Xie	5a2c76cbe1	[CI] fix dump_input for str type (#18697 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-26 18:23:35 +08:00
Cyrus Leung	38b13dfe78	[CI/Build] Replace `math.isclose` with `pytest.approx` (#18703 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 02:05:17 -07:00
Ning Xie	4ea62c0ea0	[CI] add missing argument (#18694 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-26 00:22:04 -07:00
Cyrus Leung	fba0642704	[CI/Build][Doc] Update `gte-Qwen2-1.5B-instruct` usage (#18683 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-25 20:27:50 -07:00
Cyrus Leung	57fd13a707	[Bugfix] Fix profiling dummy data for Pixtral (#18677 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-25 14:05:30 +00:00
Michael Goin	63934543a0	Speed up the `kernels/quantization/` tests (#18669 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-25 05:02:59 +00:00
Isotr0py	75f81750f3	[VLM] Initialize video input support for InternVL models (#18499 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-25 04:51:25 +00:00
Mengqing Cao	6ab681bcbe	[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655 ) Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-25 04:51:21 +00:00
qizixi	c1e4a4052d	[V1][Spec Decode] Support multi-layer eagle draft model (#18030 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 09:45:34 +00:00
Yuanhao WU	a859320575	[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (#18647 )	2025-05-24 09:15:36 +00:00
qizixi	d55e446d13	[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-24 06:51:22 +00:00
Robert Shaw	2b10ba7491	[Bugfix][Nixl] Fix Preemption Bug (#18631 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-23 23:30:16 +00:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454 ) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Michael Goin	0ddf88e16e	[CI] Enable test_initialization to run on V1 (#16736 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-23 15:09:44 -07:00
Chen Zhang	6550114c9c	[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945 )" (#18593 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-23 09:39:47 -07:00
Ning Xie	cd821ea5d2	[CI] fix kv_cache_type argument (#18594 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-23 04:49:18 -07:00
Chauncey	b046cf792d	[Feature][V1]: supports cached_tokens in response usage (#18149 ) Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-05-23 01:41:03 -07:00
cascade	71ea614d4a	[Feature]Add async tensor parallelism using compilation pass (#17882 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-23 01:03:34 -07:00
aws-elaineyz	ed5d408255	[Neuron] Remove bypass on EAGLEConfig and add a test (#18514 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-22 21:26:32 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
rasmith	46791e1b4b	[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-22 18:45:35 -07:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Kai Wu	c91fe7b1b9	[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917 ) Signed-off-by: Kai Wu <kaiwu@meta.com>	2025-05-22 16:44:08 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00
David Xia	1f3a1200e4	[Bugfix] make `test_openai_schema.py` pass (#18224 ) Signed-off-by: David Xia <david@davidxia.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 18:34:06 +00:00
Harry Mellor	ca86a7cf6e	[CI/Build] Update bamba test model location (#18544 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 06:01:07 -07:00
lkchen	a35a494745	[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (#18513 ) Signed-off-by: Linkun <github@lkchen.net>	2025-05-22 05:24:43 -07:00
aws-elaineyz	fa72f9a812	Order sequence ids + config update to support specifying custom quantization layers (#18279 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Tailin Pan <tailinpa@amazon.com> Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Maxwell Goldberg <mgld@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com>	2025-05-22 02:20:36 -07:00
Jee Jee Li	db5a29ba19	[Bugfix] Fix LoRA test (#18518 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-21 21:48:53 -07:00
Russell Bryant	6e0fd34d3c	[CI] Fix race condition with StatelessProcessGroup.barrier (#18506 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-21 20:19:13 -07:00
Mark McLoughlin	bb0a311213	Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945 ) (#18459 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-21 10:25:23 -07:00
Hosang	dd5fa7e04f	[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-05-21 08:35:00 -07:00
bnellnm	c6c10ca920	[Bugfix] Reduce moe_sum test size to avoid OOM (#18484 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-05-21 06:46:39 -07:00
Dhia Eddine Rhaiem	eca18691d2	[MODEL] FalconH1 (#18406 ) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>	2025-05-21 04:59:06 -07:00

2025 Commits