xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-08 17:07:13 +08:00

Author	SHA1	Message	Date
Ning Xie	cd821ea5d2	[CI] fix kv_cache_type argument (#18594 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-23 04:49:18 -07:00
Chauncey	b046cf792d	[Feature][V1]: supports cached_tokens in response usage (#18149 ) Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-05-23 01:41:03 -07:00
cascade	71ea614d4a	[Feature]Add async tensor parallelism using compilation pass (#17882 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-23 01:03:34 -07:00
aws-elaineyz	ed5d408255	[Neuron] Remove bypass on EAGLEConfig and add a test (#18514 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-22 21:26:32 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
rasmith	46791e1b4b	[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-22 18:45:35 -07:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Kai Wu	c91fe7b1b9	[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917 ) Signed-off-by: Kai Wu <kaiwu@meta.com>	2025-05-22 16:44:08 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00
David Xia	1f3a1200e4	[Bugfix] make `test_openai_schema.py` pass (#18224 ) Signed-off-by: David Xia <david@davidxia.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 18:34:06 +00:00
Harry Mellor	ca86a7cf6e	[CI/Build] Update bamba test model location (#18544 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 06:01:07 -07:00
lkchen	a35a494745	[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (#18513 ) Signed-off-by: Linkun <github@lkchen.net>	2025-05-22 05:24:43 -07:00
aws-elaineyz	fa72f9a812	Order sequence ids + config update to support specifying custom quantization layers (#18279 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Tailin Pan <tailinpa@amazon.com> Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Maxwell Goldberg <mgld@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com>	2025-05-22 02:20:36 -07:00
Jee Jee Li	db5a29ba19	[Bugfix] Fix LoRA test (#18518 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-21 21:48:53 -07:00
Russell Bryant	6e0fd34d3c	[CI] Fix race condition with StatelessProcessGroup.barrier (#18506 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-21 20:19:13 -07:00
Mark McLoughlin	bb0a311213	Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945 )" (#18459 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-21 10:25:23 -07:00
Hosang	dd5fa7e04f	[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-05-21 08:35:00 -07:00
bnellnm	c6c10ca920	[Bugfix] Reduce moe_sum test size to avoid OOM (#18484 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-05-21 06:46:39 -07:00
Dhia Eddine Rhaiem	eca18691d2	[MODEL] FalconH1 (#18406 ) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>	2025-05-21 04:59:06 -07:00
Rabi Mishra	61acfc45bc	[Bugfix][Failing Test] Fix test_events.py (#18460 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-21 04:57:28 -07:00
bnellnm	92247c522e	[Bug] Fix moe_sum signature (#18440 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-05-20 22:37:08 -07:00
Michael Goin	f4a8a37465	[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-20 09:08:37 -07:00
wang.yuqi	86847700d7	[CI] Add mteb testing to test the accuracy of the embedding model (#17175 )	2025-05-20 06:51:12 -07:00
Jee Jee Li	6b35cb10a0	[Misc] Add LoRA code owner (#18387 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-20 03:27:30 -07:00
Nan Qin	9609327fa4	[Core] [Bugfix]: tensor parallel with prompt embeds (#18171 ) Signed-off-by: Nan2018 <nan@protopia.ai> Co-authored-by: Andrew Sansom <andrew@protopia.ai>	2025-05-19 20:21:27 -07:00
Isotr0py	f07a673eb2	[Misc] Allow `AutoWeightsLoader` to skip loading weights with specific substr in name (#18358 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-19 20:20:12 -07:00
Satyajith Chilappagari	dc1440cf9f	Neuron up mistral (#18222 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-19 09:54:47 -07:00
Wenhua Cheng	e2ee1e8e9e	[Feature]Add support for models quantized with AutoRound (#17850 ) Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>	2025-05-19 09:38:53 -07:00
Jee Jee Li	6781af5608	[Quantization] Pool model support bitsandbytes (#18087 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-19 09:03:43 -07:00
Nan Qin	221cfc2fea	Feature/vllm/input embedding completion api (#17590 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: Nan2018 <nan@protopia.ai> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Andrew Sansom <andrew@protopia.ai> Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-18 20:18:05 -07:00
wwl2755	9da1095daf	[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa (#18175 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-05-18 19:49:46 -07:00
cascade	9ab2c02ff8	Support sequence parallelism combined with pipeline parallelism (#18243 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-17 22:47:25 +00:00
Jinzhen Lin	e73b7dfd69	[Bugfix] fix `an illegal memory access was encountered` of marlin kernel + act_order (#18245 )	2025-05-16 16:02:44 -07:00
Bowen Wang	7fdfa01530	[Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777 ) Signed-off-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-05-16 15:14:03 -07:00
Isotr0py	390ec88905	[Misc] Consolidate Audio tests into multimodal common generation tests (#18214 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-16 09:18:08 +00:00
Seiji Eicher	541817670c	[Misc] Add Ray Prometheus logger to V1 (#17925 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-05-16 01:02:42 -07:00
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 (#17827 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
Will Eaton	6b31c84aff	Throw better error for when running into k8s service discovery issue (#18209 ) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-05-15 21:07:28 -07:00
Harry Mellor	b18201fe06	Allow users to pass arbitrary JSON keys from CLI (#18208 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-15 21:05:34 -07:00
Lucas Wilkinson	4e1c6a0264	[Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-16 01:32:45 +00:00
Lucia Fang	8795eb9975	[Bugfix] Fix test_eagle test (#18223 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 15:59:42 -07:00
Alexei-V-Ivanov-AMD	566ec04c3d	Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline (#18106 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-15 08:49:23 -07:00
hustxiayang	451da4bcbd	add tools into TokenizeChatRequest (#18187 ) Signed-off-by: yangxia <yangxiast@gmail.com>	2025-05-15 04:01:49 -07:00
omahs	a9944aabfa	fix: typos (#18151 ) Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>	2025-05-15 02:16:15 -07:00
Russell Bryant	a8f5aec20a	[V1] Update zmq socket creation in nixl connector (#18148 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 23:17:57 -07:00
David Xia	de71fec81b	[CI] don't skip fixed `test_kv_cache_events()` (#18183 ) Signed-off-by: David Xia <david@davidxia.com>	2025-05-14 23:17:16 -07:00
Ning Xie	420caf7557	[UT] Add ut for none hash (#17892 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-15 13:28:11 +08:00

2797 Commits