xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-16 09:56:09 +08:00

Author	SHA1	Message	Date
Robin	c908a07f57	[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479) Signed-off-by: WangErXiao <863579016@qq.com>	2025-03-08 07:07:32 +00:00
Robin	7b6fd6e486	[Doc]add doc for Qwen models tool calling (#14478) Signed-off-by: WangErXiao <863579016@qq.com>	2025-03-08 06:58:46 +00:00
Harry Mellor	47512b3200	Default to `generation_config` from model (#12622) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 14:46:15 +08:00
Roger Meier	3b9c6c6947	[CI/Build] refactor: set timezone of container to UTC (#12888) Signed-off-by: Roger Meier <r.meier@siemens.com>	2025-03-07 22:42:01 -08:00
Aviv Keshet	4aae667668	[core] add `extra_args` to `SamplingParams` (#13300) Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>	2025-03-08 14:41:18 +08:00
Cody Yu	9f3bc0f58c	[MISC][V1] Register process killing handler only in the main thread (#14380) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-07 22:40:06 -08:00
Mathis Felardos	980385f8c1	[Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache (#14369) Signed-off-by: Mathis Felardos <mathis@mistral.ai>	2025-03-07 22:39:31 -08:00
Tyler Michael Smith	ca7a2d5f28	Revert "[Perf] Reduce MLA CPU overheads in V1 (#14384)" (#14471)	2025-03-07 22:18:53 -08:00
Tyler Michael Smith	333681408f	[Bugfix][V1] Handle MLA in kv_cache_interface (#14462) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-07 22:18:25 -08:00
afeldman-nm	ef64044079	[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)	2025-03-08 01:48:12 +00:00
yarongmu-google	66e16a038e	[Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 (#14459) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-07 23:17:04 +00:00
Mark McLoughlin	e1f0835ae0	[V1][Metrics] Fix traceback with preemptions+LoRA (#14220) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-07 15:36:16 -05:00
Nick Hill	8ed5421aaa	[V1] Eagerly remove finished requests from the batch (#14388) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-07 10:56:00 -08:00
youkaichao	c6359e8ca6	[v1] torch.compile integration explanation (#14437) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-08 01:55:50 +08:00
Jee Jee Li	952a074980	[Misc] Add Phi4-MM example (#14343) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-07 17:28:52 +00:00
Jinzhen Lin	d0feea31c7	[Kernel] optimize performance of gptq marlin kernel when n is small (#14138) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-03-07 11:53:38 -05:00
Jeremy Arnold	58abe35455	[Benchmarks] Make detokenization optional in benchmark scripts (#11697) Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>	2025-03-07 08:09:00 -08:00
York-RDWang	f7ebad2307	[Doc] Update prefix_caching.md to match the example image (#14420)	2025-03-07 15:29:00 +00:00
Aaron Pham	80e9afb5bc	[V1][Core] Support for Structured Outputs (#12388) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-07 07:19:11 -08:00
iefgnoix	1e3598edeb	Use the optimized block sizes after tuning the kernel. (#14329)	2025-03-07 13:25:13 +00:00
Harry Mellor	f7a6bd0fa1	Fix missing `kv_caches` and `attn_metadata` in `OpenVINOCausalLM` (#14271) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-07 12:30:42 +00:00
Aleksandr Malyshev	0ca3b8e01c	[BUGFIX] Skip tokenization support for throughput benchmark (#12712) Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-03-07 02:51:47 -08:00
மனோஜ்குமார் பழனிச்சாமி	cc10281498	[Misc] Set default value of seed to None (#14274) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-03-07 10:40:01 +00:00
Cyrus Leung	05fb6718f0	[Bugfix] Clean up multi-modal processors (#14417) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-07 10:33:38 +00:00
Jee Jee Li	12c29a881f	[Bugfix] Further clean up LoRA test (#14422) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-07 10:30:55 +00:00
Peng Li	70da0c0748	correct wrong markdown syntax (#14414) Signed-off-by: vincent-pli <justdoit.pli@gmail.com>	2025-03-07 08:01:18 +00:00
Cyrus Leung	c1588a2c94	[GH] Auto-apply multi-modality label to relevant PRs (#14402) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-07 15:26:32 +08:00
Ilya Lavrenov	8ca7a71df7	OpenVINO: added CPU-like conditions (#14338) Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>	2025-03-06 22:24:49 -08:00
Isotr0py	63137cd922	[Build] Add nightly wheel fallback when latest commit wheel unavailable (#14358) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-06 22:10:57 -08:00
Jee Jee Li	ddd1ef66ec	[Bugfix] Fix JambaForCausalLM LoRA (#14370) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-06 22:05:47 -08:00
Lucas Wilkinson	e5e03c2c1b	[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs (#14396)	2025-03-06 21:56:06 -08:00
Luka Govedič	e1744502c2	[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390) Signed-off-by: luka <luka@neuralmagic.com>	2025-03-07 05:20:16 +00:00
Lucas Wilkinson	dae6896977	[Perf] Reduce MLA CPU overheads in V1 (#14384) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-06 19:59:14 -08:00
Brayden Zhong	c34eeec58d	[Bugfix] Correctly call `cudaProfilerStop` in benchmarks script (#14183) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-07 00:42:49 +00:00
Daniel Li	ad60bbb2b2	[Doc] Fix a typo (#14385)	2025-03-06 16:31:52 -08:00
Chengji Yao	0578e5a462	[Hardware][TPU]Enable ragged paged attention kernel and resolve recompilation issue (#14310) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-06 23:31:05 +00:00
Michael Goin	04222984f8	[Docs] Add nsight guide to profiling docs (#14298) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:19:58 -08:00
Michael Goin	6832707e90	[V1][Bugfix] Standardize quantized kv cache rejection for attention backends (#14221) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:18:29 -08:00
Michael Goin	6b2ef5cd17	[Bug] Fix Attention when ignored in by quant_method (#14313) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:18:06 -08:00
Tyler Michael Smith	958adce478	[Bugfix] Fix use_direct_call condition in FusedMoE layer for (#14382) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 14:17:21 -08:00
Tyler Michael Smith	99b0915d3b	[Kernel] Add needs_fixed_stride_order tag to most GEMMs (#14306) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 14:17:09 -08:00
Thomas Parnell	8ca2b21c98	[CI] Disable spawn when running V1 Test (#14345) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-03-06 21:52:46 +00:00
Michael Goin	d9292786e1	[CI/Build] Use uv python for docker rather than ppa:deadsnakes/ppa (#13569) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 16:08:36 -05:00
Tyler Michael Smith	cc2f9b32c8	[Distributed] Add enable_expert_parallel arg (#14305) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 18:54:45 +00:00
Himanshu Jaju	cd579352bf	[V1] Do not detokenize if sampling param detokenize is False (#14224) Signed-off-by: Himanshu Jaju <hj@mistral.ai> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-06 10:40:24 -08:00
Ying Zhong	9f1710f1ac	Fix mla prefill context performance (#13897) Signed-off-by: ZhongYingMatrix <zhongyingmatrix@gmail.com>	2025-03-06 09:35:49 -08:00
Thomas Parnell	e642ec962c	Add authors to license header. (#14371) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-03-06 08:43:09 -08:00
Dilip Gowda Bhagavan	ada19210a3	Adding cpu inference with VXE ISA for s390x architecture (#12613) Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com> Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com> Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>	2025-03-06 08:40:53 -08:00
Harry Mellor	bf0560bda9	Reinstate `best_of` for V0 (#14356) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-06 08:34:22 -08:00
youkaichao	151b08e0fe	[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-07 00:32:46 +08:00

5022 Commits