xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-01 09:07:15 +08:00

Author	SHA1	Message	Date
Chenyaaang	f8977c233f	Fix an error in dummy weight loading for quantization models (#18855 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-29 03:07:20 -07:00
Luka Govedič	f274581f44	[BugFix] Update pydantic to fix error on python 3.10 (#18852 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-05-29 03:05:46 -07:00
Lukas Geiger	0b1447f890	[Bugfix] Ensure tensors are contiguous during serialisation (#18860 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-29 03:05:20 -07:00
Nicolò Lucchesi	24d0ef8970	[Misc] Replace TODO in serving transcription (#18895 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-29 02:58:14 -07:00
Jee Jee Li	7fcfd954ff	[Bugfix] Fix misleading information in the documentation (#18845 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-29 02:54:14 -07:00
Reid	e740d07f07	[doc] add CLI doc (#18871 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-29 09:51:36 +00:00
Michael Yao	a652e71dd0	[Doc] Remove redundant spaces from compatibility_matrix.md (#18891 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-05-29 02:51:20 -07:00
Jee Jee Li	34d6c447c4	[LoRA] Add LoRA support for InternVL (#18842 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-29 08:46:24 +00:00
Satyajith Chilappagari	972eddf7c9	[Neuron] Add multi-LoRA support for Neuron. (#18284 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-29 16:41:22 +08:00
Brent Salisbury	fd7bb88d72	Fixes a dead link in nightly benchmark readme (#18856 ) Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-29 04:41:39 +00:00
Yikun Jiang	3c49dbdd03	Skip device and quant Pydantic validation to make plugin device work (#18843 ) Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-28 20:12:30 -07:00
aws-elaineyz	1661a9c28f	[Doc][Neuron] Update documentation for Neuron (#18868 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-28 19:44:01 -07:00
Chengji Yao	8e882ffdc0	[Bugfix][TPU] fix moe custom kernel import (#18853 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-05-28 19:34:19 -07:00
Richard Zou	26b4fa45be	Add ability to use CUDAGraphs with use_inductor=False (#17345 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-29 10:16:52 +08:00
Maximilien de Bayser	515b413ebf	Prevent the cross-encoder logic from being applied to classification tasks (#18838 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-28 19:16:17 -07:00
Hongxia Yang	269d901734	[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-29 07:21:46 +08:00
Varun Sundar Rabindranath	7951d78738	[Core] Enable CUDA graphs for DP + All2All kernels (#18724 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-05-28 22:55:30 +00:00
Harry Mellor	6dbe5b5c93	Remove checks for `None` for fields which should never be `None` (#17985 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 21:32:19 +00:00
Akshat Tripathi	643622ba46	[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Signed-off-by: xihajun <junfan@krai.ai> Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Signed-off-by: Jorge de Freitas <jorge@krai.ai> Co-authored-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: xihajun <junfan@krai.ai> Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk> Co-authored-by: Jorge de Freitas <jorge@krai.ai>	2025-05-28 19:59:09 +00:00
Aaron Pham	a09c7ca9f2	[Chore][Spec Decode] Update check NoneType instead of assigning variables (#18836 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-28 18:57:19 +00:00
Mark McLoughlin	0e98964e94	[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-28 18:54:12 +00:00
rongfu.leng	c68b5c63eb	[Misc] fix olmoe model layer can't load in tp gt 1 (#18828 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-05-28 17:36:21 +00:00
Aaron Pham	fced756923	[Chore] update ty configuration (#18839 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-28 08:59:11 -07:00
Alex Brooks	321331b8ae	[Core] Add Lora Support to Beam Search (#18346 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-05-28 08:58:24 -07:00
daniel-salib	6e4cea1cc5	decrement server_load on listen for disconnect (#18784 ) Signed-off-by: Daniel Salib <danielsalib@meta.com>	2025-05-28 22:15:12 +08:00
Reid	435fa95444	[Frontend] add run batch to CLI (#18804 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-28 07:08:57 -07:00
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
Mengqing Cao	d781930f90	[Platform][Dist] Make torch distributed process group extendable (#18763 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-28 10:52:34 +00:00
Lucas Wilkinson	ce75efeecb	[BugFix] FA2 MLA Accuracy Issue (#18807 ) Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>	2025-05-28 08:59:39 +00:00
Richard Zou	aa42561e40	Fix PiecewiseCompileInterpreter (#17338 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-28 08:40:53 +00:00
wang.yuqi	de65fc8e1e	[CI] improve embed testing (#18747 )	2025-05-28 00:16:35 -07:00
Cyrus Leung	0c492b7824	[Deprecation] Remove fallbacks for Embeddings API (#18795 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-28 15:09:04 +08:00
Cyrus Leung	0f0926b43f	[Deprecation] Remove unused sync methods in `async_timeout` (#18792 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-28 15:08:48 +08:00
Cyrus Leung	7f2c1a87e9	[Deprecation] Require overriding `get_dummy_text` and `get_dummy_mm_data` (#18796 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-28 15:08:35 +08:00
Rabi Mishra	b78f844a67	[Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-28 05:42:54 +00:00
RonaldBXu	5e13c07d00	[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (2) (#18781 ) Signed-off-by: Ronald Xu <ronaldxu@amazon.com>	2025-05-28 05:09:14 +00:00
Divakar Verma	774c5fde30	[V1] fix torch profiling for V1 offline scenarios (#18445 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-05-28 04:16:30 +00:00
Guillaume Calmettes	9a21e331ff	[Bugfix]: correctly propagate errors message caught at the chat_templating step to the client (#18769 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-05-28 03:35:43 +00:00
wang.yuqi	3e9ce609bd	[Bugfix] Fix nomic max_model_len (#18755 )	2025-05-27 20:29:53 -07:00
fxmarty-amd	794ae1f551	[rocm] Fix wrong attention log (#18764 ) Signed-off-by: Felix Marty <felmarty@amd.com>	2025-05-27 19:45:41 -07:00
Lukas Geiger	d73a9457a5	[Core] Improve Tensor serialisation (#18774 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-28 09:46:21 +08:00
Luka Govedič	a3896c7f02	[Build] Fixes for CMake install (#18570 )	2025-05-27 20:49:24 -04:00
cascade	51e98e4ffd	[Bugfix] Disable prefix caching by default for benchmark (#18771 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-28 08:18:09 +08:00
Michael Goin	e56f44d9ec	Support datasets in `vllm bench serve` and sync with benchmark_[serving,datasets].py (#18566 )	2025-05-27 19:59:48 -04:00
Satyajith Chilappagari	e0cbad4e30	[Neuron] Support quantization on neuron (#18283 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-27 22:10:33 +00:00
Carol Zheng	b48d5cca16	[CI/Build] [TPU] Fix TPU CI exit code (#18282 ) Signed-off-by: Carol Zheng <cazheng@google.com>	2025-05-27 14:54:59 -07:00
Michael Goin	5873877241	[Bugfix] Mistral tool calling when content is list (#18729 ) Signed-off-by: mgoin <mgoin64@gmail.com> (tag: v0.9.0)	2025-05-27 09:05:37 -07:00
Cyrus Leung	696259ca01	[Core] Automatically cast multi-modal input dtype (#18756 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-27 23:45:48 +08:00
chunxiaozheng	6b6d496114	optimize get_kv_cache_torch_dtype (#18531 ) Signed-off-by: idellzheng <idellzheng@tencent.com>	2025-05-27 13:08:44 +00:00
cascade	aaa4ac1c95	Disable prefix cache by default for benchmark (#18639 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-27 20:06:34 +08:00

6821 Commits