xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git; synced 2026-07-11 10:37:10 +08:00

Author	SHA1	Message	Date
Ricardo Decal	3ed94f9d0a	[Docs] Enhance Anyscale documentation, add quickstart links for vLLM (#21018) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-15 19:46:56 -07:00
Reid	fa839565f2	[Misc] Refactor: Improve argument handling for `conda` command (#20481) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-15 19:43:19 -07:00
Brayden Zhong	75a99b98bf	[Chore] Remove outdated transformers check (#20989) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-07-15 19:42:40 -07:00
Chauncey	b5c3b68359	[Misc] bump xgrammar version to v0.1.21 (#20992) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-15 19:42:16 -07:00
Thomas Parnell	6cbc4d4bea	[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture (#20923) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-15 19:19:10 -07:00
Michael Goin	153c6f1e61	[Frontend] Remove print left in FrontendArgs.add_cli_args (#21004) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-15 19:18:41 -07:00
Chauncey	34cda778a0	[Frontend] OpenAI Responses API supports input image (#20975) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-15 18:59:36 -06:00
Elfie Guo	30800b01c2	[Nvidia] Integrate SM100 cudnn prefill API to MLA prefill (#20411) Signed-off-by: Elfie Guo <elfieg@nvidia.com> Co-authored-by: Elfie Guo <eflieg@nvidia.com>	2025-07-15 17:56:45 -07:00
Chen LI	10be209493	[Bug Fix] get_distributed_init_method should get the ip from get_ip i… (#20889) Signed-off-by: Chen Li <lcpingping@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-07-15 21:23:52 +00:00
Marko Rosenmueller	19c863068b	[Frontend] Support cache_salt in /v1/completions and /v1/responses (#20981) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-07-15 21:01:04 +00:00
Tuan, Hoang-Trong	f29fd8a7f8	[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 (#20838) Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com> Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>	2025-07-15 16:08:26 -04:00
Gregory Shtrasberg	ed10f3cea1	[ROCm] warpSize is being made non constexpr in ROCm 7.0 (#20330) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-07-15 14:01:44 -04:00
Harry Mellor	b637e9dcb8	Add full serve CLI reference back to docs (#20978) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 17:42:30 +00:00
Harry Mellor	1e36c8687e	[Deprecation] Remove `nullable_kvs` (#20969) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 17:21:50 +00:00
Harry Mellor	5bac61362b	Configure Gemini (#20971) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 09:37:05 -07:00
Harry Mellor	313ae8c16a	[Deprecation] Remove everything scheduled for removal in v0.10.0 (#20979) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 15:57:53 +00:00
Cyrus Leung	c847e34b39	[CI/Build] Fix wrong path in Transformers Nightly Models Test (#20994) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-15 08:53:16 -07:00
Patrick von Platen	e7e3e6d263	Voxtral (#20970) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-15 07:35:30 -07:00
Christian Pinto	4ffd963fa0	[v1][core] Support for attention free models (#20811) Signed-off-by: Christian Pinto <christian.pinto@ibm.com>	2025-07-15 14:20:01 +00:00
Harry Mellor	56fe4bedd6	[Deprecation] Remove `TokenizerPoolConfig` (#20968) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 14:00:50 +00:00
Rui Qiao	d91278181d	[doc] Add more details for Ray-based DP (#20948) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-07-15 05:37:12 -07:00
Li Wang	20149d84d9	[MISC] Add init files for python package (#20908) Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-15 12:16:33 +00:00
Thomas Parnell	3534c39a20	[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-15 04:04:35 -07:00
Yifei Teng	c586b55667	[TPU] Optimize kv cache update kernel (#20415) Signed-off-by: Yifei Teng <tengyifei88@gmail.com>	2025-07-15 03:56:43 -07:00
Ricardo Decal	33d560001e	[Docs] Improve documentation for ray cluster launcher helper script (#20602) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-15 03:55:45 -07:00
kourosh hakhamaneshi	f148c44c6a	[frontend] Refactor CLI Args for a better modular integration (#20206) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2025-07-15 02:23:42 -07:00
Ricardo Decal	235bfd5dfe	[Docs] Improve documentation for RLHF example (#20598) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-15 01:54:10 -07:00
Reid	68d28e37b0	[frontend] Add --help=page option for paginated help output (#20961) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-15 00:42:00 -07:00
Ilya Markov	37a7d5d74a	[Misc] Refactor AllReduceFusionPass. Remove parameter (#20918) Signed-off-by: ilmarkov <imarkov@redhat.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-15 06:57:40 +00:00
Woosuk Kwon	d4d309409f	Implement Async Scheduling (#19970) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-14 23:01:46 -07:00
Jennifer He	85bd6599e4	[Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534) Signed-off-by: Jennifer He <islandhe@gmail.com> Signed-off-by: <islandhe@gmail.com> Signed-off-by: Jen H <islandhe@gmail.com>	2025-07-15 13:34:24 +08:00
Boyuan Feng	91b3d190ae	[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir (#20940) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-07-15 13:02:17 +08:00
Isotr0py	fc017915f5	[Doc] Clearer mistral3 and pixtral model support description (#20926) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-14 21:56:53 -07:00
Pavani Majety	9ad0a4588b	[Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer (#20934) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-07-15 03:27:50 +00:00
Ruheena Suhani Shaik	016b8d1b7f	Enabled BnB NF4 inference on Gaudi (#20172) Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai>	2025-07-14 20:26:08 -07:00
Nicolò Lucchesi	80305c1b24	[CI] Fix flaky `test_streaming_response` test (#20913) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-14 20:15:15 -07:00
Reid	37e2ecace2	feat: add image zoom to improve image viewing experience (#20763) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-14 20:14:23 -07:00
Ricardo Decal	054c8657e3	[Docs] Add Kuberay to deployment integrations (#20592) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-14 20:13:55 -07:00
XiongfeiWei	d4170fad39	Use w8a8 quantized matmul Pallas kernel (#19170) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-07-15 03:06:33 +00:00
Michael Goin	946aadb4a0	[CI/Build] Split Entrypoints Test into LLM and API Server (#20945) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-15 02:44:18 +00:00
Michael Goin	bcdfb2a330	[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-15 01:42:17 +00:00
Richard Zou	ba8c300018	[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-07-15 01:26:18 +00:00
Alexander Matveev	8cdc371217	SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP (#20769) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-07-15 01:06:38 +00:00
Yong Hoon Shin	61e20828da	Fall back if flashinfer comm module not found (#20936) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-14 23:11:18 +00:00
Kuntai Du	55e1c66da5	[Docs] remove outdated performance benchmark (#20935) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-07-14 22:14:17 +00:00
Thomas Parnell	86f3ac21ce	Fix overflow indexing in causal_conv1d kernel (#20938) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-14 21:43:07 +00:00
Nicolò Lucchesi	149f2435a5	[Misc] Relax translations tests (#20856) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-14 20:08:36 +00:00
Varun Sundar Rabindranath	c0569dbc82	[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-14 19:47:16 +00:00
Michael Goin	8bb43b9c9e	Add benchmark dataset for mlperf llama tasks (#20338) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-14 19:10:07 +00:00
Tyler Michael Smith	559756214b	Change default model to Qwen3-0.6B (#20335) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-14 16:54:52 +00:00

7741 Commits