xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-08 19:29:09 +08:00

Author	SHA1	Message	Date
Kunshang Ji	fce10dbed5	[XPU] Add xpu torch.compile support (#22609 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-08-27 05:33:27 +00:00
Didier Durand	7c04779afa	[Doc]: fix various spelling issues in multiple files (#23636 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-26 14:05:29 +00:00
nvjullin	f66673a39d	[Kernel] Added flashinfer fp8 per-tensor gemms (#22895 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-26 06:54:04 -07:00
Michael Goin	906e461ed6	[CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-25 18:29:00 -07:00
Pate Motter	c34c82b7fe	[TPU][Bugfix] Fixes prompt_token_ids error in tpu tests. (#23574 ) Signed-off-by: Pate Motter <patemotter@google.com>	2025-08-25 14:29:16 -07:00
Didier Durand	47455c424f	[Doc: ]fix various typos in multiple files (#23487 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-25 00:04:04 +00:00
Zhewen Li	0483fabc74	[CI/Build] add EP dependencies to docker (#21976 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-08-22 13:34:40 -07:00
Naman Lalit	ebe14621e3	[Bug fix] Dynamically setting the backend variable for genai_perf_tests in the run-nightly-benchmark script (#23375 ) Signed-off-by: Naman Lalit <nl2688@nyu.edu>	2025-08-22 15:12:28 +00:00
Cyrus Leung	8896eb72eb	[Deprecation] Remove `prompt_token_ids` arg fallback in `LLM.generate` and `LLM.embed` (#18800 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-22 10:56:57 +08:00
22quinn	480bdf5a7b	[Core] Support custom executor qualname (#23314 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-22 09:40:54 +08:00
Lain	f8ce022948	add tg-mxfp4-moe-test (#22540 ) Signed-off-by: siyuanf <siyuanf@nvidia.com> Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-21 17:05:47 +00:00
youkaichao	e0b056e443	[ci/build] Fix abi tag for aarch64 (#23329 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-08-21 23:32:55 +08:00
Michael Goin	f64ee61d9e	[CI] Block the cu126 wheel build while broken (#23285 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-21 04:21:05 +00:00
QiliangCui	8993073dc1	[CI] Delete images older than 24h. (#23291 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-08-20 21:15:20 -07:00
Cyrus Leung	2461d9e562	[CI/Build] Split out mm processor tests (#23260 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-20 20:05:20 -07:00
Li, Jiang	7be5d113d8	[CPU] Refactor CPU W8A8 scaled_mm (#23071 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-21 09:34:24 +08:00
youkaichao	1b125004be	[misc] fix multiple arch wheels for the nightly index (#23110 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-08-20 14:15:34 -07:00
Michael Goin	0cdbf5e61c	[Kernel/Quant] Remove the original marlin format and qqq (#23204 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-20 15:13:36 -04:00
Yong Hoon Shin	dfd2382039	[torch.compile] Support conditional torch.compile per module (#22269 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-20 16:52:59 +00:00
Michael Goin	50df09fe13	Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-20 08:05:54 -04:00
Louie Tsai	941f56858a	Fix a performance comparison issue in Benchmark Suite (#23047 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-08-20 03:14:32 +00:00
Michael Goin	0f4f0191d8	[CI/Build] Replace lm-eval gsm8k tests with faster implementation (#23002 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-19 15:07:30 -07:00
amirkl94	a38b8af4c3	[NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend (#22357 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2025-08-19 18:01:53 -04:00
22quinn	f7cf5b512e	[Frontend] Add `/collective_rpc` API endpoint (#23075 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-19 17:29:32 +00:00
elvischenv	03752dba8f	[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-08-19 08:22:15 -04:00
Robert Shaw	6603288736	[CI][V0 Deprecation] Removed V0 Only Chunked Prefill and Prefix Caching Tests (#22871 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-18 17:39:01 -07:00
Kunshang Ji	5c79b0d648	[XPU][CI]add xpu env vars in CI scripts (#22946 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-08-18 09:47:03 +00:00
afeldman-nm	bf7f470b22	[V1] Logits processors extensibility (#19912 ) Signed-off-by: Andrew Feldman <afeldman@redhat.com> Signed-off-by: Andrew Feldman <afeld2012@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-16 12:59:17 -07:00
Michael Goin	4fc722eca4	[Kernel/Quant] Remove AQLM (#22943 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-16 19:38:21 +00:00
Eli Uriegas	76144adf76	ci: Add CUDA + arm64 release builds (#21201 ) Signed-off-by: Eli Uriegas <eliuriegas@meta.com>	2025-08-15 23:16:23 +00:00
Michael Goin	a344a1a7da	Use regex in convert-results-json-to-markdown.py (#22989 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-08-15 20:54:20 +00:00
bnellnm	8ad7285ea2	[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-15 14:46:00 -04:00
Harry Mellor	e8b40c7fa2	[CI] Remove duplicated docs build from buildkite (#22924 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-15 05:58:06 -07:00
nvjullin	279a5f31b3	[Kernel] Add nvfp4 gemm flashinfer backends (#22346 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-14 16:03:55 -04:00
Louie Tsai	00e3f9da46	vLLM Benchmark suite improvement (#22119 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-08-14 07:12:17 +00:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Harry Mellor	839ab00349	Re-enable Xet on TPU tests now that `hf_xet` has been updated (#22666 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 19:54:40 -07:00
Cyrus Leung	ebf7605b0d	[Misc] Move tensor schema tests (#22612 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-11 00:15:27 -07:00
22quinn	b799f4b9ea	[CI/Build] Fix tensorizer test for load_format change (#22583 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-10 19:30:00 -07:00
Kyuyeun Kim	9a0c5ded5a	[TPU] Add support for online w8a8 quantization (#22425 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-08-08 23:12:54 -07:00
Thomas Parnell	8a0ffd6285	Remove mamba_ssm from vLLM requirements; install inside test container using `--no-build-isolation` (#22541 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-08 23:05:32 -07:00
Andrew Chan	35171b1172	[Doc] update docs for nightly benchmarks (#12022 ) Signed-off-by: Andrew Chan <andrewkchan.akc@gmail.com>	2025-08-07 00:29:45 -07:00
Siyuan Liu	4b29d2784b	[CI][TPU] Fix docker clean up (#22271 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-08-05 23:54:56 +00:00
elvischenv	83156c7b89	[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-08-05 02:45:34 -07:00
lkchen	f4f4e7ef27	[V0 deprecation][P/D] Deprecate v0 `KVConnectorBase` code (1/2) (#21785 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-08-04 19:11:33 -07:00
Isotr0py	3dddbf1f25	[Misc] Add tensor schema test coverage for multimodal models (#21754 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-03 00:52:14 -07:00
Roger Wang	067c34a155	docs: remove deprecated disable-log-requests flag (#22113 ) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-08-02 00:19:48 -07:00
Michael Goin	88faa466d7	[CI] Initial tests for SM100 Blackwell runner (#21877 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-01 16:18:38 -07:00
Harry Mellor	2d7b09b998	Deprecate `--disable-log-requests` and replace with `--enable-log-requests` (#21739 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 17:16:37 +01:00
Charent	ad57f23f6a	[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873 ) Signed-off-by: charent <19562666+charent@users.noreply.github.com>	2025-07-31 19:48:13 -07:00

685 Commits