xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-23 18:27:15 +08:00

Author	SHA1	Message	Date
h-avsha	3443aaf8dd	Move to a faster base64 implementation (#19984) Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>	2025-06-24 20:33:51 -07:00
Isotr0py	2273ec322c	Revert "Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor" (#20030)	2025-06-25 11:23:29 +08:00
Wentao Ye	a6c4b87fbc	Revert "[Feature] Integrate new deepgemm (#19820)" (#20049) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 19:45:22 -07:00
Brayden Zhong	1afa9948f5	[Llama4] Update `attn_temperature_tuning` (#19997) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-24 22:42:53 -04:00
Eli Uriegas	0d06b533a0	cmake: Update vllm_flash_attn for vllm_kernels (#20032) Signed-off-by: Eli Uriegas <eliuriegas@meta.com>	2025-06-24 22:44:10 +00:00
Boyuan Feng	c01d1c5aba	use .dev for version comparison with pytorch nightly release (#20031) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-06-24 21:52:16 +00:00
Brayden Zhong	ead369845d	[Easy] Remove submodule added in #19463 (#20039) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-24 13:23:15 -07:00
Wentao Ye	c6e3bba8e6	[Feature] Integrate new deepgemm (#19820) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 12:51:56 -07:00
lkchen	91f7d9d0b6	[P/D] Asynchronously do _nixl_handshake (#19836) Signed-off-by: Linkun Chen <github@lkchen.net> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-24 12:46:10 -07:00
Nick Hill	8619e7158c	[BugFix] Fix multi-node offline data parallel (#19937) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-24 12:45:20 -07:00
d.transposed	c635c5f744	[Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. (#19423) Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-06-24 18:41:49 +00:00
Lucas Wilkinson	a045b7e89a	[Perf] Improve/Fix-regression for FA3 in High QPS regimes (#19463) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-06-24 13:09:01 -04:00
amit	981eeca41a	[Fix][V1] Remove --scheduling-policy oracle (#20010) Signed-off-by: amit <amit.man@gmail.com>	2025-06-24 09:52:15 -07:00
Reid	26d34eb67e	refactor example - qwen3_reranker (#19847) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-24 14:03:20 +00:00
Li, Jiang	53da4cd397	[Bugfix][CPU] Fix InputBatch for pooling models in the CPU v1 (#20014) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-06-24 13:20:04 +00:00
Vadim Gimpelson	9a3b88328f	[PERF] Speedup of MRoPE prepare inputs (#19939) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>	2025-06-23 23:01:26 -07:00
Reid	3014c920da	add some examples for other benchmark scripts (#19893) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-24 05:57:46 +00:00
Kay Yan	0eed516951	[doc] Fix broken link in the installation for CPU (#19980) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-06-24 12:04:11 +08:00
Chenyaaang	ee5ad8d2c5	[Misc][Tools][Benchmark] Add profile to autotune script (#19711) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-06-24 00:59:41 +00:00
QiliangCui	a738dbb2a1	Update test case parameter to have the throughput above 8.0 (#19994) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-06-24 00:18:10 +00:00
Chenyaaang	33d5e29be9	[TPU] Fix tpu model runner test (#19995) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-06-23 16:04:28 -07:00
22quinn	4671ac6e2a	[Bugfix][Benchmark] Fix Marlin benchmark (#19929)	2025-06-24 07:25:12 +09:00
Jun-Howie	dd2ccf8dde	Feat Dynamic Quantization for MoE Layers in GPTQ Marlin Backend (#19395)	2025-06-24 07:23:28 +09:00
22quinn	a3bc76e4b5	[CI/Build] Push latest tag for cpu and neuron docker image (#19897) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-23 14:15:37 -07:00
cascade	e6327c9b3e	[Feature] Support sequence parallelism for static fp8 quantization (#19181) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-06-23 16:09:02 -04:00
lkchen	d0132f025d	[Misc] Add type alias `ReqId` and `EngineId` for better readability (#19880) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-06-23 12:57:57 -07:00
Isotr0py	61f4fc5dc6	[Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-23 18:38:06 +00:00
Tyler Michael Smith	68aaeb3749	[EP+DP] Optimize the little operations in the DeepGEMM + DeepEP low latency case (#19885) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-23 11:07:47 -07:00
Lukas Geiger	c3649e4fee	[Docs] Fix syntax highlighting of shell commands (#19870) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-06-23 17:59:09 +00:00
Reid	53243e5c42	[doc] improve readability for long commands (#19920) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-23 14:27:07 +00:00
Jee Jee Li	a6e6604d32	[Bugfix] Fix CI bitsandbytes failure (#19969) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-23 21:30:55 +08:00
Reid	b82e0f82cb	[doc] use MkDocs collapsible blocks - supplement (#19973) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-23 10:54:16 +00:00
Isotr0py	5111642a6f	[Doc] Update V1 status for decoder-only embedding models (#19952) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-23 09:31:06 +00:00
lkchen	1bcd15edc7	[BugFix][P/D] Fix for cases where _recving_transfers can be cleaned up when all transfer done (#19874) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-06-22 22:41:53 -07:00
Nicolò Lucchesi	2ebff5b77c	[P/D][NixlConnector] Support `tp_size > num_kv_heads` deployments (#19691) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-22 22:41:50 -07:00
Reid	f17aec0d63	[doc] Fold long code blocks to improve readability (#19926) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-23 05:24:23 +00:00
Vensen	493c275352	Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor (#19643) Signed-off-by: Vensenmu <vensenmu@gmail.com>	2025-06-23 03:40:28 +00:00
jinqinn	f39ab2d4bd	[Misc] Configurable timeout for execute_model RPC calls via env var (#19544) Signed-off-by: jinqinn <goodqinjin@163.com>	2025-06-22 20:36:26 -07:00
amit	4a0f7888a3	[Core] feat: Implement Priority Scheduling in V1 Engine (#19057) Signed-off-by: amit <amit.man@gmail.com> Co-authored-by: Roger Wang <Rogerw0108@gmail.com>	2025-06-22 20:18:08 -07:00
Aaron Pham	c4cf260677	[Perf][CLI] Improve overall startup time (#19941)	2025-06-22 23:11:22 +00:00
Ye (Charlotte) Qi	33d51f599e	[BugFix] Add an env to disable moe chunking to work around compile incompatibility (#19642) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-06-22 15:17:49 -07:00
Aaron Pham	e91386cde1	[Chore] dedup logs (#19955)	2025-06-22 19:43:07 +00:00
Ye (Charlotte) Qi	2c11a29f0b	[Misc] Simplify vllm bench cli subcommand implementation (#19948)	2025-06-22 12:34:48 -04:00
Roger Wang	c76a506bd6	[Misc] Update model-specific PR tagging (#19949) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-06-22 12:16:08 +00:00
Reid	ec0db6f51c	[doc] use snippets for contact us (#19944) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-22 10:26:13 +00:00
22quinn	c305a2109d	[CI/Build] Auto tag perf benchmarks related PRs (#19943) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-22 08:46:21 +00:00
Wang, Yi	202c5df935	[Benchmark] fix request loss if "ping" is returned (#19535) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-22 07:21:04 +00:00
Ning Xie	2bb246b8f7	[MISC] add cpu_kvcache_space_bytes to CacheConfig (#19812) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-22 13:39:09 +08:00
Ning Xie	4c409cabc2	[Misc] add vllm_config in __init__ (#19866) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-21 23:10:46 -04:00
Adrian	3b1e4c6a23	[Docs] Add GPT2ForSequenceClassification to supported models in docs (#19932) Signed-off-by: nie3e <adrcwiek@gmail.com>	2025-06-21 20:57:19 +00:00

7290 Commits