amit
|
981eeca41a
|
[Fix][V1] Remove --scheduling-policy oracle (#20010)
Signed-off-by: amit <amit.man@gmail.com>
|
2025-06-24 09:52:15 -07:00 |
|
Reid
|
26d34eb67e
|
refactor example - qwen3_reranker (#19847)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-24 14:03:20 +00:00 |
|
Li, Jiang
|
53da4cd397
|
[Bugfix][CPU] Fix InputBatch for pooling models in the CPU v1 (#20014)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-24 13:20:04 +00:00 |
|
Vadim Gimpelson
|
9a3b88328f
|
[PERF] Speedup of MRoPE prepare inputs (#19939)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-06-23 23:01:26 -07:00 |
|
Reid
|
3014c920da
|
add some examples for other benchmark scripts (#19893)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-24 05:57:46 +00:00 |
|
Kay Yan
|
0eed516951
|
[doc] Fix broken link in the installation for CPU (#19980)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-06-24 12:04:11 +08:00 |
|
Chenyaaang
|
ee5ad8d2c5
|
[Misc][Tools][Benchmark] Add profile to autotune script (#19711)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-24 00:59:41 +00:00 |
|
QiliangCui
|
a738dbb2a1
|
Update test case parameter to have the throughput above 8.0 (#19994)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-24 00:18:10 +00:00 |
|
Chenyaaang
|
33d5e29be9
|
[TPU] Fix tpu model runner test (#19995)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-23 16:04:28 -07:00 |
|
22quinn
|
4671ac6e2a
|
[Bugfix][Benchmark] Fix Marlin benchmark (#19929)
|
2025-06-24 07:25:12 +09:00 |
|
Jun-Howie
|
dd2ccf8dde
|
Feat Dynamic Quantization for MoE Layers in GPTQ Marlin Backend (#19395)
|
2025-06-24 07:23:28 +09:00 |
|
22quinn
|
a3bc76e4b5
|
[CI/Build] Push latest tag for cpu and neuron docker image (#19897)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-23 14:15:37 -07:00 |
|
cascade
|
e6327c9b3e
|
[Feature] Support sequence parallelism for static fp8 quantization (#19181)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-06-23 16:09:02 -04:00 |
|
lkchen
|
d0132f025d
|
[Misc] Add type alias ReqId and EngineId for better readability (#19880)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-06-23 12:57:57 -07:00 |
|
Isotr0py
|
61f4fc5dc6
|
[Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-23 18:38:06 +00:00 |
|
Tyler Michael Smith
|
68aaeb3749
|
[EP+DP] Optimize the little operations in the DeepGEMM + DeepEP low latency case (#19885)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-23 11:07:47 -07:00 |
|
Lukas Geiger
|
c3649e4fee
|
[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-23 17:59:09 +00:00 |
|
Reid
|
53243e5c42
|
[doc] improve readability for long commands (#19920)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-23 14:27:07 +00:00 |
|
Jee Jee Li
|
a6e6604d32
|
[Bugfix] Fix CI bitsandbytes failure (#19969)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-23 21:30:55 +08:00 |
|
Reid
|
b82e0f82cb
|
[doc] use MkDocs collapsible blocks - supplement (#19973)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-23 10:54:16 +00:00 |
|
Isotr0py
|
5111642a6f
|
[Doc] Update V1 status for decoder-only embedding models (#19952)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-23 09:31:06 +00:00 |
|
lkchen
|
1bcd15edc7
|
[BugFix][P/D] Fix for cases where _recving_transfers can be cleaned up when *all* transfer done (#19874)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-06-22 22:41:53 -07:00 |
|
Nicolò Lucchesi
|
2ebff5b77c
|
[P/D][NixlConnector] Support tp_size > num_kv_heads deployments (#19691)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-22 22:41:50 -07:00 |
|
Reid
|
f17aec0d63
|
[doc] Fold long code blocks to improve readability (#19926)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-23 05:24:23 +00:00 |
|
Vensen
|
493c275352
|
Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor (#19643)
Signed-off-by: Vensenmu <vensenmu@gmail.com>
|
2025-06-23 03:40:28 +00:00 |
|
jinqinn
|
f39ab2d4bd
|
[Misc] Configurable timeout for execute_model RPC calls via env var (#19544)
Signed-off-by: jinqinn <goodqinjin@163.com>
|
2025-06-22 20:36:26 -07:00 |
|
amit
|
4a0f7888a3
|
[Core] feat: Implement Priority Scheduling in V1 Engine (#19057)
Signed-off-by: amit <amit.man@gmail.com>
Co-authored-by: Roger Wang <Rogerw0108@gmail.com>
|
2025-06-22 20:18:08 -07:00 |
|
Aaron Pham
|
c4cf260677
|
[Perf][CLI] Improve overall startup time (#19941)
|
2025-06-22 23:11:22 +00:00 |
|
Ye (Charlotte) Qi
|
33d51f599e
|
[BugFix] Add an env to disable moe chunking to work around compile incompatibility (#19642)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-06-22 15:17:49 -07:00 |
|
Aaron Pham
|
e91386cde1
|
[Chore] dedup logs (#19955)
|
2025-06-22 19:43:07 +00:00 |
|
Ye (Charlotte) Qi
|
2c11a29f0b
|
[Misc] Simplify vllm bench cli subcommand implementation (#19948)
|
2025-06-22 12:34:48 -04:00 |
|
Roger Wang
|
c76a506bd6
|
[Misc] Update model-specific PR tagging (#19949)
Signed-off-by: Roger Wang <hey@rogerw.me>
|
2025-06-22 12:16:08 +00:00 |
|
Reid
|
ec0db6f51c
|
[doc] use snippets for contact us (#19944)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-22 10:26:13 +00:00 |
|
22quinn
|
c305a2109d
|
[CI/Build] Auto tag perf benchmarks related PRs (#19943)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-22 08:46:21 +00:00 |
|
Wang, Yi
|
202c5df935
|
[Benchmark] fix request loss if "ping" is returned (#19535)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-22 07:21:04 +00:00 |
|
Ning Xie
|
2bb246b8f7
|
[MISC] add cpu_kvcache_space_bytes to CacheConfig (#19812)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-22 13:39:09 +08:00 |
|
Ning Xie
|
4c409cabc2
|
[Misc] add vllm_config in __init__ (#19866)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-21 23:10:46 -04:00 |
|
Adrian
|
3b1e4c6a23
|
[Docs] Add GPT2ForSequenceClassification to supported models in docs (#19932)
Signed-off-by: nie3e <adrcwiek@gmail.com>
|
2025-06-21 20:57:19 +00:00 |
|
Woosuk Kwon
|
2c5302fadd
|
[Multimodal] Optimize Qwen2/2.5-VL startup time (#19756)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-06-21 20:01:07 +00:00 |
|
Reid
|
caa680fd2e
|
[doc] add contact us in community (#19922)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-21 17:29:06 +00:00 |
|
汪志鹏
|
c3bf9bad11
|
[New model support]Support Tarsier2 (#19887)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-21 04:01:51 +00:00 |
|
Isotr0py
|
6f170f11dd
|
[Bugfix] Fix bnb 8bit model weights loading (#19917)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-21 03:29:09 +00:00 |
|
Rabin Adhikari
|
8ca81bb069
|
Fix: Check the type of params to be a Sequence not list. (#19910)
Signed-off-by: Rabin Adhikari <rabin.adk1@gmail.com>
|
2025-06-20 23:03:17 +00:00 |
|
wangxiyuan
|
e773a9e1c2
|
[Misc] Clean up useless code (#19889)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-06-20 21:09:09 +00:00 |
|
Ning Xie
|
71baf85ae1
|
[Kernel] mark TorchSDPABackend swap_blocks NotImplementedError (#19749)
|
2025-06-20 18:18:11 +00:00 |
|
Li, Jiang
|
79f2f1c2a1
|
[CPU][CI] Fallback sliding window to v0 and fix CPU pooling model tests (#19901)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-20 15:30:36 +00:00 |
|
Vlad Tiberiu Mihailescu
|
2e3e3c86dc
|
Export NaNs in logits to scheduler_stats if output is corrupted (#18777)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
|
2025-06-20 22:47:16 +08:00 |
|
Chendi.Xue
|
7e8977fcd4
|
[custom_op][vllm-plugin] update custom_op class to use op_registry (#19164)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
|
2025-06-20 07:44:56 -07:00 |
|
Adrian
|
f1e840e842
|
[Model] GPT2ForSequenceClassification model (#19663)
Signed-off-by: nie3e <adrcwiek@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-20 12:07:41 +00:00 |
|
Thomas Parnell
|
7771d1de88
|
[Fix] import regex instead of re (#19875)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-20 11:16:48 +00:00 |
|