Lukas Geiger
|
c3649e4fee
|
[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-23 17:59:09 +00:00 |
|
Reid
|
53243e5c42
|
[doc] improve readability for long commands (#19920)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-23 14:27:07 +00:00 |
|
Jee Jee Li
|
a6e6604d32
|
[Bugfix] Fix CI bitsandbytes failure (#19969)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-23 21:30:55 +08:00 |
|
Reid
|
b82e0f82cb
|
[doc] use MkDocs collapsible blocks - supplement (#19973)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-23 10:54:16 +00:00 |
|
Isotr0py
|
5111642a6f
|
[Doc] Update V1 status for decoder-only embedding models (#19952)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-23 09:31:06 +00:00 |
|
lkchen
|
1bcd15edc7
|
[BugFix][P/D] Fix for cases where _recving_transfers can be cleaned up when *all* transfer done (#19874)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-06-22 22:41:53 -07:00 |
|
Nicolò Lucchesi
|
2ebff5b77c
|
[P/D][NixlConnector] Support tp_size > num_kv_heads deployments (#19691)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-22 22:41:50 -07:00 |
|
Reid
|
f17aec0d63
|
[doc] Fold long code blocks to improve readability (#19926)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-23 05:24:23 +00:00 |
|
Vensen
|
493c275352
|
Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor (#19643)
Signed-off-by: Vensenmu <vensenmu@gmail.com>
|
2025-06-23 03:40:28 +00:00 |
|
jinqinn
|
f39ab2d4bd
|
[Misc] Configurable timeout for execute_model RPC calls via env var (#19544)
Signed-off-by: jinqinn <goodqinjin@163.com>
|
2025-06-22 20:36:26 -07:00 |
|
amit
|
4a0f7888a3
|
[Core] feat: Implement Priority Scheduling in V1 Engine (#19057)
Signed-off-by: amit <amit.man@gmail.com>
Co-authored-by: Roger Wang <Rogerw0108@gmail.com>
|
2025-06-22 20:18:08 -07:00 |
|
Aaron Pham
|
c4cf260677
|
[Perf][CLI] Improve overall startup time (#19941)
|
2025-06-22 23:11:22 +00:00 |
|
Ye (Charlotte) Qi
|
33d51f599e
|
[BugFix] Add an env to disable moe chunking to work around compile incompatibility (#19642)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-06-22 15:17:49 -07:00 |
|
Aaron Pham
|
e91386cde1
|
[Chore] dedup logs (#19955)
|
2025-06-22 19:43:07 +00:00 |
|
Ye (Charlotte) Qi
|
2c11a29f0b
|
[Misc] Simplify vllm bench cli subcommand implementation (#19948)
|
2025-06-22 12:34:48 -04:00 |
|
Roger Wang
|
c76a506bd6
|
[Misc] Update model-specific PR tagging (#19949)
Signed-off-by: Roger Wang <hey@rogerw.me>
|
2025-06-22 12:16:08 +00:00 |
|
Reid
|
ec0db6f51c
|
[doc] use snippets for contact us (#19944)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-22 10:26:13 +00:00 |
|
22quinn
|
c305a2109d
|
[CI/Build] Auto tag perf benchmarks related PRs (#19943)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-22 08:46:21 +00:00 |
|
Wang, Yi
|
202c5df935
|
[Benchmark] fix request loss if "ping" is returned (#19535)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-22 07:21:04 +00:00 |
|
Ning Xie
|
2bb246b8f7
|
[MISC] add cpu_kvcache_space_bytes to CacheConfig (#19812)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-22 13:39:09 +08:00 |
|
Ning Xie
|
4c409cabc2
|
[Misc] add vllm_config in __init__ (#19866)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-21 23:10:46 -04:00 |
|
Adrian
|
3b1e4c6a23
|
[Docs] Add GPT2ForSequenceClassification to supported models in docs (#19932)
Signed-off-by: nie3e <adrcwiek@gmail.com>
|
2025-06-21 20:57:19 +00:00 |
|
Woosuk Kwon
|
2c5302fadd
|
[Multimodal] Optimize Qwen2/2.5-VL startup time (#19756)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-06-21 20:01:07 +00:00 |
|
Reid
|
caa680fd2e
|
[doc] add contact us in community (#19922)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-21 17:29:06 +00:00 |
|
汪志鹏
|
c3bf9bad11
|
[New model support]Support Tarsier2 (#19887)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-21 04:01:51 +00:00 |
|
Isotr0py
|
6f170f11dd
|
[Bugfix] Fix bnb 8bit model weights loading (#19917)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-21 03:29:09 +00:00 |
|
Rabin Adhikari
|
8ca81bb069
|
Fix: Check the type of params to be a Sequence not list. (#19910)
Signed-off-by: Rabin Adhikari <rabin.adk1@gmail.com>
|
2025-06-20 23:03:17 +00:00 |
|
wangxiyuan
|
e773a9e1c2
|
[Misc] Clean up useless code (#19889)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-06-20 21:09:09 +00:00 |
|
Ning Xie
|
71baf85ae1
|
[Kernel] mark TorchSDPABackend swap_blocks NotImplementedError (#19749)
|
2025-06-20 18:18:11 +00:00 |
|
Li, Jiang
|
79f2f1c2a1
|
[CPU][CI] Fallback sliding window to v0 and fix CPU pooling model tests (#19901)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-20 15:30:36 +00:00 |
|
Vlad Tiberiu Mihailescu
|
2e3e3c86dc
|
Export NaNs in logits to scheduler_stats if output is corrupted (#18777)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
|
2025-06-20 22:47:16 +08:00 |
|
Chendi.Xue
|
7e8977fcd4
|
[custom_op][vllm-plugin] update custom_op class to use op_registry (#19164)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
|
2025-06-20 07:44:56 -07:00 |
|
Adrian
|
f1e840e842
|
[Model] GPT2ForSequenceClassification model (#19663)
Signed-off-by: nie3e <adrcwiek@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-20 12:07:41 +00:00 |
|
Thomas Parnell
|
7771d1de88
|
[Fix] import regex instead of re (#19875)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-20 11:16:48 +00:00 |
|
Ning Xie
|
71d1219545
|
[Kernel] correct cpu worker function parameter type (#19745)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-20 10:50:13 +00:00 |
|
Reid
|
e384f2f108
|
[Misc] refactor example - openai_transcription_client (#19851)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-20 08:02:21 +00:00 |
|
Reid
|
089a306f19
|
[Misc] update cuda version (#19526)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-20 07:25:15 +00:00 |
|
kourosh hakhamaneshi
|
5e666f72cd
|
[Bugfix][Ray] Set the cuda context eagerly in the ray worker (#19583)
|
2025-06-19 22:01:16 -07:00 |
|
qli88
|
e3a3e4db46
|
[Bugfix] Enable PP with AITER+V1 (#19822)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
|
2025-06-20 12:43:20 +08:00 |
|
Xerxes
|
e41bf15cd0
|
[Chore]: qwen3-moe-type-hints-mistake (#19860)
Co-authored-by: xinnan.hou <hxn02029096@alibaba-inc.com>
|
2025-06-19 21:43:07 -07:00 |
|
Brayden Zhong
|
5aa4a015ce
|
[Benchmark] Fix Value of type "SampleRequest" is not indexable (#18032)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-06-19 21:28:55 -07:00 |
|
Elaine Zhao
|
b6bad3d186
|
[CI][Neuron] Fail and exit on first error (#19622)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-20 12:27:51 +08:00 |
|
Isotr0py
|
ee9a1531aa
|
[CI/Build][Bugfix] Fix deadlock on v1 engine test CI (#19872)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-20 09:51:07 +08:00 |
|
Robert Shaw
|
10d82f9ac5
|
[Benchmark][Bugfix] Fix Dataset Length Calculation (#19868)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-06-19 18:30:41 -07:00 |
|
xzbdmw
|
ea10dd9d9e
|
[Frontend] early return chat format resolution when specified (#19735)
|
2025-06-19 18:49:59 +00:00 |
|
Alex Brooks
|
ead2110297
|
[Core][Bugfix] Fix Online MM Beam Search (#19688)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-06-19 17:18:07 +00:00 |
|
Li, Jiang
|
01220ce89a
|
[CI][CPU] Improve dummy Triton interfaces and fix the CPU CI (#19838)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-19 15:46:09 +00:00 |
|
22quinn
|
6f68c49220
|
[Doc] Update V1 user guide for embedding models (#19842)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-19 09:43:27 +00:00 |
|
Alexei-V-Ivanov-AMD
|
4719460644
|
Fixing Chunked Prefill Test. (#19762)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-06-19 01:36:16 -07:00 |
|
NekoMimiUnagi
|
466166dcfd
|
[Frontend] Add optional token-level progress bar to LLM.beam_search (#19301)
Signed-off-by: Ruosen Li <rxl190028@utdallas.edu>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-71-179.ec2.internal>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-19 03:21:41 -04:00 |
|