Reid
435fa95444
[Frontend] add run batch to CLI (#18804)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-28 07:08:57 -07:00

Harry Mellor
4c2b38ce9e
Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-28 12:46:04 +00:00

Mengqing Cao
d781930f90
[Platform][Dist] Make torch distributed process group extendable (#18763)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-05-28 10:52:34 +00:00

Lucas Wilkinson
ce75efeecb
[BugFix] FA2 MLA Accuracy Issue (#18807)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
2025-05-28 08:59:39 +00:00

Richard Zou
aa42561e40
Fix PiecewiseCompileInterpreter (#17338)
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-28 08:40:53 +00:00

wang.yuqi
de65fc8e1e
[CI] improve embed testing (#18747)
2025-05-28 00:16:35 -07:00

Cyrus Leung
0c492b7824
[Deprecation] Remove fallbacks for Embeddings API (#18795)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-28 15:09:04 +08:00

Cyrus Leung
0f0926b43f
[Deprecation] Remove unused sync methods in async_timeout (#18792)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-28 15:08:48 +08:00

Cyrus Leung
7f2c1a87e9
[Deprecation] Require overriding get_dummy_text and get_dummy_mm_data (#18796)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-28 15:08:35 +08:00

RonaldBXu
5e13c07d00
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (2) (#18781)
Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
2025-05-28 05:09:14 +00:00

Divakar Verma
774c5fde30
[V1] fix torch profiling for V1 offline scenarios (#18445)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-05-28 04:16:30 +00:00

Guillaume Calmettes
9a21e331ff
[Bugfix]: correctly propagate errors message caught at the chat_templating step to the client (#18769)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-05-28 03:35:43 +00:00

wang.yuqi
3e9ce609bd
[Bugfix] Fix nomic max_model_len (#18755)
2025-05-27 20:29:53 -07:00

fxmarty-amd
794ae1f551
[rocm] Fix wrong attention log (#18764)
Signed-off-by: Felix Marty <felmarty@amd.com>
2025-05-27 19:45:41 -07:00

Lukas Geiger
d73a9457a5
[Core] Improve Tensor serialisation (#18774)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-28 09:46:21 +08:00

cascade
51e98e4ffd
[Bugfix] Disable prefix caching by default for benchmark (#18771)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-28 08:18:09 +08:00

Michael Goin
e56f44d9ec
Support datasets in vllm bench serve and sync with benchmark_[serving,datasets].py (#18566)
2025-05-27 19:59:48 -04:00

Satyajith Chilappagari
e0cbad4e30
[Neuron] Support quantization on neuron (#18283)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-27 22:10:33 +00:00

Michael Goin
5873877241
[Bugfix] Mistral tool calling when content is list (#18729)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-27 09:05:37 -07:00

Cyrus Leung
696259ca01
[Core] Automatically cast multi-modal input dtype (#18756)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-27 23:45:48 +08:00

chunxiaozheng
6b6d496114
optimize get_kv_cache_torch_dtype (#18531)
Signed-off-by: idellzheng <idellzheng@tencent.com>
2025-05-27 13:08:44 +00:00

cascade
aaa4ac1c95
Disable prefix cache by default for benchmark (#18639)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-27 20:06:34 +08:00

Mark McLoughlin
06a0338015
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-27 09:37:06 +00:00

Cyrus Leung
4318c0559d
[CI/Build] Remove imports of built-in re (#18750)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-27 09:19:18 +00:00

Hyogeun Oh (오효근)
a68e293cb9
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-27 01:44:20 -07:00

Shawn Huang
6881107948
[BUG FIX] minicpm (#18739)
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
2025-05-27 01:04:49 -07:00

maobaolong
c24b1572ac
Minor fix about MooncakeStoreConnector (#18721)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
2025-05-27 08:02:28 +00:00

almersawi
a547aeb828
feat(rocm-support): support mamba2 on rocm (#18565)
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
2025-05-27 00:07:53 -07:00

vllmellm
d260f799a9
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-26 23:14:07 -07:00

Lukas Geiger
b50602d5f0
[Model][Gemma3] Cast image pixel values already on CPU (#18732)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-27 05:42:54 +00:00

Isotr0py
1f1b1bc03b
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-27 04:40:28 +00:00

Lukas Geiger
0eebd74842
[Model][Gemma3] Simplify image input validation (#18710)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-27 11:13:37 +08:00

Lukas Geiger
e7523c2e03
[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (#18608)
2025-05-26 11:49:36 -04:00

Cyrus Leung
a869baca73
[Bugfix] Fix Llama GGUF initialization (#18717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:49:22 -07:00

Cyrus Leung
82e2339b06
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:38:04 -07:00

Naveassaf
6d68030f1c
[Model] Add support for YARN in NemotronNAS models (#18427)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-05-26 10:31:49 +00:00

Ning Xie
5a2c76cbe1
[CI] fix dump_input for str type (#18697)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-26 18:23:35 +08:00

Cyrus Leung
61a45e7a72
[Bugfix] Fix Mistral-format models with sliding window (#18693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 01:44:04 -07:00

Maximilien de Bayser
561b77a0d6
[Bugfix] Fix the lm_head in gpt_bigcode in lora mode (#6357)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
2025-05-26 14:52:25 +08:00

CYJiang
abd4030d94
refactor: simplify request handler, use positive condition check for handler assignment (#18690)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-05-26 06:32:28 +00:00

Lukas Geiger
6071e989df
[Core][Multimodal] Convert PIL Image to array without data copy when hashing (#18682)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-25 17:33:35 +00:00

Cyrus Leung
57fd13a707
[Bugfix] Fix profiling dummy data for Pixtral (#18677)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-25 14:05:30 +00:00

Yuqi Zhang
f2faac745d
[Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment (#18674)
Signed-off-by: zzzyq <zhangyuqi94@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-25 02:36:06 -07:00

Cyrus Leung
503f8487c2
[Misc] Reduce logs on startup (#18649)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-24 23:03:53 -07:00

Ning Xie
44073a7ac3
[BUGFIX] catch subclass first for try...except (#18672)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-25 05:34:24 +00:00

Isotr0py
75f81750f3
[VLM] Initialize video input support for InternVL models (#18499)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-25 04:51:25 +00:00

Mengqing Cao
6ab681bcbe
[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-05-25 04:51:21 +00:00

Chenguang Li
cebc22f3b6
[Misc]Replace cuda hard code with current_platform in Ray (#14668)
Signed-off-by: noemotiovon <757486878@qq.com>
2025-05-24 20:26:31 -07:00

Ning Xie
6c6dcd8611
[MISC] correct signature for LoaderFunction (#18670)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-24 20:17:47 -07:00

Seiji Eicher
7891fdf0c6
[V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... (#18640)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-05-24 20:07:20 -07:00