Nicolò Lucchesi
|
af35d3a3cc
|
[TPU][V1][Bugfix] Fix chunked prefill with padding (#15037)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-18 07:34:45 -07:00 |
|
Simon Mo
|
3b457143d2
|
[Bugfix] Register serializers for V0 MQ Engine (#15009)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-18 09:14:47 -04:00 |
|
Cyrus Leung
|
ab656f2c2f
|
[Bugfix] Loosen type check to avoid errors in V1 (#15021)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-18 12:54:40 +00:00 |
|
Serena
|
64fc2193dc
|
[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347)
|
2025-03-18 05:50:19 -07:00 |
|
Sebastian Schoennenbeck
|
dd732028f5
|
[Bugfix][Frontend] Fix validation of logprobs in ChatCompletionRequest (#14352)
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
|
2025-03-18 05:50:05 -07:00 |
|
hoshi-hiyouga
|
414919138b
|
[Bugfix] torchrun compatibility (#14899)
Signed-off-by: hiyouga <hiyouga@buaa.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-18 05:49:27 -07:00 |
|
Jee Jee Li
|
db7c8ca910
|
[Misc] Embedding model support LoRA (#14935)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-18 12:07:00 +00:00 |
|
Patrick von Platen
|
f863ffc965
|
[Mistral-Small 3.1] Update docs and tests (#14977)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-18 03:29:42 -07:00 |
|
Varun Sundar Rabindranath
|
400d483e87
|
[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-18 09:47:53 +00:00 |
|
Shanshan Shen
|
d1695758b2
|
[Doc][V1] Fix V1 APC doc (#14920)
|
2025-03-18 08:15:46 +00:00 |
|
Liangfu Chen
|
53a0cf8b95
|
[Neuron] trim attention kernel tests to fit trn1.2x instance (#14988)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-03-18 15:05:52 +08:00 |
|
Tristan Leclercq
|
5eeabc2a44
|
[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights (#14950)
|
2025-03-17 23:27:26 +00:00 |
|
Alexander Matveev
|
18551e820c
|
[V1] TPU - Fix CI/CD runner (#14974)
|
2025-03-17 21:07:07 +00:00 |
|
Robert Shaw
|
e41e160263
|
[V1] Guard Against Main Thread Usage (#14972)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-17 13:23:02 -07:00 |
|
Cyrus Leung
|
b89fb2a4a1
|
[CI/Build] Use AutoModelForImageTextToText to load VLMs in tests (#14945)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 18:35:17 +00:00 |
|
Roger Wang
|
5340b0e221
|
[Bugfix] Fix interface for Olmo2 on V1 (#14976)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-17 11:26:38 -07:00 |
|
Roger Wang
|
37e3806132
|
[Bugfix] Make Gemma3 MM V0 only for now (#14971)
Signed-off-by: Roger Wang <ywang@roblox.com>
v0.8.0rc2
|
2025-03-17 10:04:21 -07:00 |
|
Aaron Pham
|
c0efdd655b
|
[Fix][Structured Output] using vocab_size to construct matcher (#14868)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-03-17 11:42:45 -04:00 |
|
Quentin
|
aaaec52ad9
|
[Bugfix][Model] Mixtral: use unused head_dim config argument (#14961)
Signed-off-by: Quentin Torroba <quentin.torroba@mistral.ai>
|
2025-03-17 07:44:18 -07:00 |
|
Tyler Michael Smith
|
e1eb45d397
|
[Bugfix] Fix precommit - line too long in pixtral.py (#14960)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 07:18:50 -07:00 |
|
Simon Mo
|
89fca671fb
|
[V1] Default MLA to V1 (#14921)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-17 06:54:40 -07:00 |
|
Patrick von Platen
|
d20b0c139c
|
Add patch merger (#14957)
|
2025-03-17 06:47:50 -07:00 |
|
Cyrus Leung
|
166a168b0f
|
[Doc] Fix misleading log during multi-modal profiling (#14955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 06:14:32 -07:00 |
|
vllmellm
|
2bb0e1a799
|
[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-17 11:33:35 +00:00 |
|
Cyrus Leung
|
6eaf1e5c52
|
[Misc] Add --seed option to offline multi-modal examples (#14934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 03:00:17 -07:00 |
|
Cyrus Leung
|
868a8c5b2c
|
[Bugfix] Fix Ultravox on V1 (#14929)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 17:15:20 +08:00 |
|
iefgnoix
|
b4ad56c1bd
|
[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. (#14846)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-03-17 01:48:28 -07:00 |
|
kushanam
|
69698f257e
|
fix minor miscalled method (#14327)
|
2025-03-17 01:47:58 -07:00 |
|
Lu Fang
|
cd0cd85102
|
[MISC] More AMD unused var clean up (#14926)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-17 16:40:41 +08:00 |
|
Russell Bryant
|
0a74bfce9c
|
setup.py: drop assumption about local main branch (#14692)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-17 01:37:42 -07:00 |
|
Chen Zhang
|
dd3b865854
|
[Doc] Add vLLM Beijing meetup slide (#14938)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-17 16:29:36 +08:00 |
|
Yan Ma
|
9b87a579aa
|
[Misc][XPU] Use None as device capacity for XPU (#14932)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-03-17 01:22:14 -07:00 |
|
Cyrus Leung
|
b539222d4e
|
[V1] Remove input cache client (#14864)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-16 23:42:06 -07:00 |
|
Lily Liu
|
8d6cf89526
|
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
v0.8.0rc1
|
2025-03-16 22:00:20 -07:00 |
|
Simon Mo
|
583a9778e0
|
[Benchmark] Do not save detailed info to json by default (#14879)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-16 21:48:11 -07:00 |
|
Sibi
|
a73e183e36
|
[Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-16 20:35:57 -07:00 |
|
Lucas Wilkinson
|
1e799b7ec1
|
[BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context (#14910)
|
2025-03-17 03:35:37 +00:00 |
|
Woosuk Kwon
|
7f6c5ee06c
|
[V1][Minor] Add __repr__ to ConstantList (#14907)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 20:20:15 -07:00 |
|
Woosuk Kwon
|
faa0275730
|
[V1] Optimize the overhead of rewinding (#14905)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 20:19:30 -07:00 |
|
Cyrus Leung
|
8a5a9b70d7
|
[CI/Build] Update defaults for test reproducibility (#14893)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 10:38:15 +08:00 |
|
Robert Shaw
|
bb3aeddfaf
|
[CI] Nightly Tests (#14898)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-03-17 02:06:43 +00:00 |
|
Robert Shaw
|
aecc780dba
|
[V1] Enable Entrypoints Tests (#14903)
|
2025-03-16 17:56:16 -07:00 |
|
Vadim Gimpelson
|
90df7f23aa
|
[Doc] Add guidance for using ccache with pip install -e . in doc (#14901)
|
2025-03-16 23:10:04 +00:00 |
|
Rui Qiao
|
b9b5bdfc7d
|
[Misc] Catching Ray Compiled Graph PP test failures for V1 (#14847)
|
2025-03-16 15:46:42 -07:00 |
|
Woosuk Kwon
|
31060b2757
|
[V1][BugFix] Detect interleaved sliding window attention (#14896)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 14:53:53 -07:00 |
|
Nick Hill
|
fc1f67715d
|
[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-16 14:53:34 -07:00 |
|
Cyrus Leung
|
f6137adbcb
|
Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785) (#14892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-16 09:13:46 -07:00 |
|
Cyrus Leung
|
e53b1350f2
|
[Bugfix] Explicitly disable Phi-4-multimodal in V1 (#14889)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-16 09:05:40 -07:00 |
|
Kyle Sayers
|
d30aa7e9e6
|
[Bugfix] Limit profiling run sequence length by max_model_len (#14785)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-03-16 07:44:19 -07:00 |
|
Lily Liu
|
d1ad2a57af
|
[V1] [Spec Decode] Fix ngram tests (#14878)
|
2025-03-16 00:29:22 -07:00 |
|