Isotr0py
edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models ( #13320 )
2025-02-27 02:06:41 -08:00
Michael Goin
788f284b53
Fix test_block_fp8.py test for MoE ( #13915 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-27 18:00:00 +08:00
Mark McLoughlin
cd711c48b2
[V1][Metrics] Handle preemptions ( #13169 )
2025-02-26 20:04:59 -08:00
Rui Qiao
c9944acbf9
[misc] Rename Ray ADAG to Compiled Graph ( #13928 )
2025-02-26 20:03:28 -08:00
Lucas Wilkinson
f95903909f
[Kernel] FlashMLA integration ( #13747 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00
Wallas Henrique
4cb6fa0a9c
[Bugfix] Backend option to disable xgrammar any_whitespace ( #12744 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 10:52:34 -08:00
Cyrus Leung
934bb99c71
[Bugfix] Update expected token counts for Ultravox tests ( #13895 )
2025-02-26 04:56:50 -08:00
Joe Runde
3f808cc044
[Bugfix] Do not crash V0 engine on input errors ( #13101 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 19:07:29 +08:00
Florian Greinacher
215bf150a6
[Bugfix] Handle None parameters in Mistral function calls. ( #13786 )
2025-02-26 03:06:21 -08:00
Cyrus Leung
7b700ec8c8
[Bugfix] Add test example for Ultravox v0.5 ( #13890 )
2025-02-26 02:31:43 -08:00
Roger Wang
7ca1da020f
[Misc] Fix input processing for Ultravox ( #13871 )
2025-02-25 23:56:34 -08:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling ( #13831 )
2025-02-25 23:43:01 -08:00
Harry Mellor
145944cb94
Improve pipeline partitioning ( #13839 )
2025-02-25 18:53:56 -08:00
Lily Liu
5629f26df7
[V1][Spec Decode] Change Spec Decode Rejection Sampling API ( #13729 )
2025-02-25 18:14:48 -08:00
Michael Goin
07c4353057
[Model] Support Grok1 ( #13795 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-26 01:07:12 +00:00
Harry Mellor
34e3494e70
Fix failing MyGemma2Embedding test ( #13820 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-25 12:33:03 -08:00
Liangfu Chen
f75aa72732
[Neuron] Add custom_ops for neuron backend ( #13246 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: George Novack <gnovack@amazon.com>
Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>
2025-02-25 11:47:49 -08:00
Jee Jee Li
37b6cb4985
[CI/Build] Fix V1 LoRA failure ( #13767 )
2025-02-25 02:01:15 -08:00
Gregory Shtrasberg
aabeb2688f
[ROCm][Quantization][Kernel] Using HIP FP8 header ( #12593 )
2025-02-25 00:39:59 -08:00
Varun Sundar Rabindranath
03f48b3db6
[Core] LoRA V1 - Add add/pin/list/remove_lora functions ( #13705 )
2025-02-25 00:18:02 -08:00
Harry Mellor
cdc1fa12eb
Remove unused kwargs from model definitions ( #13555 )
2025-02-24 17:13:52 -08:00
afeldman-nm
befc402d34
[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) ( #10980 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-24 08:29:41 -08:00
Jongseok Park
781096e385
Expert Parallelism (EP) Support for DeepSeek V2 ( #12583 )
2025-02-24 07:33:20 -08:00
Kevin H. Luu
f90a375593
[ci] Add logic to change model to S3 path only when S3 CI env var is on ( #13727 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>
2025-02-24 06:32:11 +00:00
Nick Hill
cbae7af552
[V1][BugFix] Fix engine core client shutdown hangs ( #13298 )
...
Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method.
Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context.
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-23 13:07:43 -08:00
youkaichao
eb24dc4a45
[v1] torchrun compatibility ( #13642 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-23 22:47:24 +08:00
Isotr0py
ba5106e519
[LMM] Implement merged multimodal processor for whisper ( #13278 )
2025-02-23 01:46:03 -08:00
Kevin H. Luu
2c5e637b57
[ci] Use env var to control whether to use S3 bucket in CI ( #13634 )
2025-02-22 19:19:45 -08:00
Sage Moore
558db8083c
[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths ( #13095 )
2025-02-22 05:25:41 -08:00
Kaixi Hou
e109e598c7
[NVIDIA] Support nvfp4 cutlass gemm ( #13571 )
2025-02-22 05:24:05 -08:00
Keyun Tong
8db1b9d0a1
Support SSL Key Rotation in HTTP Server ( #13495 )
2025-02-22 05:17:44 -08:00
Jee Jee Li
105b8ce4c0
[Misc] Reduce LoRA-related static variable ( #13166 )
2025-02-22 00:21:30 -08:00
Mark McLoughlin
2cb8c1540e
[Metrics] Add --show-hidden-metrics-for-version CLI arg ( #13295 )
2025-02-22 00:20:45 -08:00
Mark McLoughlin
1cd981da4f
[V1][Metrics] Support vllm:cache_config_info ( #13299 )
2025-02-22 00:20:00 -08:00
Lu Fang
bb78fb318e
[v1] Support allowed_token_ids in v1 Sampler ( #13210 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-22 14:13:05 +08:00
Keyun Tong
0ffdf8ce0c
[HTTP Server] Make model param optional in request ( #13568 )
2025-02-21 21:55:50 -08:00
Lucas Wilkinson
288cc6c234
[Attention] MLA with chunked prefill ( #12639 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-21 15:30:12 -08:00
Kevin H. Luu
34ad27fe83
[ci] Fix metrics test model path ( #13635 )
2025-02-20 22:12:10 -08:00
Gabriel Marinho
1c3c975766
[FEATURE] Enables /score endpoint for embedding models ( #12846 )
2025-02-20 22:09:47 -08:00
Lingfan Yu
33170081f1
[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth ( #13245 )
...
Signed-off-by: Lingfan Yu <lingfany@amazon.com>
2025-02-20 17:45:45 -08:00
Joe Runde
bfbc0b32c6
[Frontend] Add backend-specific options for guided decoding ( #13505 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-20 15:07:58 -05:00
Harry Mellor
992e5c3d34
Merge similar examples in offline_inference into single basic example ( #12737 )
2025-02-20 04:53:51 -08:00
Kevin H. Luu
a64a84433d
[2/n][ci] S3: Use full model path ( #13564 )
...
Signed-off-by: <>
2025-02-20 01:20:15 -08:00
Kevin H. Luu
aa1e62d0db
[ci] Fix spec decode test ( #13600 )
2025-02-20 16:56:00 +08:00
youkaichao
ba81163997
[core] add sleep and wake up endpoint and v1 support ( #12987 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
2025-02-20 12:41:17 +08:00
Jee Jee Li
512368e34a
[Misc] Qwen2.5 VL support LoRA ( #13261 )
2025-02-19 18:37:55 -08:00
Kevin H. Luu
473f51cfd9
[3/n][CI] Load Quantization test models with S3 ( #13570 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-20 10:12:30 +08:00
Cyrus Leung
377d10bd14
[VLM][Bugfix] Pass processor kwargs properly on init ( #13516 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-19 13:13:50 +00:00
Yannick Schnider
423330263b
[Feature] Pluggable platform-specific scheduler ( #13161 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
2025-02-19 17:16:38 +08:00
Nick Hill
caf7ff4456
[V1][Core] Generic mechanism for handling engine utility ( #13060 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-19 17:09:22 +08:00