Szymon Ożóg
aa375dca9f
[Bugfix] Missing quant_config in deepseek embedding layer ( #12836 )
2025-02-06 21:35:09 -08:00
ZSL98
433c4a4923
Make vllm compatible with verl ( #12824 )
...
Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
2025-02-07 11:54:20 +08:00
Lucas Wilkinson
ef533d25fb
[Bugfix] FA2 illegal memory access ( #12848 )
2025-02-06 19:54:07 -08:00
Kevin H. Luu
b260782357
[misc] Revert #12833 ( #12857 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 16:29:12 -08:00
Lu Fang
741429a4cd
[MISC] Check space in the file names in the pre commit checks ( #12804 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 15:36:21 -08:00
Yu Chin Fabian Lim
aff404571b
Add Bamba Model ( #10909 )
...
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-06 15:22:42 -08:00
Varun Sundar Rabindranath
467a96a541
[V1] LoRA Support ( #10957 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-06 09:32:51 -08:00
Isotr0py
8108ac841d
[Bugfix] Fix unsupported FA version check for Turing GPU ( #12828 )
2025-02-06 09:18:22 -08:00
Jitse Klomp
afe74f7a96
[Doc] double quote cmake package in build.inc.md ( #12840 )
2025-02-06 09:17:55 -08:00
youkaichao
09b95e36ab
[torch.compile] PyTorch 2.6 and nightly compatibility ( #12393 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-07 01:09:07 +08:00
Isotr0py
85ac82d228
[Kernel] Make rotary_embedding ops more flexible with input shape ( #12777 )
2025-02-06 08:46:13 -08:00
Cyrus Leung
1e57b1ee63
[Misc] Remove unnecessary decode call ( #12833 )
2025-02-06 08:45:44 -08:00
Kevin H. Luu
e152f29502
[misc] Reduce number of config file requests to HuggingFace ( #12797 )
...
Signed-off-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 14:59:18 +00:00
Lucas Wilkinson
c786e757fa
[Attention] Use FA3 for MLA on Hopper ( #12807 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-02-06 11:43:12 +00:00
Simon Mo
cefd56ee35
[Docs] Add Google Cloud Slides ( #12814 )
2025-02-06 01:02:38 -08:00
Dipika Sikka
7ca9934fe7
[Misc] Update w2 scale loading for GPTQMarlinMoE ( #12757 )
2025-02-06 01:02:14 -08:00
youkaichao
0408efc6d0
[Misc] Improve error message for incorrect pynvml ( #12809 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
v0.7.2
2025-02-06 15:23:50 +08:00
Michael Goin
449d1bce02
[Misc] Remove duplicated DeepSeek V2/V3 model definition ( #12793 )
2025-02-05 23:16:20 -08:00
Harry Mellor
1a6fcad4c9
Improve TransformersModel UX ( #12785 )
2025-02-05 22:24:57 -08:00
Lu Fang
56534cd577
[Bugfix] Fix the test_ultravox.py's license ( #12806 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 13:25:54 +08:00
Sumit Vij
d88506dda4
[Model] LoRA Support for Ultravox model ( #11253 )
2025-02-05 19:54:13 -08:00
Lu Fang
9cdea30b4f
[Misc][Easy] Remove the space from the file name
2025-02-05 19:23:35 -08:00
Lucas Wilkinson
76abd0c881
[Bugfix] Better FP8 supported defaults
2025-02-05 19:22:19 -08:00
Gregory Shtrasberg
5b19b93082
[ROCm][Kernel] Using the correct warp_size value
2025-02-05 19:15:08 -08:00
Cyrus Leung
75404d041b
[VLM] Update compatibility with transformers 4.49
2025-02-05 19:09:45 -08:00
Roger Wang
bf3b79efb8
[VLM] Qwen2.5-VL
2025-02-05 13:31:38 -08:00
Russell Bryant
9a5b1554b4
[Docs] Drop duplicate [source] links
2025-02-05 13:30:50 -08:00
Cyrus Leung
a4ce74c14a
[VLM] Use shared field to pass token ids to model
2025-02-05 13:30:46 -08:00
Rahul Tuli
3b2005e1db
Add: Support for Sparse24Bitmask Compressed Models
2025-02-05 13:30:43 -08:00
Sanju C Sudhakaran
af8486de49
[Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU)
2025-02-05 13:29:45 -08:00
Chen Zhang
4c3aac51e1
Merging PR #12536
...
Merged via CLI script
2025-02-05 13:24:26 -08:00
youkaichao
bc1bdecebf
[core][distributed] exact ray placement control ( #12732 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 02:03:19 +08:00
Akash kaothalkar
022bcc701a
[Bugfix] Fix 'ModuleNotFoundError: No module named 'intel_extension_for_pytorch'' for --tensor-parallel-size more than 1 ( #12546 )
2025-02-04 23:11:02 -08:00
Michael Goin
c53dc466b1
[Doc] Remove performance warning for auto_awq.md ( #12743 )
2025-02-04 22:43:11 -08:00
Nick Hill
3d09e592a8
[V1][Misc] Shorten FinishReason enum and use constant strings ( #12760 )
2025-02-04 22:43:02 -08:00
Harry Mellor
fcf2e3d7fc
[Bugfix] Fix OpenVINO model runner ( #12750 )
2025-02-04 22:42:46 -08:00
Michael Goin
58b218d7ae
[Doc] Update PR Reminder with link to Developer Slack ( #12748 )
2025-02-04 22:42:09 -08:00
Kyle Sayers
7ff7a638b6
[Model][Quant] Fix GLM, Fix fused module mappings for quantization ( #12634 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-02-05 05:32:06 +00:00
Dipika Sikka
686006a220
[Misc] Bump the compressed-tensors version ( #12736 )
2025-02-04 20:44:48 -08:00
Isotr0py
98fd089fc9
[VLM] Add MLA with pure RoPE support for deepseek-vl2 models ( #12729 )
2025-02-04 20:44:26 -08:00
Harry Mellor
249824c3bf
Refactor Linear handling in TransformersModel ( #12727 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-05 04:31:12 +00:00
Aleksandr Malyshev
64862d106e
[ROCM][AMD][TRITON] Halving warps number for fw_prefill to reduce spilling ( #12713 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-02-05 03:58:22 +00:00
Aviv Keshet
b3a0d01e45
[Core] add and implement VLLM_LOGITS_PROCESSOR_THREADS ( #12368 )
...
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>
2025-02-04 18:46:26 -08:00
Lucas Wilkinson
75e94309e8
[Perf] Mem align KV caches for CUDA devices (MLA perf improvement) ( #12676 )
...
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-02-04 18:22:24 -08:00
Mark McLoughlin
233df6f5c4
[V1][Metrics] Add request_success_total counter, labelled with finish reason ( #12579 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-02-04 19:46:54 -05:00
Cyrus Leung
18016a5e62
[Bugfix] Fix CI failures for InternVL and Mantis models ( #12728 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-04 23:54:23 +08:00
Sophie du Couédic
649550f27e
[Build] update requirements of no-device for plugin usage ( #12630 )
...
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
2025-02-04 21:19:12 +08:00
Kero Liang
62467a834a
Avoid unnecessary multi-modal input data copy when len(batch) == 1 ( #12722 )
...
Signed-off-by: imkero <kerorek@outlook.com>
2025-02-04 21:03:19 +08:00
Michael Greenbaum
6469038b14
[Bugfix] Fix loading of fine-tuned models based on Phi-3-Small ( #12689 )
...
Signed-off-by: Michael Greenbaum <mgreenbaum@microsoft.com>
Co-authored-by: Michael Greenbaum <mgreenbaum@microsoft.com>
2025-02-04 20:58:48 +08:00
Isotr0py
815079de8e
[VLM] merged multimodal processor and V1 support for idefics3 ( #12660 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-04 20:00:51 +08:00