Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Rahul Tuli
05f6846ede
Support llama3 eagle3 head with llama4 verifier ( #25961 )
...
Signed-off-by: rahul-tuli <rtuli@redhat.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-10-06 13:56:08 -04:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Rahul Tuli
145ac73317
[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue ( #25883 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-09-29 11:37:20 -04:00
Tyler Michael Smith
a5354b3ed2
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models ( #24982 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-27 14:22:28 +00:00
Woosuk Kwon
1c3ffdbecc
[V0 Deprecation] Remove V0 sampling metadata ( #25345 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-21 10:37:11 -07:00
Alexandre Marques
5931b7e5d9
[Models][Quantization] Add quantization configuration update in Voxtral model ( #24122 )
...
Signed-off-by: Alexandre Marques <almarque@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-10 19:13:56 -07:00
Wenlong Wang
53b42f4102
[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 ( #24392 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-09 21:24:23 -07:00
Lukas Geiger
de533ab2a1
[Models] Improve iteration over layers ( #19497 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-08-29 09:26:34 +08:00
Cyrus Leung
7d67a9d9f9
[mypy] Fix incorrect type hint for EAGLE3 support ( #23617 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-25 23:50:17 -07:00
Chen Zhang
17373dcd93
[Attention] Refactor AttentionMetadata Preparation for Encoder-only Models ( #23154 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-22 05:05:59 +00:00
Rahul Tuli
5a4b4b3729
Add: SupportsEagle3 interface for explicit EAGLE3 support ( #22642 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-08-12 09:24:52 -07:00
Harry Mellor
c49848396d
Refactor sliding window configuration to Transformers best practice ( #21927 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 20:50:48 -07:00
wang.yuqi
ca4eb82bcb
[Model] Re-add the implicit conversion feature for as_seq_cls_model ( #21103 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-18 07:15:07 +00:00
Minkyu Kim
bd4c1e6fdb
Support for LlamaForSequenceClassification ( #20807 )
...
Signed-off-by: thechaos16 <thechaos16@gmail.com>
2025-07-13 00:09:34 -07:00
Patrick von Platen
14601f5fba
[Config] Refactor mistral configs ( #20570 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-07-07 15:25:10 -07:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Cyrus Leung
a869baca73
[Bugfix] Fix Llama GGUF initialization ( #18717 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:49:22 -07:00
Naveassaf
6d68030f1c
[Model] Add support for YARN in NemotronNAS models ( #18427 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-05-26 10:31:49 +00:00
Harry Mellor
26d0419309
Update deprecated type hinting in models ( #18132 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-14 22:06:50 -07:00
Cyrus Leung
d62a076e84
[Model] GritLM supports other attention backends ( #18109 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-14 03:33:19 -07:00
Cyrus Leung
afb4429b4f
[CI/Build] Reorganize models tests ( #17459 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-30 23:03:08 -07:00
Wanrui Dai
7fcc4223dc
[Minor][Models] Pass partial_rotary_factor parameter to rope ( #17266 )
...
Signed-off-by: evian <eviantai@u.nus.edu>
Co-authored-by: evian <eviantai@u.nus.edu>
2025-04-28 04:28:59 +00:00
Woosuk Kwon
b278911229
[Minor][Models] Fix Return Types of Llama & Eagle ( #17220 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-25 21:54:47 -07:00
Benjamin Chislett
a0e619e62a
[V1][Spec Decode] EAGLE-3 Support ( #16937 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
2025-04-25 15:43:07 -07:00
Woosuk Kwon
b411418ff0
[Chore] Remove Sampler from Model Code ( #17084 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-24 02:49:33 -07:00
Lu Fang
55dcce91df
Upstream Llama4 Support to Main ( #16113 )
...
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 08:06:27 -07:00
Tyler Michael Smith
4f5b059f14
Clean up unused padding_idx variables across many model definitions ( #13240 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-04 21:27:00 +00:00
Harry Mellor
cdc1fa12eb
Remove unused kwargs from model definitions ( #13555 )
2025-02-24 17:13:52 -08:00
Jee Jee Li
105b8ce4c0
[Misc] Reduce LoRA-related static variable ( #13166 )
2025-02-22 00:21:30 -08:00
Patrick von Platen
d366ccc4e3
[RFC] [Mistral] FP8 format ( #10130 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-02-08 14:12:53 -07:00
Amit Garg
538fab93cd
PR #12718 ( #12718 )
2025-02-07 06:22:37 -08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help
ensure
we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Pavani Majety
b02fd288b2
[Hardware][NV] Fix Modelopt model loading for k-v-scales for Llama models. ( #11787 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-01-29 01:46:12 -08:00
Gregory Shtrasberg
e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation ( #11906 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
Michael Goin
9aa1519f08
Various cosmetic/comment fixes ( #12089 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-01-16 09:59:06 +00:00
kewang-xlnx
de0526f668
[Misc][Quark] Upstream Quark format to VLLM ( #10765 )
...
Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-15 11:05:15 -05:00
RunningLeon
97eb97b5a4
[Model]: Support internlm3 ( #12037 )
2025-01-15 11:35:17 +00:00
Jee Jee Li
a3a3ee4e6f
[Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping ( #11924 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-15 07:49:49 +08:00
Isotr0py
d14e98d924
[Model] Support GGUF models newly added in transformers 4.46.0 ( #9685 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-13 00:13:44 +00:00
bjmsong
187e32997c
[Bugfix] Change kv scaling factor by param json on nvidia gpu ( #11688 )
...
Signed-off-by: bjmsong <bjmsong@126.com>
Co-authored-by: bjmsong <bjmsong@126.com>
2025-01-02 21:11:39 +00:00
Cyrus Leung
96d673e0f8
[Bugfix] Fix error handling of unsupported sliding window ( #11213 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-15 10:59:42 -07:00
Cyrus Leung
c889d5888b
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet ( #10975 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 17:20:49 +00:00
Cyrus Leung
133707123e
[Model] Replace embedding models with pooling adapter ( #10769 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 08:02:54 +08:00
Patrick von Platen
e7cfc4ef4c
[Interleaved ATTN] Support for Mistral-8B ( #10591 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-30 07:45:50 +00:00
Cyrus Leung
9b4b150395
[Bugfix] Ignore lm_head when loading embedding models ( #10719 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-27 19:05:29 +00:00
shunxing12345
1209261e93
[Model] Support telechat2 ( #10311 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-11-27 11:32:35 +00:00
Jee Jee Li
15cc2a9f1a
[Misc]Further reduce BNB static variable ( #10597 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-26 22:54:12 -08:00
youkaichao
45ac4ff270
[bugfix] fix aria model and add torch.compile ( #10645 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-25 18:32:09 -08:00
Cyrus Leung
cf73f0c95e
[Model] Enable optional prefix when loading embedding models ( #10639 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 18:14:33 +00:00