hfan | fde60ee775 | 2025-07-25 13:46:06 +08:00
  [Model] Fix a check for None when the return value was an empty list in Gemma3 MM vision_embeddings (#21479)
  Signed-off-by: Hongmin Fan <fanhongmin@google.com>

Jason Gu | b38bc652ac | 2025-07-24 22:45:16 -07:00
  [Model] Support tensor parallel for timm ViT in Deepseek_vl2 (#21494)
  Signed-off-by: wzqd <1057337859@qq.com>

Ning Xie | adaf2c6d4f | 2025-07-24 22:44:38 -07:00
  [Bugfix] fix modelscope snapshot_download serialization (#21536)
  Signed-off-by: Andy Xie <andy.xning@gmail.com>

Benji Beck | 965bc71b04 | 2025-07-24 21:43:52 -07:00
  Integrate TensorSchema with shape validation for Phi3VImagePixelInputs (#21232)
  Signed-off-by: Benji Beck <benjibeck@meta.com>

Varun Sundar Rabindranath | 2212cd6cfb | 2025-07-24 20:17:29 -07:00
  [Bugfix] DeepGemm utils: Fix hardcoded type-cast (#21517)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

Burkhard Ringlein | ce3a9b1378 | 2025-07-24 20:16:59 -07:00
  [Kernel] Add fused_moe configs for upcoming granite4 (#21332)
  Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
  Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>

Yuxuan Zhang | 2ce90e5b01 | 2025-07-24 20:07:38 -07:00
  Fix GLM-4 missing layer when using pipeline parallelism (PP) (#21531)
  Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>

Wentao Ye | 633f6e804b | 2025-07-24 20:07:22 -07:00
  [Bug] Fix DeepGemm Init Error (#21554)
  Signed-off-by: yewentao256 <zhyanwentao@126.com>

Woosuk Kwon | fe56180c7f | 2025-07-24 15:56:08 -07:00
  [MoE] More balanced expert sharding (#21497)
  Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>

Shu Wang | 1b25f1fe75 | 2025-07-24 08:13:31 -07:00
  Update flashinfer CUTLASS MoE Kernel (#21408)
  Signed-off-by: Shu Wang. <shuw@nvidia.com>

Harry Mellor | 13abd0eaf9 | 2025-07-24 03:22:12 -07:00
  [Model] Officially support Emu3 with Transformers backend (#21319)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Yuxuan Zhang | 85bda9e7d0 | 2025-07-24 01:52:43 -07:00
  Remove incorrect GLM-4.5 quantization code (#21435)

22quinn | 610852a423 | 2025-07-24 01:49:44 -07:00
  [Core] Support model loader plugins (#21067)
  Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>

Nick Hill | f0f4de8f26 | 2025-07-24 01:27:30 -07:00
  [Misc] Fix duplicate FusedMoEConfig debug messages (#21455)
  Signed-off-by: Nick Hill <nhill@redhat.com>

Chengji Yao | e74bfc70e4 | 2025-07-24 00:38:39 -07:00
  [TPU][Bugfix] Fix MoE layer (#21340)
  Signed-off-by: Chengji Yao <chengjiyao@google.com>
  Co-authored-by: Simon Mo <simon.mo@hey.com>

Harry Mellor | dde295a934 | 2025-07-24 00:16:23 -07:00
  Deduplicate Transformers backend code using inheritance (#21461)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Hardik Gupta | 11599b0e1f | 2025-07-23 20:21:02 -07:00
  feat(gguf_loader): accept HF repo paths & URLs for GGUF (#20793)
  Signed-off-by: Hardik <hardikgupta1999@gmail.com>
  Signed-off-by: Isotr0py <2037008807@qq.com>
  Co-authored-by: Isotr0py <2037008807@qq.com>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Christian Pinto | 8560a5b258 | 2025-07-23 11:00:23 -07:00
  [Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577)
  Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

Asher | 2671334d45 | 2025-07-23 03:54:08 -07:00
  [Model] Add Hunyuan V1 dense model support (#21368)
  Signed-off-by: Asher Zhang <asherszhang@tencent.com>

youkaichao | 2f5c14de6a | 2025-07-23 00:03:16 -07:00
  Add clear messages for deprecated models (#21424)
  Signed-off-by: youkaichao <youkaichao@gmail.com>

Michael Goin | f002e9a870 | 2025-07-23 00:02:48 -07:00
  [Cleanup] Only log MoE DP setup warning if DP is enabled (#21315)
  Signed-off-by: mgoin <mgoin64@gmail.com>

Isotr0py | 4ecedd1806 | 2025-07-23 00:01:01 -07:00
  [Bugfix] Fix nightly transformers CI failure (#21427)
  Signed-off-by: Isotr0py <2037008807@qq.com>

Chendi.Xue | 08d2bd78da | 2025-07-22 20:33:57 -07:00
  [Bugfix] Fix deepseek-v2-lite failure caused by the fused_qkv_a_proj name update (#21414)
  Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

Harry Mellor | f154bb9ff0 | 2025-07-22 20:29:43 -07:00
  Simplify weight loading in Transformers backend (#21382)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Aritra Roy Gosthipaty | 2226d5bd85 | 2025-07-22 08:27:28 -07:00
  [Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers (#21353)
  Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>

Raushan Turganbay | f38ee34a0a | 2025-07-22 08:18:46 -07:00
  [feat] Enable mm caching for transformers backend (#21358)
  Signed-off-by: raushan <raushan@huggingface.co>

Benjamin Bartels | b194557a6c | 2025-07-22 08:15:53 -07:00
  Add parallel model weight loading for runai_streamer (#21330)
  Signed-off-by: bbartels <benjamin@bartels.dev>
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Wentao Ye | 774d0c014b | 2025-07-22 07:27:15 -07:00
  [Perf] CUDA Kernel for Per Token Group Quant (#21083)
  Signed-off-by: yewentao256 <zhyanwentao@126.com>

Duncan Moss | 2c8db17cfd | 2025-07-22 07:27:12 -07:00
  [feat] Add SM100 support for CUTLASS FP8 groupGEMM (#20447)
  Signed-off-by: Duncan Moss <djm.moss@gmail.com>
  Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
  Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
  Co-authored-by: mgoin <mgoin64@gmail.com>

Mickaël Seznec | 4fb56914c5 | 2025-07-22 07:07:44 -07:00
  [perf] Add fused MLA QKV + strided layernorm (#21116)
  Signed-off-by: Mickael Seznec <mickael@mistral.ai>
  Co-authored-by: mgoin <mgoin64@gmail.com>

Raghav Ravishankar | 82b8027be6 | 2025-07-22 00:57:43 -07:00
  Add Arcee model (#21296)
  Signed-off-by: alyosha-swamy <raghav@arcee.ai>
  Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
  Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Shu Wang | 9e23ad9655 | 2025-07-21 23:40:21 -07:00
  Update fp4 quantize API (#21327)
  Signed-off-by: Shu Wang <shuw@nvidia.com>

Ming Yang | e7b2042681 | 2025-07-21 21:49:01 -07:00
  Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)" (#21334)
  Signed-off-by: Ming Yang <minos.future@gmail.com>

Himanshu Jaju | 0ec82edda5 | 2025-07-21 11:19:23 -07:00
  [perf] Speed up align sum kernels (#21079)
  Signed-off-by: Himanshu Jaju <hj@mistral.ai>

Zhiyu | 6b46c4b653 | 2025-07-21 10:02:58 -04:00
  Add Nvidia ModelOpt config adaptation (#19815)
  Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Cyrus Leung | 042af0c8d3 | 2025-07-21 02:22:21 -07:00
  [Model][1/N] Support multiple poolers at model level (#21227)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Raushan Turganbay | 9499e26e2a | 2025-07-20 13:25:50 +00:00
  [Model] Support VLMs with transformers backend (#20543)
  Signed-off-by: raushan <raushan@huggingface.co>
  Signed-off-by: Isotr0py <2037008807@qq.com>
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Co-authored-by: Isotr0py <2037008807@qq.com>
  Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Calvin Chen | 51ba839555 | 2025-07-20 08:15:50 +00:00
  [Model] Use AutoWeightsLoader for BART (#18299)
  Signed-off-by: calvin chen <120380290@qq.com>

Yuxuan Zhang | 10eb24cc91 | 2025-07-19 22:40:31 +00:00
  GLM-4 Update (#20736)
  Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Signed-off-by: Lu Fang <fanglu@fb.com>
  Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Co-authored-by: Lu Fang <fanglu@fb.com>

Woosuk Kwon | 752c6ade2e | 2025-07-19 13:53:17 -07:00
  [V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)
  Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Thomas Parnell | 881e3cbe3b | 2025-07-19 19:27:21 +00:00
  [V1][Hybrid] Enable piecewise CUDA Graph for mamba layers (#21194)
  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

Kaixi Hou | 6d0734c562 | 2025-07-19 02:33:01 -07:00
  [NVIDIA] Add SM100 Flashinfer MoE blockscale fp8 backend for low latency (#20645)
  Signed-off-by: kaixih <kaixih@nvidia.com>
  Signed-off-by: mgoin <mgoin64@gmail.com>
  Co-authored-by: mgoin <mgoin64@gmail.com>

김종곤 | 3e04107d97 | 2025-07-19 14:25:44 +08:00
  [Model] EXAONE 4.0 model support (#21060)
  Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com>
  Signed-off-by: woongsik <rlawhdrhs27@gmail.com>

Varun Sundar Rabindranath | dcc6cfb991 | 2025-07-18 23:09:51 -07:00
  [Kernel][Performance] Tweak MoE Batched silu_mul_fp8_quant_deep_gemm kernel (#21193)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

Woosuk Kwon | dd572c0ab3 | 2025-07-18 21:47:50 -07:00
  [V0 Deprecation] Remove V0 Spec Decode workers (#21152)
  Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Jee Jee Li | 466e878f2a | 2025-07-18 17:52:02 -07:00
  [Quantization] Enable BNB support for more MoE models (#21100)
  Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Rui Qiao | 217937221b | 2025-07-18 17:46:09 -07:00
  Elastic Expert Parallel Initial Support (#20775)
  Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

Richard Zou | b2eb2b5ad7 | 2025-07-18 14:10:21 -04:00
  [Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 (#19346)
  Signed-off-by: rzou <zou3519@gmail.com>

Thomas Parnell | ed8cbfedf8 | 2025-07-18 05:52:52 -07:00
  Let GraniteMoeAttention use YaRN (#21174)
  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

Cyrus Leung | 45badd05d0 | 2025-07-18 05:41:17 -07:00
  [Core] Set pooling params based on task and model (#21128)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>