vllm/vllm at 1867c258bda3bc6adb07090c508fd85e3ceed547 - vllm - 丝路新云 code repository

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-22 07:35:01 +08:00


Eldar Kurtic 1867c258bd

Fix target matching for fused layers with compressed-tensors (#12617)

Without this PR
---------------
Quantizing a model with llm-compressor using a recipe that explicitly lists
layer names as targets produces a checkpoint that vLLM cannot load:
`vllm serve <model>` fails with `raise ValueError(f"Unable to find matching target for {module} in the ...`.

Example recipe:
```python
recipe = """
quantization_stage:
  run_type: oneshot
  quantization_modifiers:
    GPTQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "group"
            group_size: 128
          targets: [
            "model.layers.0.mlp.down_proj",
            "model.layers.2.mlp.down_proj",
            "model.layers.3.mlp.down_proj",
            "model.layers.4.mlp.down_proj",
            "model.layers.5.mlp.down_proj",
            "model.layers.6.mlp.down_proj",
            "model.layers.7.mlp.down_proj",
            "model.layers.8.mlp.down_proj",
            "model.layers.9.mlp.down_proj",
            "model.layers.10.mlp.down_proj",
            "model.layers.11.mlp.down_proj",
            "model.layers.12.mlp.down_proj",
            "model.layers.13.mlp.down_proj",
            "model.layers.14.mlp.down_proj",
            "model.layers.15.mlp.down_proj",
            "model.layers.16.mlp.down_proj",
            "model.layers.17.mlp.down_proj",
            "model.layers.19.mlp.down_proj",
            "model.layers.21.mlp.down_proj",
            "model.layers.22.mlp.down_proj",
            .
            .
            .
          ]
"""
```

To reproduce the vLLM error: 
```bash
vllm serve nm-testing/eldar-test
```

With this PR
------------
Models are loaded correctly without any errors.
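The PR title attributes the failure to target matching for fused layers. The following is a rough illustration only, not vLLM's actual code. It assumes that vLLM loads `q_proj`/`k_proj`/`v_proj` as one fused `qkv_proj` module and `gate_proj`/`up_proj` as `gate_up_proj`, while a checkpoint built from an explicit recipe lists per-shard layer names. Under that assumption, comparing the fused name directly against the targets finds nothing, but resolving it through its shards succeeds.

The names `FUSED_SHARDS`, `find_target_naive` and `find_target_fused` are hypothetical. The `re:` regex prefix mirrors the convention compressed-tensors configs use for regex targets.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical fused-layer table (assumption): fused module name -> shard names.
# The real mapping lives inside vLLM and may differ.
FUSED_SHARDS: Dict[str, List[str]] = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def matches(name: str, target: str) -> bool:
    """Exact name match, or a regex match when the target starts with 're:'."""
    if target.startswith("re:"):
        return re.match(target[3:], name) is not None
    return name == target


def find_target_naive(layer_name: str, targets: Iterable[str]) -> str:
    """Compare the runtime layer name directly against the checkpoint targets."""
    for target in targets:
        if matches(layer_name, target):
            return target
    raise ValueError(f"Unable to find matching target for {layer_name}")


def find_target_fused(layer_name: str, targets: Iterable[str]) -> str:
    """Like find_target_naive, but resolve fused layers through their shards."""
    targets = list(targets)
    prefix, _, leaf = layer_name.rpartition(".")
    if leaf not in FUSED_SHARDS:
        return find_target_naive(layer_name, targets)
    # Every shard must be targeted (raises ValueError otherwise);
    # report the target matched by the first shard.
    shard_targets = [
        find_target_naive(f"{prefix}.{shard}", targets)
        for shard in FUSED_SHARDS[leaf]
    ]
    return shard_targets[0]


targets = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.down_proj",
]
try:
    find_target_naive("model.layers.0.self_attn.qkv_proj", targets)
except ValueError as err:
    print(err)  # Unable to find matching target for model.layers.0.self_attn.qkv_proj
print(find_target_fused("model.layers.0.self_attn.qkv_proj", targets))  # model.layers.0.self_attn.q_proj
print(find_target_fused("model.layers.0.mlp.down_proj", targets))       # model.layers.0.mlp.down_proj
```

Requiring every shard to match reflects that a fused layer receives a single quantization scheme. A fuller version would also check that all shard targets map to the same scheme.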

2025-02-01 05:07:46 +00:00


adapter_commons

[Misc] Clean up and consolidate LRUCache (#11339)

2024-12-20 00:59:32 +08:00

Set weights_only=True when using torch.load() (#12366)

2025-01-24 02:17:30 +00:00

[Attention] MLA decode optimizations (#12528)

2025-01-30 23:49:37 -08:00

[torch.compile] decouple compile sizes and cudagraph sizes (#12243)

2025-01-24 02:01:30 +08:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

device_allocator

[Core] Support fully transparent sleep mode (#11743)

2025-01-22 14:39:32 +08:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

[Attention] MLA decode optimizations (#12528)

2025-01-30 23:49:37 -08:00

[Misc] fix typo: add missing space in lora adapter error message (#12564)

2025-01-30 15:39:22 +00:00

[core] add wake_up doc and some sanity check (#12361)

2025-01-24 02:00:50 +08:00

[Misc] Rename MultiModalInputsV2 -> MultiModalInputs (#12244)

2025-01-21 07:31:19 +00:00

Rename vllm.logging to vllm.logging_utils (#10134)

2024-11-08 20:53:24 +00:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

Fix target matching for fused layers with compressed-tensors (#12617)

2025-02-01 05:07:46 +00:00

[Bugfix] Fix broken internvl2 inference with v1 (#12360)

2025-01-23 17:20:33 +00:00

[Attention] MLA decode optimizations (#12528)

2025-01-30 23:49:37 -08:00

[core][bugfix] configure env var during import vllm (#12209)

2025-01-20 19:35:59 +08:00

[Bugfix] Fix incorrect types in LayerwiseProfileResults (#12196)

2025-01-20 14:59:20 +08:00

Set weights_only=True when using torch.load() (#12366)

2025-01-24 02:17:30 +00:00

[CI] fix pre-commit error (#12494)

2025-01-28 06:11:05 +00:00

transformers_utils

[Bugfix] Gracefully handle huggingface hub http error (#12571)

2025-01-31 08:19:35 +00:00

[Docker] bump up neuron sdk v2.21 (#11593)

2024-12-30 13:46:14 +08:00

[misc] add cuda runtime version to usage data (#12190)

2025-01-21 00:31:01 +00:00

[V1] Bugfix: Validate Model Input Length (#12600)

2025-01-31 18:32:04 -08:00

vllm_flash_attn

[ci][build] fix vllm-flash-attn (#8699)

2024-09-21 23:24:58 -07:00

[BugFix] fix wrong output when using lora and num_scheduler_steps=8 (#11161)

2025-02-01 12:52:07 +08:00

__init__.py

[core][bugfix] configure env var during import vllm (#12209)

2025-01-20 19:35:59 +08:00

_custom_ops.py

[Kernel][Quantization] Integrate block-quantized CUTLASS kernels for DeepSeekV3 (#12587)

2025-01-31 15:29:11 -08:00

_ipex_ops.py

[Misc][XPU] Upgrade to Pytorch 2.5 for xpu backend (#9823)

2024-11-06 17:29:03 -08:00

beam_search.py

[Frontend] re-enable multi-modality input in the new beam search implementation (#9427)

2024-10-29 11:49:47 +00:00

config.py

[Attention] MLA decode optimizations (#12528)

2025-01-30 23:49:37 -08:00

connections.py

Misc: allow to use proxy in HTTPConnection (#12042)

2025-01-15 13:16:40 +00:00

envs.py

[Attention] MLA decode optimizations (#12528)

2025-01-30 23:49:37 -08:00

forward_context.py

[torch.compile] Hide KV cache behind torch.compile boundary (#11677)

2025-01-10 13:14:42 +08:00

logger.py

[Misc] Move print_*_once from utils to logger (#11298)

2025-01-09 12:48:12 +08:00

logits_process.py

[Frontend] Bad words sampling parameter (#9717)

2024-10-26 16:29:38 +00:00

outputs.py

[V1][Frontend] Coalesce bunched RequestOutputs (#12298)

2025-01-23 17:17:41 -08:00

pooling_params.py

[Doc][4/N] Reorganize API Reference (#11843)

2025-01-08 21:34:44 +08:00

py.typed

Add py.typed so consumers of vLLM can get type checking (#1509)

2023-10-30 14:50:47 -07:00

sampling_params.py

[Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637)

2024-12-31 03:43:54 +00:00

scalar_type.py

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

scripts.py

[Frontend] Support reasoning content for deepseek r1 (#12473)

2025-01-29 11:38:08 +08:00

sequence.py

[Bugfix] Multi-sequence broken (#11898)

2025-01-21 11:51:35 -08:00

tracing.py

[Misc] Remove experimental dep from tracing.py (#12007)

2025-01-21 11:51:55 -08:00

utils.py

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

version.py

[CI/Build] use setuptools-scm to set __version__ (#4738)

2024-09-23 09:44:26 -07:00