Directory: vllm/vllm/model_executor/layers

Latest commit: 72b1c2ae2c by Pavani Majety, 2025-11-07 04:18:39 -08:00
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Contents (name, last commit date, last commit subject; "…" where the listing was truncated):

Directories
  fla/                         2025-11-04 05:27:03 +08:00  [Bugfix][plugin] fla crash on plugin (#27322)
  fused_moe/                   2025-11-06 07:29:46 -08:00  [Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200)
  mamba/                       2025-11-02 04:16:23 -08:00  [V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377)
  quantization/                2025-11-07 04:18:39 -08:00  [Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)
  rotary_embedding/            2025-11-06 18:55:17 +00:00  Add llama 4 scaling support (#28145)

Files
  __init__.py                  …
  activation.py                …
  attention_layer_base.py      …
  batch_invariant.py           2025-11-05 10:04:49 -08:00  [Feature] Extend batch invariant torch.compile to B200 (#27856)
  kda.py                       2025-11-01 11:54:36 +08:00  [Bugfix] Fix KDA output (#27905)
  layernorm.py                 2025-11-05 17:01:12 -08:00  [PERF] Decouple projections from GDN custom op. Attempt 2 (#28083)
  lightning_attn.py            …
  linear.py                    …
  logits_processor.py          …
  mla.py                       2025-10-30 21:02:27 +08:00  [Model] Introduce Kimi Linear to vLLM (#27809)
  pooler.py                    2025-10-30 12:13:05 +00:00  [Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
  resampler.py                 …
  utils.py                     2025-11-04 16:01:00 -05:00  [ROCm] gemm_a16w16 upstreaming (#26969)
  vocab_parallel_embedding.py  …