| Author | Commit | Message | Date |
|---|---|---|---|
| Peter Salas | 74d5543ec5 | [VLM][Core] Fix exceptions on ragged NestedTensors (#7974) | 2024-08-29 03:24:31 +00:00 |
| Mor Zusman | fdd9daafa3 | [Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM (#7651) | 2024-08-28 15:06:52 -07:00 |
| rasmith | e5697d161c | [Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ (#7386) | 2024-08-28 15:37:47 -04:00 |
| Cyrus Leung | ef9baee3c5 | [Bugfix][VLM] Fix incompatibility between #7902 and #7230 (#7948) | 2024-08-28 08:11:18 -07:00 |
| Peter Salas | fab5f53e2d | [Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902) | 2024-08-28 01:53:56 +00:00 |
| zifeitong | 5340a2dccf | [Model] Add multi-image input support for LLaVA-Next offline inference (#7230) | 2024-08-28 07:09:02 +08:00 |
| Dipika Sikka | fc911880cc | [Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766) Co-authored-by: ElizaWszola <eliza@neuralmagic.com> | 2024-08-27 15:07:09 -07:00 |
| Isotr0py | b09c755be8 | [Bugfix] Fix phi3v incorrect image_idx when using async engine (#7916) | 2024-08-27 17:36:09 +00:00 |
| Dipika Sikka | 015e6cc252 | [Misc] Update compressed tensors lifecycle to remove prefix from create_weights (#7825) | 2024-08-26 18:09:34 -06:00 |
| Dipika Sikka | dd9857f5fa | [Misc] Update gptq_marlin_24 to use vLLMParameters (#7762) Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-08-26 17:44:54 -04:00 |
| Dipika Sikka | 665304092d | [Misc] Update qqq to use vLLMParameters (#7805) | 2024-08-26 13:16:15 -06:00 |
| Isotr0py | 8aaf3d5347 | [Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783) | 2024-08-25 11:51:20 +00:00 |
| zifeitong | 80162c44b1 | [Bugfix] Fix Phi-3v crash when input images are of certain sizes (#7840) | 2024-08-24 18:16:24 -07:00 |
| youkaichao | 7d9ffa2ae1 | [misc][core] lazy import outlines (#7831) | 2024-08-24 00:51:38 -07:00 |
| Tyler Rockwood | d81abefd2e | [Frontend] add json_schema support from OpenAI protocol (#7654) | 2024-08-23 23:07:24 -07:00 |
| Dipika Sikka | f1df5dbfd6 | [Misc] Update marlin to use vLLMParameters (#7803) | 2024-08-23 14:30:52 -04:00 |
| Dipika Sikka | 955b5191c9 | [Misc] update fp8 to use vLLMParameter (#7437) | 2024-08-22 08:36:18 -04:00 |
| Flex Wang | 4f419c00a6 | Fix ShardedStateLoader for vllm fp8 quantization (#7708) | 2024-08-22 08:25:04 -04:00 |
| Abhinav Goyal | a3fce56b88 | [Speculative Decoding] EAGLE Implementation with Top-1 proposer (#6830) | 2024-08-22 02:42:24 -07:00 |
| Woosuk Kwon | b3856bef7d | [Misc] Use torch.compile for GemmaRMSNorm (#7642) | 2024-08-22 01:14:13 -07:00 |
| Michael Goin | aae74ef95c | Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)" (#7764) | 2024-08-22 03:42:14 +00:00 |
| zifeitong | df1a21131d | [Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710) | 2024-08-22 09:36:24 +08:00 |
| Dipika Sikka | 8678a69ab5 | [Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527) Co-authored-by: ElizaWszola <eliza@neuralmagic.com> | 2024-08-21 16:17:10 -07:00 |
| Peter Salas | 1ca0d4f86b | [Model] Add UltravoxModel and UltravoxConfig (#7615) | 2024-08-21 22:49:39 +00:00 |
| Isotr0py | 12e1c65bc9 | [Model] Add AWQ quantization support for InternVL2 model (#7187) | 2024-08-20 23:18:57 -07:00 |
| Lucas Wilkinson | 5288c06aa0 | [Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174) | 2024-08-20 07:09:33 -06:00 |
| jianyizh | e6d811dd13 | [XPU] fallback to native implementation for xpu custom op (#7670) | 2024-08-20 00:26:09 -07:00 |
| Zijian Hu | f4fc7337bf | [Bugfix] support tie_word_embeddings for all models (#5724) | 2024-08-19 20:00:04 -07:00 |
| Isotr0py | 7601cb044d | [Core] Support tensor parallelism for GGUF quantization (#7520) | 2024-08-19 17:30:14 -04:00 |
| Woosuk Kwon | df845b2b46 | [Misc] Remove Gemma RoPE (#7638) | 2024-08-19 09:29:31 -07:00 |
| Peng Guanwen | f710fb5265 | [Core] Use flashinfer sampling kernel when available (#7137) Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-08-19 03:24:03 +00:00 |
| SangBin Cho | ff7ec82c4d | [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) | 2024-08-18 17:57:20 -07:00 |
| Woosuk Kwon | 200a2ffa6b | [Misc] Refactor Llama3 RoPE initialization (#7637) | 2024-08-18 17:18:12 -07:00 |
| Woosuk Kwon | ab7165f2c7 | [TPU] Optimize RoPE forward_native2 (#7636) | 2024-08-18 01:15:10 -07:00 |
| Roger Wang | bbf55c4805 | [VLM] Refactor MultiModalConfig initialization and profiling (#7530) | 2024-08-17 13:30:55 -07:00 |
| Jee Jee Li | 1ef13cf92f | [Misc]Fix BitAndBytes exception messages (#7626) | 2024-08-17 12:02:14 -07:00 |
| Besher Alkurdi | e73f76eec6 | [Model] Pipeline parallel support for JAIS (#7603) | 2024-08-17 11:11:09 -07:00 |
| youkaichao | eed020f673 | [misc] use nvml to get consistent device name (#7582) | 2024-08-16 21:15:13 -07:00 |
| Michael Goin | 44f26a9466 | [Model] Align nemotron config with final HF state and fix lm-eval-small (#7611) | 2024-08-16 15:56:34 -07:00 |
| bnellnm | 37fd47e780 | [Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596) | 2024-08-16 14:00:11 -07:00 |
| Michael Goin | 855866caa9 | [Kernel] Add tuned triton configs for ExpertsInt8 (#7601) | 2024-08-16 11:37:01 -07:00 |
| Mor Zusman | 7fc23be81c | [Kernel] W8A16 Int8 inside FusedMoE (#7415) | 2024-08-16 10:06:51 -07:00 |
| Charlie Fu | e837b624f2 | [Feature][Hardware][Amd] Add fp8 Linear Layer for Rocm (#7210) | 2024-08-16 10:06:30 -07:00 |
| Michael Goin | 21313e09e3 | [Bugfix] Fix default weight loading for scalars (#7534) | 2024-08-15 13:10:22 -07:00 |
| Kyle Sayers | f55a9aea45 | [Misc] Revert compressed-tensors code reuse (#7521) | 2024-08-14 15:07:37 -07:00 |
| Cyrus Leung | 3f674a49b5 | [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) | 2024-08-14 17:55:42 +00:00 |
| Chang Su | c134a46402 | Fix empty output when temp is too low (#2937) Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> | 2024-08-14 05:31:44 +00:00 |
| youkaichao | 16422ea76f | [misc][plugin] add plugin system implementation (#7426) | 2024-08-13 16:24:17 -07:00 |
| Kyle Sayers | 373538f973 | [Misc] compressed-tensors code reuse (#7277) | 2024-08-13 19:05:15 -04:00 |
| Dipika Sikka | b1e5afc3e7 | [Misc] Update awq and awq_marlin to use vLLMParameters (#7422) | 2024-08-13 17:08:20 -04:00 |