Isotr0py
|
dd2a6a82e3
|
[Bugfix] Fix internlm2 tensor parallel inference (#8055)
|
2024-09-02 23:48:56 +08:00 |
|
Lily Liu
|
e6a26ed037
|
[SpecDecode][Kernel] Flashinfer Rejection Sampling (#7244)
|
2024-09-01 21:23:29 -07:00 |
|
Shawn Tan
|
f8d60145b4
|
[Model] Add Granite model (#7436)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-09-01 18:37:18 -07:00 |
|
Roger Wang
|
5b86b19954
|
[Misc] Optional installation of audio related packages (#8063)
|
2024-09-01 14:46:57 -07:00 |
|
Cyrus Leung
|
d05f0a9db2
|
[Bugfix] Fix import error in Phi-3.5-MoE (#8052)
|
2024-08-30 22:26:55 -07:00 |
|
Wenxiang
|
1248e8506a
|
[Model] Adding support for MSFT Phi-3.5-MoE (#7729)
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
|
2024-08-30 13:42:57 -06:00 |
|
Jungho Christopher Cho
|
f97be32d1d
|
[VLM][Model] TP support for ViTs (#7186)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-08-30 08:19:27 -07:00 |
|
Cyrus Leung
|
afd39a4511
|
[Bugfix] Fix import error in Exaone model (#8034)
|
2024-08-30 08:03:28 -07:00 |
|
Yohan Na
|
dc13e99348
|
[MODEL] add Exaone model support (#7819)
|
2024-08-29 23:34:20 -07:00 |
|
afeldman-nm
|
428dd1445e
|
[Core] Logprobs support in Multi-step (#7652)
|
2024-08-29 19:19:08 -07:00 |
|
chenqianfzh
|
4664ceaad6
|
support bitsandbytes 8-bit and FP4 quantized models (#7445)
|
2024-08-29 19:09:08 -04:00 |
|
Harsha vardhan manoj Bikki
|
257afc37c5
|
[Neuron] Adding support for context-length, token-gen buckets. (#7885)
Co-authored-by: Harsha Bikki <harbikh@amazon.com>
|
2024-08-29 13:58:14 -07:00 |
|
Dipika Sikka
|
86a677de42
|
[misc] update tpu int8 to use new vLLM Parameters (#7973)
|
2024-08-29 16:46:55 -04:00 |
|
Isotr0py
|
d78789ac16
|
[Bugfix] Fix incorrect vocab embedding shards for GGUF model in tensor parallelism (#7954)
|
2024-08-29 15:54:49 -04:00 |
|
Peter Salas
|
74d5543ec5
|
[VLM][Core] Fix exceptions on ragged NestedTensors (#7974)
|
2024-08-29 03:24:31 +00:00 |
|
Mor Zusman
|
fdd9daafa3
|
[Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM (#7651)
|
2024-08-28 15:06:52 -07:00 |
|
rasmith
|
e5697d161c
|
[Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ (#7386)
|
2024-08-28 15:37:47 -04:00 |
|
Cyrus Leung
|
ef9baee3c5
|
[Bugfix][VLM] Fix incompatibility between #7902 and #7230 (#7948)
|
2024-08-28 08:11:18 -07:00 |
|
Peter Salas
|
fab5f53e2d
|
[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902)
|
2024-08-28 01:53:56 +00:00 |
|
zifeitong
|
5340a2dccf
|
[Model] Add multi-image input support for LLaVA-Next offline inference (#7230)
|
2024-08-28 07:09:02 +08:00 |
|
Dipika Sikka
|
fc911880cc
|
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
|
2024-08-27 15:07:09 -07:00 |
|
Isotr0py
|
b09c755be8
|
[Bugfix] Fix phi3v incorrect image_idx when using async engine (#7916)
|
2024-08-27 17:36:09 +00:00 |
|
Dipika Sikka
|
015e6cc252
|
[Misc] Update compressed tensors lifecycle to remove prefix from create_weights (#7825)
|
2024-08-26 18:09:34 -06:00 |
|
Dipika Sikka
|
dd9857f5fa
|
[Misc] Update gptq_marlin_24 to use vLLMParameters (#7762)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-26 17:44:54 -04:00 |
|
Dipika Sikka
|
665304092d
|
[Misc] Update qqq to use vLLMParameters (#7805)
|
2024-08-26 13:16:15 -06:00 |
|
Isotr0py
|
8aaf3d5347
|
[Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783)
|
2024-08-25 11:51:20 +00:00 |
|
zifeitong
|
80162c44b1
|
[Bugfix] Fix Phi-3v crash when input images are of certain sizes (#7840)
|
2024-08-24 18:16:24 -07:00 |
|
youkaichao
|
7d9ffa2ae1
|
[misc][core] lazy import outlines (#7831)
|
2024-08-24 00:51:38 -07:00 |
|
Tyler Rockwood
|
d81abefd2e
|
[Frontend] add json_schema support from OpenAI protocol (#7654)
|
2024-08-23 23:07:24 -07:00 |
|
Dipika Sikka
|
f1df5dbfd6
|
[Misc] Update marlin to use vLLMParameters (#7803)
|
2024-08-23 14:30:52 -04:00 |
|
Dipika Sikka
|
955b5191c9
|
[Misc] update fp8 to use vLLMParameter (#7437)
|
2024-08-22 08:36:18 -04:00 |
|
Flex Wang
|
4f419c00a6
|
Fix ShardedStateLoader for vllm fp8 quantization (#7708)
|
2024-08-22 08:25:04 -04:00 |
|
Abhinav Goyal
|
a3fce56b88
|
[Speculative Decoding] EAGLE Implementation with Top-1 proposer (#6830)
|
2024-08-22 02:42:24 -07:00 |
|
Woosuk Kwon
|
b3856bef7d
|
[Misc] Use torch.compile for GemmaRMSNorm (#7642)
|
2024-08-22 01:14:13 -07:00 |
|
Michael Goin
|
aae74ef95c
|
Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)" (#7764)
|
2024-08-22 03:42:14 +00:00 |
|
zifeitong
|
df1a21131d
|
[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710)
|
2024-08-22 09:36:24 +08:00 |
|
Dipika Sikka
|
8678a69ab5
|
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
|
2024-08-21 16:17:10 -07:00 |
|
Peter Salas
|
1ca0d4f86b
|
[Model] Add UltravoxModel and UltravoxConfig (#7615)
|
2024-08-21 22:49:39 +00:00 |
|
Isotr0py
|
12e1c65bc9
|
[Model] Add AWQ quantization support for InternVL2 model (#7187)
|
2024-08-20 23:18:57 -07:00 |
|
Lucas Wilkinson
|
5288c06aa0
|
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)
|
2024-08-20 07:09:33 -06:00 |
|
jianyizh
|
e6d811dd13
|
[XPU] fallback to native implementation for xpu custom op (#7670)
|
2024-08-20 00:26:09 -07:00 |
|
Zijian Hu
|
f4fc7337bf
|
[Bugfix] support tie_word_embeddings for all models (#5724)
|
2024-08-19 20:00:04 -07:00 |
|
Isotr0py
|
7601cb044d
|
[Core] Support tensor parallelism for GGUF quantization (#7520)
|
2024-08-19 17:30:14 -04:00 |
|
Woosuk Kwon
|
df845b2b46
|
[Misc] Remove Gemma RoPE (#7638)
|
2024-08-19 09:29:31 -07:00 |
|
Peng Guanwen
|
f710fb5265
|
[Core] Use flashinfer sampling kernel when available (#7137)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-19 03:24:03 +00:00 |
|
SangBin Cho
|
ff7ec82c4d
|
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109)
|
2024-08-18 17:57:20 -07:00 |
|
Woosuk Kwon
|
200a2ffa6b
|
[Misc] Refactor Llama3 RoPE initialization (#7637)
|
2024-08-18 17:18:12 -07:00 |
|
Woosuk Kwon
|
ab7165f2c7
|
[TPU] Optimize RoPE forward_native2 (#7636)
|
2024-08-18 01:15:10 -07:00 |
|
Roger Wang
|
bbf55c4805
|
[VLM] Refactor MultiModalConfig initialization and profiling (#7530)
|
2024-08-17 13:30:55 -07:00 |
|
Jee Jee Li
|
1ef13cf92f
|
[Misc] Fix BitAndBytes exception messages (#7626)
|
2024-08-17 12:02:14 -07:00 |
|