| Author | Commit | Message | Trailers | Date |
|---|---|---|---|---|
| Maxime Fournioux | fe2e10c71b | Add example of helm chart for vllm deployment on k8s (#9199) | Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> | 2024-12-10 09:19:27 +00:00 |
| Gene Der Su | 82c73fd510 | [Bugfix] cuda error running llama 3.2 (#11047) | | 2024-12-10 07:41:11 +00:00 |
| Diego Marinho | bfd610430c | Update README.md (#11034) | | 2024-12-09 23:08:10 -08:00 |
| Jeff Cook | e35879c276 | [Bugfix] Fix xgrammar failing to read a vocab_size from LlavaConfig on PixtralHF. (#11043) | | 2024-12-10 14:54:22 +08:00 |
| youkaichao | ebf778061d | monitor metrics of tokens per step using cudagraph batchsizes (#11031) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-09 22:35:36 -08:00 |
| Tyler Michael Smith | 28b3a1c7e5 | [V1] Multiprocessing Tensor Parallel Support for v1 (#9856) | Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2024-12-10 06:28:14 +00:00 |
| Patrick von Platen | bc192a2b09 | [Pixtral] Improve loading (#11040) | | 2024-12-10 06:09:32 +00:00 |
| Joe Runde | 980ad394a8 | [Frontend] Use request id from header (#10968) | Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-12-10 13:46:29 +08:00 |
| Cyrus Leung | 391d7b2763 | [Bugfix] Fix usage of deprecated decorator (#11025) | Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-12-10 13:45:47 +08:00 |
| Isotr0py | d1f6d1c8af | [Model] Add has_weight to RMSNorm and re-enable weights loading tracker for Mamba (#10739) | Signed-off-by: Isotr0py <2037008807@qq.com> | 2024-12-10 10:23:07 +08:00 |
| Michael Goin | 6d525288c1 | [Docs] Add dedicated tool calling page to docs (#10554) | Signed-off-by: mgoin <michael@neuralmagic.com>; Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2024-12-09 20:15:34 -05:00 |
| Woosuk Kwon | 6faec54505 | [V1] Do not store None in self.generators (#11038) | Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2024-12-09 15:08:19 -08:00 |
| Richard Liu | 5ed5d5f128 | Build tpu image in release pipeline (#10936) | Signed-off-by: Richard Liu <ricliu@google.com>; Co-authored-by: Kevin H. Luu <kevin@anyscale.com> | 2024-12-09 23:07:48 +00:00 |
| Gregory Shtrasberg | b63ba84832 | [ROCm][bugfix] speculative decoding worker class (#11035) | Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> | 2024-12-09 14:00:29 -08:00 |
| xendo | 9c6459e4cb | [Neuron] Upgrade neuron to 2.20.2 (#11016) | Signed-off-by: Jerzy Zagorski <jzagorsk@amazon.com>; Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com> | 2024-12-09 13:53:24 -08:00 |
| youkaichao | 1a2f8fb828 | [v1] fix use compile sizes (#11000) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-09 13:47:24 -08:00 |
| Konrad Zawora | cbcbdb1ceb | [Bugfix][Hardware][Gaudi] Bump vllm_hpu_extension version (#11028) | Signed-off-by: Konrad Zawora <kzawora@habana.ai> | 2024-12-09 13:21:06 -08:00 |
| Isotr0py | a811dd6608 | [Model] merged input processor for Phi-3-Vision models (#10977) | Signed-off-by: Isotr0py <2037008807@qq.com>; Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> | 2024-12-09 12:55:10 -08:00 |
| Jee Jee Li | ca871491ed | [Misc][LoRA] Abstract PunicaWrapper (#10955) | Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2024-12-09 12:54:44 -08:00 |
| Woosuk Kwon | 3b61cb450d | [V1] Further reduce CPU overheads in flash-attn (#10989) | Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2024-12-09 12:38:46 -08:00 |
| Kevin H. Luu | edc4fa3188 | [ci/build] Recompile CI dependencies list with Python 3.12 (#11013) | Signed-off-by: kevin <kevin@anyscale.com> | 2024-12-09 11:46:58 -08:00 |
| Varun Sundar Rabindranath | 25b79d9fd3 | [V1] Input Batch Relocation (#10962) | Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>; Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> | 2024-12-09 09:33:41 -08:00 |
| wangxiyuan | aea2fc38c3 | [Platform] Move async output check to platform (#10768) | Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> | 2024-12-09 17:24:46 +00:00 |
| Russell Bryant | e691b26f6f | [Core] Require xgrammar >= 0.1.6 (#11021) | Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2024-12-09 16:44:27 +00:00 |
| Roger Wang | c690357928 | [V1] Fix Detokenizer loading in AsyncLLM (#10997) | Signed-off-by: Roger Wang <ywang@roblox.com> | 2024-12-09 16:27:10 +00:00 |
| youkaichao | d1c2e15eb3 | [torch.compile] add dynamo time tracking (#11005) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-08 23:09:04 -08:00 |
| Roger Wang | af7c4a92e6 | [Doc][V1] Add V1 support column for multimodal models (#10998) | Signed-off-by: Roger Wang <ywang@roblox.com> | 2024-12-08 22:29:16 -08:00 |
| youkaichao | 46004e83a2 | [misc] clean up and unify logging (#10999) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-08 17:28:27 -08:00 |
| youkaichao | 43b05fa314 | [torch.compile][misc] fix comments (#10993) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-08 11:18:18 -08:00 |
| Roger Wang | a11f326528 | [V1] Initial support of multimodal models for V1 re-arch (#10699) | Signed-off-by: Roger Wang <ywang@roblox.com> | 2024-12-08 12:50:51 +00:00 |
| youkaichao | fd57d2b534 | [torch.compile] allow candidate compile sizes (#10984) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-08 11:05:21 +00:00 |
| youkaichao | 7be15d9356 | [core][misc] remove use_dummy driver for _run_workers (#10920) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-07 12:06:08 -08:00 |
| youkaichao | 1b62745b1d | [core][executor] simplify instance id (#10976) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-07 09:33:45 -08:00 |
| zhou fan | 78029b34ed | [BugFix][Kernel]: fix illegal memory access in causal_conv1d when conv_states is None (#10928) | Signed-off-by: xffxff <1247714429@qq.com> | 2024-12-08 01:21:18 +08:00 |
| Cyrus Leung | c889d5888b | [Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975) | Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-12-07 17:20:49 +00:00 |
| Cyrus Leung | 39e227c7ae | [Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711) | Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-12-07 17:10:05 +00:00 |
| Cyrus Leung | 1c768fe537 | [Doc] Explicitly state that InternVL 2.5 is supported (#10978) | Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-12-07 16:58:02 +00:00 |
| Cyrus Leung | bf0e382e16 | [Model] Composite weight loading for multimodal Qwen2 (#10944) | Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-12-07 07:22:52 -07:00 |
| Isotr0py | b26b4cd03c | [Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation (#10958) | Signed-off-by: Isotr0py <2037008807@qq.com> | 2024-12-07 18:33:49 +08:00 |
| Gregory Shtrasberg | f13cf9ad50 | [Build] Fix for the Wswitch-bool clang warning (#10060) | Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> | 2024-12-07 09:03:44 +00:00 |
| Cyrus Leung | 955fa9533a | [3/N] Support and implement merged input processor for LLaVA model (#10676) | Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>; Co-authored-by: Roger Wang <ywang@roblox.com> | 2024-12-07 00:50:58 -08:00 |
| Jee Jee Li | acf092d348 | [Bugfix] Fix test-pipeline.yaml (#10973) | Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2024-12-07 12:08:54 +08:00 |
| Russell Bryant | 69d357ba12 | [Core] Cleanup startup logging a bit (#10961) | Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2024-12-07 02:30:23 +00:00 |
| youkaichao | dcdc3fafe5 | [ci] fix broken tests (#10956) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-06 11:25:47 -08:00 |
| youkaichao | c05cfb67da | [misc] fix typo (#10960) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-06 11:25:20 -08:00 |
| Sam Stoelinga | 7406274041 | [Doc] add KubeAI to serving integrations (#10837) | Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com> | 2024-12-06 17:03:56 +00:00 |
| Michael Goin | 8b59631855 | [Core] Support Lark grammars for XGrammar (#10870) | Signed-off-by: mgoin <michael@neuralmagic.com> | 2024-12-06 08:34:29 -07:00 |
| youkaichao | a1887f2c96 | [torch.compile] fix deprecated code (#10948) | Signed-off-by: youkaichao <youkaichao@gmail.com> | 2024-12-06 11:01:23 +00:00 |
| Cyrus Leung | 222f5b082a | [CI/Build] Fix broken multimodal test (#10950) | | 2024-12-06 10:41:23 +00:00 |
| youkaichao | b031a455a9 | [torch.compile] add logging for compilation time (#10941) | Signed-off-by: youkaichao <youkaichao@gmail.com>; Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2024-12-06 10:07:15 +00:00 |