Harry Mellor
|
482cdc494e
|
[Doc] Rename offline inference examples (#11927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-10 23:50:29 +08:00 |
|
youkaichao
|
241ad7b301
|
[ci] Fix sampler tests (#11922)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-10 20:45:33 +08:00 |
|
Harry Mellor
|
d85c47d6ad
|
Replace "online inference" with "online serving" (#11923)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-10 12:05:56 +00:00 |
|
Cyrus Leung
|
65097ca0af
|
[Doc] Add model development API Reference (#11884)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-09 09:43:40 +00:00 |
|
Robert Shaw
|
56fe4c297c
|
[TPU][Quantization] TPU W8A8 (#11785)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-08 19:33:29 +00:00 |
|
Li, Jiang
|
2f7024987e
|
[CI/Build][Bugfix] Fix CPU CI image clean up (#11836)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-01-08 15:18:28 +00:00 |
|
Cyrus Leung
|
6cd40a5bfe
|
[Doc][4/N] Reorganize API Reference (#11843)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-08 21:34:44 +08:00 |
|
Harry Mellor
|
aba8d6ee00
|
[Doc] Move examples into categories (#11840)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-08 13:09:53 +00:00 |
|
Yuan
|
1e4ce295ae
|
[CI][CPU] adding build number to docker image name (#11788)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
|
2025-01-07 07:28:01 +00:00 |
|
Liangfu Chen
|
898cdf033e
|
[CI] Fix neuron CI and run offline tests (#11779)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-01-06 21:36:10 -08:00 |
|
Jee Jee Li
|
b278557935
|
[Kernel][LoRA]Punica prefill kernels fusion (#11234)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Co-authored-by: Zhonghua Deng <abatom@163.com>
|
2025-01-07 04:01:39 +00:00 |
|
Aurick Qiao
|
e1a5c2f0a1
|
[Model] Whisper model implementation (#11280)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
|
2025-01-03 16:39:19 +08:00 |
|
Kevin H. Luu
|
fd3a62a122
|
[perf-benchmark] Fix dependency for steps in benchmark pipeline (#11710)
|
2025-01-02 22:38:37 -08:00 |
|
Kevin H. Luu
|
ccb1aabcca
|
[benchmark] Remove dependency for H100 benchmark step (#11572)
|
2024-12-30 12:27:07 -08:00 |
|
Cyrus Leung
|
8d9b6721e7
|
[VLM] Abstract out multi-modal data parsing in merged processor (#11620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-30 15:01:35 +00:00 |
|
youkaichao
|
b12e87f942
|
[platforms] enable platform plugins (#11602)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-30 20:24:45 +08:00 |
|
Simon Mo
|
048fc57a0f
|
[CI] Unboock H100 Benchmark (#11419)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2024-12-22 14:17:43 -08:00 |
|
youkaichao
|
72d9c316d3
|
[cd][release] fix race conditions (#11407)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-22 00:39:11 -08:00 |
|
youkaichao
|
4a9139780a
|
[cd][release] add pypi index for every commit and nightly build (#11404)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-12-21 23:53:44 -08:00 |
|
youkaichao
|
7801f56ed7
|
[ci][gh200] dockerfile clean up (#11351)
Signed-off-by: drikster80 <ed.sealing@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: drikster80 <ed.sealing@gmail.com>
Co-authored-by: cenzhiyao <2523403608@qq.com>
|
2024-12-19 18:13:06 -08:00 |
|
Yuan
|
a985f7af9f
|
[CI] Adding CPU docker pipeline (#11261)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
|
2024-12-19 11:46:55 -08:00 |
|
Wallas Henrique
|
8b79f9e107
|
[Bugfix] Fix guided decoding with tokenizer mode mistral (#11046)
|
2024-12-17 22:34:08 -08:00 |
|
youkaichao
|
35bae114a8
|
fix gh200 tests on main (#11246)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-16 17:22:38 -08:00 |
|
youkaichao
|
c301616ed2
|
[ci][tests] add gh200 tests (#11244)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-16 15:53:18 -08:00 |
|
Varun Sundar Rabindranath
|
efbce85f4d
|
[misc] Layerwise profile updates (#10242)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-12-16 18:14:57 +00:00 |
|
Cyrus Leung
|
eeec9e3390
|
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-13 10:40:07 +00:00 |
|
youkaichao
|
62de37a38e
|
[core][distributed] initialization from StatelessProcessGroup (#10986)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-12 09:04:19 +00:00 |
|
Cyrus Leung
|
d1e21a979b
|
[CI/Build] Split up VLM tests (#11083)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-12 06:18:16 +08:00 |
|
hissu-hyvarinen
|
b2f775456e
|
[CI/Build] Enable prefix caching test for AMD (#11098)
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
|
2024-12-11 15:23:37 +00:00 |
|
Richard Liu
|
5ed5d5f128
|
Build tpu image in release pipeline (#10936)
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
|
2024-12-09 23:07:48 +00:00 |
|
Cyrus Leung
|
39e227c7ae
|
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:10:05 +00:00 |
|
Jee Jee Li
|
acf092d348
|
[Bugfix] Fix test-pipeline.yaml (#10973)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-07 12:08:54 +08:00 |
|
youkaichao
|
9743d64e4e
|
[ci][build] add tests for python only compilation (#10915)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 08:54:47 -08:00 |
|
Kevin H. Luu
|
7883c2bbe7
|
[benchmark] Make H100 benchmark optional (#10908)
|
2024-12-04 17:02:17 -08:00 |
|
Kevin H. Luu
|
c92acb9693
|
[ci/build] Update vLLM postmerge ECR repo (#10887)
|
2024-12-04 09:01:20 +00:00 |
|
Kevin H. Luu
|
c9ca4fce3f
|
[ci/build] Job to build and push release image (#10877)
|
2024-12-04 15:02:40 +08:00 |
|
Kevin H. Luu
|
fa2dea61df
|
[ci/build] Change queue name for Release jobs (#10875)
|
2024-12-04 15:02:16 +08:00 |
|
Yan Ma
|
2f2cdc745a
|
[MISC][XPU] quick fix for XPU CI (#10859)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2024-12-03 17:16:31 +00:00 |
|
Jee Jee Li
|
a4cf256159
|
[Bugfix] Fix QKVParallelLinearWithShardedLora bias bug (#10844)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-03 12:10:29 +08:00 |
|
Yan Ma
|
519cc6ca12
|
[Misc][XPU] Avoid torch compile for XPU platform (#10747)
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 17:53:55 +00:00 |
|
Kuntai Du
|
0590ec3fd9
|
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2024-12-01 19:01:00 -06:00 |
|
Cyrus Leung
|
133707123e
|
[Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 08:02:54 +08:00 |
|
Ricky Xu
|
519e8e4182
|
[v1] EngineArgs for better config handling for v1 (#10382)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-25 21:09:43 -08:00 |
|
youkaichao
|
eda2b3589c
|
Revert "Print running script to enhance CI log readability" (#10601)
|
2024-11-23 21:31:47 -08:00 |
|
Jee Jee Li
|
1c445dca51
|
[CI/Build] Print running script to enhance CI log readability (#10594)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-24 03:57:13 +00:00 |
|
Jee Jee Li
|
1700c543a5
|
[Bugfix] Fix LoRA weight sharding (#10450)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-11-23 17:23:17 -08:00 |
|
Nishidha
|
651f6c31ac
|
For ppc64le, disabled tests for now and addressed space issues (#10538)
|
2024-11-23 09:33:53 +00:00 |
|
kliuae
|
7c25fe45a6
|
[AMD] Add support for GGUF quantization on ROCm (#10254)
|
2024-11-22 21:14:49 -08:00 |
|
Simon Mo
|
aed074860a
|
[Benchmark] Add new H100 machine (#10547)
|
2024-11-21 18:27:20 -08:00 |
|
Yunmeng
|
edec3385b6
|
[CI][Installation] Avoid uploading CUDA 11.8 wheel (#10535)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
|
2024-11-21 13:03:58 -08:00 |
|