youkaichao
|
db87eb6c67
|
[torch.compile] use size tuning for specific sizes (#10933)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 20:30:41 -08:00 |
|
youkaichao
|
9743d64e4e
|
[ci][build] add tests for python only compilation (#10915)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 08:54:47 -08:00 |
|
Konrad Zawora
|
a43065272f
|
[Misc][Gaudi] Avoid torch.compile and enable lazy collectives (#10897)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2024-12-05 08:47:46 -08:00 |
|
Isotr0py
|
998eeafe58
|
[CI/Build] Bump test transformers version (#10106)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 16:05:52 +00:00 |
|
Jee Jee Li
|
571da8fc43
|
[Misc][LoRA] Clean up the function interface of Punica (#10917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:22:28 +00:00 |
|
Travis Johnson
|
39c89e71a8
|
[Misc] Update llama 3.2 template to support system prompt with images (#10901)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-12-05 05:54:06 +00:00 |
|
Jee Jee Li
|
1f958a7d52
|
[Bugfix] Fix BNB loader target_modules (#10720)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:20:26 +08:00 |
|
Cyrus Leung
|
aa39a8e175
|
[Doc] Create a new "Usage" section (#10827)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 11:19:35 +08:00 |
|
Michael Goin
|
8d370e91cb
|
[Bugfix] Fallback to outlines for complex json schemas (#10899)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-05 11:14:06 +08:00 |
|
Kevin H. Luu
|
7883c2bbe7
|
[benchmark] Make H100 benchmark optional (#10908)
|
2024-12-04 17:02:17 -08:00 |
|
Woosuk Kwon
|
2a56e1264f
|
[V1] Fix when max_model_len is not divisible by block_size (#10903)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-04 16:54:05 -08:00 |
|
Daniele
|
e4c34c23de
|
[CI/Build] improve python-only dev setup (#9621)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-04 21:48:13 +00:00 |
|
Chendi.Xue
|
82eb5ea8f3
|
Benchmark serving structured output (#10880)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-12-04 16:28:21 -05:00 |
|
Isotr0py
|
10398b4706
|
[Model] Consolidate ViTs attention implementation without mask (#10893)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-04 18:11:08 +00:00 |
|
Xin Yang
|
01d079fd8e
|
[LoRA] Change lora_tokenizers capacity (#10796)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-12-04 17:40:16 +00:00 |
|
Kevin H. Luu
|
c92acb9693
|
[ci/build] Update vLLM postmerge ECR repo (#10887)
|
2024-12-04 09:01:20 +00:00 |
|
jianzheng
|
8db957ee3a
|
[bugfix] fixed parameter “n” when set parameter “bestof” > 1 (#10854)
Signed-off-by: jianzheng <57654625+o2363286@users.noreply.github.com>
|
2024-12-04 08:48:22 +00:00 |
|
Kevin H. Luu
|
c9ca4fce3f
|
[ci/build] Job to build and push release image (#10877)
|
2024-12-04 15:02:40 +08:00 |
|
Kevin H. Luu
|
fa2dea61df
|
[ci/build] Change queue name for Release jobs (#10875)
|
2024-12-04 15:02:16 +08:00 |
|
wangxiyuan
|
b5b647b084
|
Drop ROCm load format check (#10767)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-04 04:32:21 +00:00 |
|
Tyler Michael Smith
|
d2bd88b122
|
[CI/Build] Replace mean with torch.all in test_pynccl.py (#10876)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-04 03:23:21 +00:00 |
|
Chendi.Xue
|
381ac93bb5
|
[Benchmark] Benchmark structured output with datasets (#10557)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-12-03 17:21:06 -07:00 |
|
Gregory Shtrasberg
|
a061fe601e
|
[Build][Bugfix] Using the correct type hint (#10866)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2024-12-03 15:47:55 -05:00 |
|
tomeras91
|
7c32b6861e
|
[Frontend] correctly record prefill and decode time metrics (#10853)
Signed-off-by: Tomer Asida <tomera@ai21.com>
|
2024-12-03 19:13:31 +00:00 |
|
Michael Goin
|
7090c27bb2
|
[Bugfix] Only require XGrammar on x86 (#10865)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 10:32:21 -08:00 |
|
Yan Ma
|
2f2cdc745a
|
[MISC][XPU] quick fix for XPU CI (#10859)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2024-12-03 17:16:31 +00:00 |
|
Alexander Matveev
|
3bc94cab69
|
[V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-03 10:33:10 +00:00 |
|
Yang Zheng
|
f6084f6324
|
[Speculative Decoding] Move indices to device before filtering output (#10850)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
|
2024-12-03 17:01:39 +08:00 |
|
Aaron Pham
|
9323a3153b
|
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 15:17:00 +08:00 |
|
Cyrus Leung
|
3257d449fa
|
[Misc] Remove deprecated names (#10817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-03 06:52:57 +00:00 |
|
Russell Bryant
|
ef51831ee8
|
[Doc] Add github links for source code references (#10672)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-03 06:46:07 +00:00 |
|
youkaichao
|
dc5ce861bf
|
[torch.compile] remove compilation_context and simplify code (#10838)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 06:19:02 +00:00 |
|
youkaichao
|
21fe7b481a
|
[core][distributed] add pynccl broadcast (#10843)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 04:53:23 +00:00 |
|
Jee Jee Li
|
a4cf256159
|
[Bugfix] Fix QKVParallelLinearWithShardedLora bias bug (#10844)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-03 12:10:29 +08:00 |
|
zixuanzhang226
|
d746268e92
|
[Model] support bitsandbytes quantization with minicpm model (#10842)
Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>
|
2024-12-03 03:06:41 +00:00 |
|
Michael Goin
|
4433195ab7
|
[Bugfix] Prevent benchmark_throughput.py from using duplicated random prompts (#10753)
|
2024-12-03 02:26:15 +00:00 |
|
Isotr0py
|
4c05edb33a
|
[Model] Add TP and BNB quantization support to LlavaMultiModalProjector (#10834)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-02 23:06:09 +00:00 |
|
Jani Monoses
|
9b14d978aa
|
Fix openvino on GPU (#10793)
|
2024-12-02 18:52:19 +00:00 |
|
Yan Ma
|
519cc6ca12
|
[Misc][XPU] Avoid torch compile for XPU platform (#10747)
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 17:53:55 +00:00 |
|
Jee Jee Li
|
b45f0d7946
|
[Misc][LoRA] Move the implementation of lora bias to punica.py (#10829)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-02 17:53:36 +00:00 |
|
youkaichao
|
a4c4daf364
|
[misc] use out argument for flash attention (#10822)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 10:50:10 +00:00 |
|
Cyrus Leung
|
e95f275f57
|
[CI/Build] Update mistral_common version for tests and docs (#10825)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-02 10:26:10 +00:00 |
|
zhou fan
|
ef31eabc68
|
[Model]: add some tests for aria model (#10770)
Signed-off-by: xffxff <1247714429@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-12-02 05:36:36 +00:00 |
|
wangxiyuan
|
995a148575
|
[doc]Update config docstring (#10732)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-02 04:14:45 +00:00 |
|
youkaichao
|
63a164172d
|
[misc] remove xverse modeling file (#10814)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 03:27:13 +00:00 |
|
Maximilien de Bayser
|
e25810ae29
|
Fill TorchSDPAAttentionMetadata seq_lens_field for prefill (#10799)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-12-02 10:05:32 +08:00 |
|
Woosuk Kwon
|
073a4bd1c0
|
[Kernel] Use out arg in flash_attn_varlen_func (#10811)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-01 17:55:39 -08:00 |
|
cduk
|
b7954776fd
|
[core] Avoid metrics log noise when idle - include speculative decodi… (#10809)
|
2024-12-02 01:49:48 +00:00 |
|
Isotr0py
|
b18c9bbaba
|
[Model] Add BNB support to Llava and Pixtral-HF (#10795)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-02 01:31:09 +00:00 |
|
Kuntai Du
|
0590ec3fd9
|
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in the 1P1D (one prefill instance, one decode instance) scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2024-12-01 19:01:00 -06:00 |
|