zhou fan
|
78029b34ed
|
[BugFix][Kernel]: fix illegal memory access in causal_conv1d when conv_states is None (#10928)
Signed-off-by: xffxff <1247714429@qq.com>
|
2024-12-08 01:21:18 +08:00 |
|
Cyrus Leung
|
c889d5888b
|
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:20:49 +00:00 |
|
Cyrus Leung
|
39e227c7ae
|
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:10:05 +00:00 |
|
Cyrus Leung
|
1c768fe537
|
[Doc] Explicitly state that InternVL 2.5 is supported (#10978)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 16:58:02 +00:00 |
|
Cyrus Leung
|
bf0e382e16
|
[Model] Composite weight loading for multimodal Qwen2 (#10944)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 07:22:52 -07:00 |
|
Isotr0py
|
b26b4cd03c
|
[Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation (#10958)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-07 18:33:49 +08:00 |
|
Gregory Shtrasberg
|
f13cf9ad50
|
[Build] Fix for the Wswitch-bool clang warning (#10060)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2024-12-07 09:03:44 +00:00 |
|
Cyrus Leung
|
955fa9533a
|
[3/N] Support and implement merged input processor for LLaVA model (#10676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-07 00:50:58 -08:00 |
|
Jee Jee Li
|
acf092d348
|
[Bugfix] Fix test-pipeline.yaml (#10973)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-07 12:08:54 +08:00 |
|
Russell Bryant
|
69d357ba12
|
[Core] Cleanup startup logging a bit (#10961)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-07 02:30:23 +00:00 |
|
youkaichao
|
dcdc3fafe5
|
[ci] fix broken tests (#10956)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-06 11:25:47 -08:00 |
|
youkaichao
|
c05cfb67da
|
[misc] fix typo (#10960)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-06 11:25:20 -08:00 |
|
Sam Stoelinga
|
7406274041
|
[Doc] add KubeAI to serving integrations (#10837)
Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>
|
2024-12-06 17:03:56 +00:00 |
|
Michael Goin
|
8b59631855
|
[Core] Support Lark grammars for XGrammar (#10870)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-06 08:34:29 -07:00 |
|
youkaichao
|
a1887f2c96
|
[torch.compile] fix deprecated code (#10948)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-06 11:01:23 +00:00 |
|
Cyrus Leung
|
222f5b082a
|
[CI/Build] Fix broken multimodal test (#10950)
|
2024-12-06 10:41:23 +00:00 |
|
youkaichao
|
b031a455a9
|
[torch.compile] add logging for compilation time (#10941)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-06 10:07:15 +00:00 |
|
youkaichao
|
db87eb6c67
|
[torch.compile] use size tuning for specific sizes (#10933)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 20:30:41 -08:00 |
|
youkaichao
|
9743d64e4e
|
[ci][build] add tests for python only compilation (#10915)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 08:54:47 -08:00 |
|
Konrad Zawora
|
a43065272f
|
[Misc][Gaudi] Avoid torch.compile and enable lazy collectives (#10897)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2024-12-05 08:47:46 -08:00 |
|
Isotr0py
|
998eeafe58
|
[CI/Build] Bump test transformers version (#10106)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 16:05:52 +00:00 |
|
Jee Jee Li
|
571da8fc43
|
[Misc][LoRA] Clean up the function interface of Punica (#10917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:22:28 +00:00 |
|
Travis Johnson
|
39c89e71a8
|
[Misc] Update llama 3.2 template to support system prompt with images (#10901)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-12-05 05:54:06 +00:00 |
|
Jee Jee Li
|
1f958a7d52
|
[Bugfix] Fix BNB loader target_modules (#10720)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:20:26 +08:00 |
|
Cyrus Leung
|
aa39a8e175
|
[Doc] Create a new "Usage" section (#10827)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 11:19:35 +08:00 |
|
Michael Goin
|
8d370e91cb
|
[Bugfix] Fallback to outlines for complex json schemas (#10899)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-05 11:14:06 +08:00 |
|
Kevin H. Luu
|
7883c2bbe7
|
[benchmark] Make H100 benchmark optional (#10908)
|
2024-12-04 17:02:17 -08:00 |
|
Woosuk Kwon
|
2a56e1264f
|
[V1] Fix when max_model_len is not divisible by block_size (#10903)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-04 16:54:05 -08:00 |
|
Daniele
|
e4c34c23de
|
[CI/Build] improve python-only dev setup (#9621)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-04 21:48:13 +00:00 |
|
Chendi.Xue
|
82eb5ea8f3
|
Benchmark serving structured output (#10880)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-12-04 16:28:21 -05:00 |
|
Isotr0py
|
10398b4706
|
[Model] Consolidate ViTs attention implementation without mask (#10893)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-04 18:11:08 +00:00 |
|
Xin Yang
|
01d079fd8e
|
[LoRA] Change lora_tokenizers capacity (#10796)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-12-04 17:40:16 +00:00 |
|
Kevin H. Luu
|
c92acb9693
|
[ci/build] Update vLLM postmerge ECR repo (#10887)
|
2024-12-04 09:01:20 +00:00 |
|
jianzheng
|
8db957ee3a
|
[bugfix] fixed parameter “n” when set parameter “bestof” > 1 (#10854)
Signed-off-by: jianzheng <57654625+o2363286@users.noreply.github.com>
|
2024-12-04 08:48:22 +00:00 |
|
Kevin H. Luu
|
c9ca4fce3f
|
[ci/build] Job to build and push release image (#10877)
|
2024-12-04 15:02:40 +08:00 |
|
Kevin H. Luu
|
fa2dea61df
|
[ci/build] Change queue name for Release jobs (#10875)
|
2024-12-04 15:02:16 +08:00 |
|
wangxiyuan
|
b5b647b084
|
Drop ROCm load format check (#10767)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-04 04:32:21 +00:00 |
|
Tyler Michael Smith
|
d2bd88b122
|
[CI/Build] Replace mean with torch.all in test_pynccl.py (#10876)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-04 03:23:21 +00:00 |
|
Chendi.Xue
|
381ac93bb5
|
[Benchmark] Benchmark structured output with datasets (#10557)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-12-03 17:21:06 -07:00 |
|
Gregory Shtrasberg
|
a061fe601e
|
[Build][Bugfix] Using the correct type hint (#10866)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2024-12-03 15:47:55 -05:00 |
|
tomeras91
|
7c32b6861e
|
[Frontend] correctly record prefill and decode time metrics (#10853)
Signed-off-by: Tomer Asida <tomera@ai21.com>
|
2024-12-03 19:13:31 +00:00 |
|
Michael Goin
|
7090c27bb2
|
[Bugfix] Only require XGrammar on x86 (#10865)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 10:32:21 -08:00 |
|
Yan Ma
|
2f2cdc745a
|
[MISC][XPU] quick fix for XPU CI (#10859)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2024-12-03 17:16:31 +00:00 |
|
Alexander Matveev
|
3bc94cab69
|
[V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-03 10:33:10 +00:00 |
|
Yang Zheng
|
f6084f6324
|
[Speculative Decoding] Move indices to device before filtering output (#10850)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
|
2024-12-03 17:01:39 +08:00 |
|
Aaron Pham
|
9323a3153b
|
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 15:17:00 +08:00 |
|
Cyrus Leung
|
3257d449fa
|
[Misc] Remove deprecated names (#10817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-03 06:52:57 +00:00 |
|
Russell Bryant
|
ef51831ee8
|
[Doc] Add github links for source code references (#10672)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-03 06:46:07 +00:00 |
|
youkaichao
|
dc5ce861bf
|
[torch.compile] remove compilation_context and simplify code (#10838)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 06:19:02 +00:00 |
|
youkaichao
|
21fe7b481a
|
[core][distributed] add pynccl broadcast (#10843)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 04:53:23 +00:00 |
|