youkaichao
|
4a9139780a
|
[cd][release] add pypi index for every commit and nightly build (#11404)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-12-21 23:53:44 -08:00 |
|
Roger Wang
|
29c748930e
|
[CI] Fix flaky entrypoint tests (#11403)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-12-21 21:08:44 -08:00 |
|
Roger Wang
|
c2d1b075ba
|
[Bugfix] Fix issues for Pixtral-Large-Instruct-2411 (#11393)
Signed-off-by: ywang96 <ywang@example.com>
Co-authored-by: ywang96 <ywang@example.com>
|
2024-12-21 10:15:03 +00:00 |
|
Ricky Xu
|
584f0ae40d
|
[V1] Make AsyncLLMEngine v1-v0 opaque (#11383)
Signed-off-by: Ricky Xu <xuchen727@hotmail.com>
|
2024-12-21 15:14:08 +08:00 |
|
George
|
51ff216d85
|
[Bugfix] update should_ignore_layer (#11354)
Signed-off-by: George Ohashi <george@neuralmagic.com>
|
2024-12-21 06:36:23 +00:00 |
|
Woosuk Kwon
|
dd2b5633dd
|
[V1][Bugfix] Skip hashing empty or None mm_data (#11386)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-21 14:22:21 +09:00 |
|
Jiaxin Shan
|
47a0b615b4
|
Add ray[default] to wget to run distributed inference out of box (#11265)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
|
2024-12-20 13:54:55 -08:00 |
|
youkaichao
|
5d2248d81a
|
[doc] explain nccl requirements for rlhf (#11381)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-20 13:00:56 -08:00 |
|
Michael Goin
|
d573aeadcc
|
[Bugfix] Don't log OpenAI field aliases as ignored (#11378)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-20 19:03:50 +00:00 |
|
omer-dayan
|
995f56236b
|
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
|
2024-12-20 16:46:24 +00:00 |
|
Daniele
|
7c7aa37c69
|
[CI/Build] fix pre-compiled wheel install for exact tag (#11373)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
|
2024-12-21 00:14:40 +08:00 |
|
Roger Wang
|
04139ade59
|
[V1] Fix profiling for models with merged input processor (#11370)
Signed-off-by: ywang96 <ywang@roblox.com>
|
2024-12-20 12:04:21 +00:00 |
|
youkaichao
|
1ecc645b8f
|
[doc] backward compatibility for 0.6.4 (#11359)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-19 21:33:53 -08:00 |
|
youkaichao
|
c954f21ac0
|
[misc] add early error message for custom ops (#11355)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-19 21:18:25 -08:00 |
|
Wallas Henrique
|
86c2d8fd1c
|
[Bugfix] Fix spec decoding when seed is none in a batch (#10863)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2024-12-20 05:15:31 +00:00 |
|
Michael Goin
|
b880ffb87e
|
[Misc] Add tqdm progress bar during graph capture (#11349)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-20 04:35:18 +00:00 |
|
youkaichao
|
7801f56ed7
|
[ci][gh200] dockerfile clean up (#11351)
Signed-off-by: drikster80 <ed.sealing@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: drikster80 <ed.sealing@gmail.com>
Co-authored-by: cenzhiyao <2523403608@qq.com>
|
2024-12-19 18:13:06 -08:00 |
|
Akash kaothalkar
|
48edab8041
|
[Bugfix][Hardware][POWERPC] Fix auto dtype failure in case of POWER10 (#11331)
Signed-off-by: Akash Kaothalkar <0052v2@linux.vnet.ibm.com>
|
2024-12-20 01:32:07 +00:00 |
|
Yuan
|
a985f7af9f
|
[CI] Adding CPU docker pipeline (#11261)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
|
2024-12-19 11:46:55 -08:00 |
|
yangzhibin
|
e461c262f0
|
[Misc] Remove unused vllm/block.py (#11336)
|
2024-12-19 17:54:24 +00:00 |
|
Isotr0py
|
276738ce0f
|
[Bugfix] Fix broken CPU compressed-tensors test (#11338)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-19 17:37:31 +00:00 |
|
Cyrus Leung
|
cdf22afdda
|
[Misc] Clean up and consolidate LRUCache (#11339)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-20 00:59:32 +08:00 |
|
Isotr0py
|
e24113a8fe
|
[Model] Refactor Qwen2-VL to use merged multimodal processor (#11258)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 16:28:00 +00:00 |
|
Roger Wang
|
7379b3d4b2
|
[V1] Fix multimodal profiling for Molmo (#11325)
Signed-off-by: ywang96 <ywang@example.com>
Co-authored-by: ywang96 <ywang@example.com>
|
2024-12-19 16:27:22 +00:00 |
|
Yehoshua Cohen
|
6c7f881541
|
[Model] Add JambaForSequenceClassification model (#10860)
Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 22:48:06 +08:00 |
|
Cyrus Leung
|
a0f7d53beb
|
[Bugfix] Cleanup Pixtral HF code (#11333)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 13:22:00 +00:00 |
|
Yanyi Liu
|
5aef49806d
|
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-12-19 10:50:38 +00:00 |
|
Varun Sundar Rabindranath
|
98356735ac
|
[misc] benchmark_throughput : Add LoRA (#11267)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-12-19 15:43:16 +08:00 |
|
Rui Qiao
|
f26c4aeecb
|
[Misc] Optimize ray worker initialization time (#11275)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-12-18 23:38:02 -08:00 |
|
Varun Sundar Rabindranath
|
8936316d58
|
[Kernel] Refactor Cutlass c3x (#10049)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-12-19 07:00:18 +00:00 |
|
Cyrus Leung
|
6142ef0ada
|
[VLM] Merged multimodal processor for Qwen2-Audio (#11303)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 06:14:17 +00:00 |
|
Chen Zhang
|
c6b0a7d3ba
|
[V1] Simplify prefix caching logic by removing num_evictable_computed_blocks (#11310)
|
2024-12-19 04:17:12 +00:00 |
|
Michael Goin
|
a30482f054
|
[CI] Expand test_guided_generate to test all backends (#11313)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-19 04:00:38 +00:00 |
|
Travis Johnson
|
17ca964273
|
[Model] IBM Granite 3.1 (#11307)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-12-19 11:27:24 +08:00 |
|
Tyler Michael Smith
|
5a9da2e6e9
|
[Bugfix][Build/CI] Fix sparse CUTLASS compilation on CUDA [12.0, 12.2) (#11311)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-19 02:43:30 +00:00 |
|
Alexander Matveev
|
fdea8ec167
|
[V1] VLM - enable processor cache by default (#11305)
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
|
2024-12-18 18:54:46 -05:00 |
|
Joe Runde
|
ca5f54a9b9
|
[Bugfix] fix minicpmv test (#11304)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-12-18 10:34:26 -08:00 |
|
Kunshang Ji
|
f954fe0e65
|
[FIX] update openai version (#11287)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2024-12-18 10:17:05 -08:00 |
|
Simon Mo
|
362cff1eb3
|
[CI][Misc] Remove Github Action Release Workflow (#11274)
|
2024-12-18 10:16:53 -08:00 |
|
Isotr0py
|
996aa70f00
|
[Bugfix] Fix broken phi3-v mm_processor_kwargs tests (#11263)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-18 10:16:40 -08:00 |
|
Dipika Sikka
|
60508ffda9
|
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2024-12-18 09:57:16 -05:00 |
|
Yan Ma
|
f04e407e6b
|
[MISC][XPU]update ipex link for CI fix (#11278)
|
2024-12-17 22:34:23 -08:00 |
|
Wallas Henrique
|
8b79f9e107
|
[Bugfix] Fix guided decoding with tokenizer mode mistral (#11046)
|
2024-12-17 22:34:08 -08:00 |
|
Konrad Zawora
|
866fa4550d
|
[Bugfix] Restore support for larger block sizes (#11259)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2024-12-17 16:39:07 -08:00 |
|
Cody Yu
|
bf8717ebae
|
[V1] Prefix caching for vision language models (#11187)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-12-17 16:37:59 -08:00 |
|
Michael Goin
|
c77eb8a33c
|
[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264)
|
2024-12-17 16:34:06 -08:00 |
|
Joe Runde
|
2d1b9baa8f
|
[Bugfix] Fix request cancellation without polling (#11190)
Tag: v0.6.5
|
2024-12-17 12:26:32 -08:00 |
|
Isotr0py
|
f9ecbb18bf
|
[Misc] Allow passing logits_soft_cap for xformers backend (#11252)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-17 00:37:04 -08:00 |
|
Roger Wang
|
02222a0256
|
[Misc] Kernel Benchmark for RMSNorm (#11241)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiaoyu Zhang <BBuf@users.noreply.github.com>
|
2024-12-17 06:57:02 +00:00 |
|
Tyler Michael Smith
|
2bfdbf2a36
|
[V1][Core] Use weakref.finalize instead of atexit (#11242)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-16 22:11:33 -08:00 |
|