3392 Commits

Author SHA1 Message Date
sroy745
b41fb9d3b1
[Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers (#9982)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
2024-11-12 10:53:57 -08:00
Woosuk Kwon
7c65527918
[V1] Use pickle for serializing EngineCoreRequest & Add multimodal inputs to EngineCoreRequest (#10245)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-12 08:57:14 -08:00
zifeitong
47db6ec831
[Frontend] Add per-request number of cached token stats (#10174) 2024-11-12 16:42:28 +00:00
Jie Fu (傅杰)
176fcb1c71
[Bugfix] Fix QwenModel argument (#10262)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2024-11-12 16:36:51 +00:00
Jee Jee Li
a838ba7254
[Misc]Fix Idefics3Model argument (#10255)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-12 13:07:11 +00:00
Guillaume Calmettes
36c513a076
[BugFix] Do not raise a ValueError when tool_choice is set to the supported none option and tools are not defined. (#10000)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2024-11-12 11:13:46 +00:00
Yuan
d201d41973
[CI][CPU]refactor CPU tests to allow to bind with different cores (#10222)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
2024-11-12 10:07:32 +00:00
youkaichao
3a28f18b0b
[doc] explain the class hierarchy in vLLM (#10240)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 22:56:44 -08:00
Aleksandr Malyshev
812c981fa0
Splitting attention kernel file (#10091)
Signed-off-by: maleksan85 <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2024-11-11 22:55:07 -08:00
Jee Jee Li
7f5edb5900
[Misc][LoRA] Replace hardcoded cuda device with configurable argument (#10223)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-12 11:10:15 +08:00
youkaichao
eea55cca5b
[1/N] torch.compile user interface design (#10237)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 18:01:06 -08:00
Russell Bryant
9cdba9669c
[Doc] Update help text for --distributed-executor-backend (#10231)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-12 09:55:09 +08:00
youkaichao
d1c6799b88
[doc] update debugging guide (#10236)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 15:21:12 -08:00
Robert Shaw
6ace6fba2c
[V1] AsyncLLM Implementation (#9826)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-11-11 23:05:38 +00:00
Nikolai Shcheglov
08f93e7439
Make shutil rename in python_only_dev (#10233)
Signed-off-by: shcheglovnd <shcheglovnd@avride.ai>
2024-11-11 14:29:19 -08:00
Woosuk Kwon
9d5b4e4dea
[V1] Enable custom ops with piecewise CUDA graphs (#10228)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 11:58:07 -08:00
youkaichao
8a7fe47d32
[misc][distributed] auto port selection and disable tests (#10226)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 11:54:59 -08:00
Yuan Tang
4800339c62
Add docs on serving with Llama Stack (#10183)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2024-11-11 11:28:55 -08:00
Woosuk Kwon
fe15729a2b
[V1] Use custom ops for piecewise CUDA graphs (#10227)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 11:26:48 -08:00
youkaichao
330e82d34a
[v1][torch.compile] support managing cudagraph buffer (#10203)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 11:10:27 -08:00
Woosuk Kwon
d7a4f2207b
[V1] Do not use inductor for piecewise CUDA graphs (#10225)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 11:05:57 -08:00
Woosuk Kwon
f9dadfbee3
[V1] Fix detokenizer ports (#10224)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 10:42:07 -08:00
dependabot[bot]
25144ceed0
Bump actions/setup-python from 5.2.0 to 5.3.0 (#10209)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-11 17:24:10 +00:00
youkaichao
e6de9784d2
[core][distributed] add stateless process group (#10216)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 09:02:14 -08:00
Yangcheng Li
36fc439de0
[Doc] fix doc string typo in block_manager swap_out function (#10212) 2024-11-11 08:53:07 -08:00
harrywu
874f551b36
[Metrics] add more metrics (#4464)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-12 00:17:38 +08:00
Isotr0py
2cebda42bb
[Bugfix][Hardware][CPU] Fix broken encoder-decoder CPU runner (#10218)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-11 12:37:58 +00:00
Roger Wang
5fb1f935b0
[V1] Allow tokenizer_mode and trust_remote_code for Detokenizer (#10211)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-11-11 18:01:18 +08:00
Jee Jee Li
36e4acd02a
[LoRA][Kernel] Remove the unused libentry module (#10214)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-11 09:43:23 +00:00
Isotr0py
58170d6503
[Hardware][CPU] Add embedding models support for CPU backend (#10193)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-11 08:54:28 +00:00
dependabot[bot]
9804ac7c7c
Bump the patch-update group with 5 updates (#10210)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-11 07:22:40 +00:00
youkaichao
f89d18ff74
[6/N] pass whole config to inner model (#10205)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 06:41:46 +00:00
youkaichao
f0f2e5638e
[doc] improve debugging code (#10206)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-10 17:49:40 -08:00
yansh97
ad9a78bf64
[Doc] Fix typo error in vllm/entrypoints/openai/cli_args.py (#10196) 2024-11-11 00:14:22 +00:00
youkaichao
73b9083e99
[misc] improve cloudpickle registration and tests (#10202)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 00:10:53 +00:00
Shawn Du
20cf2f553c
[Misc] small fixes to function tracing file path (#9543)
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-10 15:21:06 -08:00
Yongzao
bfb7d61a7c
[doc] Polish the integration with huggingface doc (#10195)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-10 10:22:04 -08:00
FuryMartin
19682023b6
[Doc] Fix typo error in CONTRIBUTING.md (#10190)
Signed-off-by: FuryMartin <furymartin9910@outlook.com>
2024-11-10 07:47:24 +00:00
youkaichao
9fa4bdde9d
[ci][build] limit cmake version (#10188)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 16:27:26 -08:00
Cyrus Leung
51c2e1fcef
[CI/Build] Split up models tests (#10069)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:39:14 -08:00
Krishna Mandal
b09895a618
[Frontend][Core] Override HF config.json via CLI (#5836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 16:19:27 +00:00
cjackal
d88bff1b96
[Frontend] add add_request_id middleware (#9594)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2024-11-09 10:18:29 +00:00
Zhao Yingzhuo
9e37266420
bugfix: fix the bug that stream generate not work (#2756) 2024-11-09 10:09:48 +00:00
youkaichao
8a4358ecb5
[doc] explaining the integration with huggingface (#10173)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 01:02:54 -08:00
youkaichao
bd46357ad9
[bugfix] fix broken tests of mlp speculator (#10177)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 00:04:50 -08:00
bnellnm
f192aeba74
[Bugfix] Enable some fp8 and quantized fullgraph tests (#10171)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
2024-11-09 08:01:27 +00:00
Chendi.Xue
8e1529dc57
[CI/Build] Add run-hpu-test.sh script (#10167)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
2024-11-09 06:26:52 +00:00
youkaichao
1a95f10ee7
[5/N] pass the whole config to model (#9983)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 14:17:28 +08:00
Cyrus Leung
49d2a41a86
[Doc] Adjust RunLLM location (#10176)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-08 20:07:10 -08:00
Isotr0py
47672f38b5
[CI/Build] Fix VLM broadcast tests tensor_parallel_size passing (#10161)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-09 04:02:59 +00:00