Varun Sundar Rabindranath
03f48b3db6
[Core] LoRA V1 - Add add/pin/list/remove_lora functions ( #13705 )
2025-02-25 00:18:02 -08:00
Michael Goin
4d251ad00e
Fix CompressedTensorsWNA16MoE with grouped scales ( #13769 )
2025-02-25 00:17:14 -08:00
Michael Goin
18e505930d
[Bugfix] Support MLA for CompressedTensorsWNA16 ( #13725 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-25 06:10:31 +00:00
Lucas Wilkinson
4a8cfc7551
[Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" ( #13802 )
2025-02-24 20:33:59 -08:00
Mark McLoughlin
bc32bc73aa
[V1][Metrics] Implement vllm:lora_requests_info metric ( #13504 )
2025-02-24 20:01:33 -08:00
wangxiyuan
ab1091d5f2
[Misc][Attention][Quantization] init property earlier ( #13733 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-25 03:19:30 +00:00
Tyler Michael Smith
1e15aaef56
[Bugfix][Quantization] Fix FP8 + EP ( #13784 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-25 10:54:17 +08:00
cjackal
51010a1807
[Misc] set single whitespace between log sentences ( #13771 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2025-02-25 10:26:12 +08:00
Eli Boyarski
7196a3b1db
[Doc] arg_utils.py: fixed a typo ( #13785 )
2025-02-24 18:23:04 -08:00
Harry Mellor
cdc1fa12eb
Remove unused kwargs from model definitions ( #13555 )
2025-02-24 17:13:52 -08:00
Robert Shaw
f61528d46d
[Misc][Chore] Clean Up AsyncOutputProcessing Logs ( #13780 )
2025-02-24 16:39:07 -08:00
Robert Shaw
1f0ae3ed0a
[Misc] Clean Up EngineArgs.create_engine_config ( #13734 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-02-24 13:52:21 -05:00
Michael Goin
db986c19ea
Fix precommit fail in fused_moe intermediate_cache2 chunking ( #13772 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-24 09:25:47 -08:00
Roger Wang
227578480d
Revert "[V1][Core] Fix memory issue with logits & sampling" ( #13775 )
2025-02-24 09:16:05 -08:00
afeldman-nm
befc402d34
[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) ( #10980 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-24 08:29:41 -08:00
Nicolò Lucchesi
444b0f0f62
[Misc][Docs] Raise error when flashinfer is not installed and VLLM_ATTENTION_BACKEND is set ( #12513 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-02-24 10:43:21 -05:00
Zhonghua Deng
ccc00515fd
[BugFix] Illegal memory access for MoE On H20 ( #13693 )
2025-02-24 07:37:32 -08:00
Jongseok Park
781096e385
Expert Parallelism (EP) Support for DeepSeek V2 ( #12583 )
2025-02-24 07:33:20 -08:00
Roger Meier
7940d8a6a7
[CI/Build] add python-json-logger to requirements-common ( #12842 )
2025-02-24 06:10:33 -08:00
Roger Meier
c0e3ecd6d2
[Bugfix] fix(logging): add missing opening square bracket ( #13011 )
2025-02-24 06:10:25 -08:00
Mengqing Cao
23eca9cf68
[model][refactor] remove cuda hard code in models and layers ( #13658 )
2025-02-24 06:10:14 -08:00
Roger Wang
437b76ff59
[V1][Core] Fix memory issue with logits & sampling ( #13721 )
2025-02-24 06:10:06 -08:00
Kevin H. Luu
f90a375593
[ci] Add logic to change model to S3 path only when S3 CI env var is on ( #13727 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>
2025-02-24 06:32:11 +00:00
Huy Do
e7ef74e26e
Fix some issues with benchmark data output ( #13641 )
...
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-02-24 10:23:18 +08:00
Nick Hill
cbae7af552
[V1][BugFix] Fix engine core client shutdown hangs ( #13298 )
...
Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method.
Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context.
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-23 13:07:43 -08:00
youkaichao
eb24dc4a45
[v1] torchrun compatibility ( #13642 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-23 22:47:24 +08:00
Roger Wang
9bebc9512f
[Misc] Deprecate --dataset from benchmark_serving.py ( #13708 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-23 13:32:20 +00:00
Nick Hill
5a2ba16f5c
[Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms ( #13688 )
2025-02-23 02:54:29 -08:00
Isotr0py
ba5106e519
[LMM] Implement merged multimodal processor for whisper ( #13278 )
2025-02-23 01:46:03 -08:00
Kyle Sayers
d5ca2110f1
[Quant] BaiChuan SupportsQuant ( #13710 )
2025-02-22 19:21:15 -08:00
Kevin H. Luu
2c5e637b57
[ci] Use env var to control whether to use S3 bucket in CI ( #13634 )
2025-02-22 19:19:45 -08:00
Andy Lo
322d2a27d6
[BugFix] Minor: logger import in attention backend ( #13706 )
...
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-02-22 16:51:13 -08:00
Roger Wang
82e0d601fc
[CI/Build] Fix pre-commit errors from #13571 ( #13709 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-22 16:50:38 -08:00
Daniele
78ac0f591d
[CI/Build] fix uv caching in Dockerfile ( #13611 )
2025-02-22 08:25:20 -08:00
Yan Ma
b56155e7f3
[XPU]fix setuptools version for xpu ( #13548 )
2025-02-22 08:05:35 -08:00
Helena Kloosterman
382f66fb08
[Bugfix] Fix boolean conversion for OpenVINO env variable ( #13615 )
2025-02-22 08:04:12 -08:00
Cyrus Leung
8354f6640c
[Doc] Dockerfile instructions for optional dependencies and dev transformers ( #13699 )
2025-02-22 06:04:31 -08:00
Gregory Shtrasberg
c904fdddf6
[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm ( #13231 )
2025-02-22 05:54:38 -08:00
Sage Moore
558db8083c
[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths ( #13095 )
2025-02-22 05:25:41 -08:00
Kaixi Hou
e109e598c7
[NVIDIA] Support nvfp4 cutlass gemm ( #13571 )
2025-02-22 05:24:05 -08:00
Keyun Tong
8db1b9d0a1
Support SSL Key Rotation in HTTP Server ( #13495 )
2025-02-22 05:17:44 -08:00
youkaichao
2382ad29d1
[ci] fix linter ( #13701 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-22 20:28:59 +08:00
youkaichao
3e472d882a
[core] set up data parallel communication ( #13591 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-22 19:28:59 +08:00
Cyrus Leung
7f6bae561c
[CI/Build] Fix pre-commit errors ( #13696 )
2025-02-22 00:31:26 -08:00
Jee Jee Li
105b8ce4c0
[Misc] Reduce LoRA-related static variable ( #13166 )
2025-02-22 00:21:30 -08:00
Mark McLoughlin
2cb8c1540e
[Metrics] Add --show-hidden-metrics-for-version CLI arg ( #13295 )
2025-02-22 00:20:45 -08:00
Mark McLoughlin
1cd981da4f
[V1][Metrics] Support vllm:cache_config_info ( #13299 )
2025-02-22 00:20:00 -08:00
Yu Chin Fabian Lim
fca20841c2
Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size ( #13660 )
2025-02-22 00:19:10 -08:00
Jennifer Zhao
da31b5333e
[Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler ( #13594 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-02-22 00:08:29 -08:00
Lu Fang
bb78fb318e
[v1] Support allowed_token_ids in v1 Sampler ( #13210 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-22 14:13:05 +08:00