Kyle Sayers
|
3f9b7ab9f5
|
[Doc] Update examples to remove SparseAutoModelForCausalLM (#12062)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-01-15 06:36:01 +00:00 |
|
youkaichao
|
ad34c0df0f
|
[core] platform agnostic executor via collective_rpc (#11256)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-15 13:45:21 +08:00 |
|
Rui Qiao
|
f218f9c24d
|
[core] Turn off GPU communication overlap for Ray executor (#12051)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-01-15 05:19:55 +00:00 |
|
Elfie Guo
|
0794e7446e
|
[Misc] Add multistep chunked-prefill support for FlashInfer (#10467)
|
2025-01-15 12:47:49 +08:00 |
|
Woosuk Kwon
|
b7ee940a82
|
[V1][BugFix] Fix edge case in VLM scheduling (#12065)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-14 20:21:28 -08:00 |
|
Shanshan Shen
|
9ddac56311
|
[Platform] move current_memory_usage() into platform (#11369)
Signed-off-by: Shanshan Shen <467638484@qq.com>
|
2025-01-15 03:38:25 +00:00 |
|
Konrad Zawora
|
1a51b9f872
|
[HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in setup.py (#12046)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2025-01-15 02:59:18 +00:00 |
|
Jee Jee Li
|
42f5e7c52a
|
[Kernel] Support MulAndSilu (#11624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-15 02:29:53 +00:00 |
|
Jee Jee Li
|
a3a3ee4e6f
|
[Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (#11924)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-15 07:49:49 +08:00 |
|
maang-h
|
87054a57ab
|
[Doc]: Update the Json Example of the Engine Arguments document (#12045)
|
2025-01-14 17:03:04 +00:00 |
|
Harry Mellor
|
c9d6ff530b
|
Explain where the engine args go when using Docker (#12041)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-14 16:05:50 +00:00 |
|
Chen Zhang
|
a2d2acb4c8
|
[Bugfix][Kernel] Give unique name to BlockSparseFlashAttention (#12040)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-14 15:45:05 +00:00 |
|
wangxiyuan
|
2e0e017610
|
[Platform] Add output for Attention Backend (#11981)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-01-14 13:27:04 +00:00 |
|
Chen Zhang
|
1f18adb245
|
[Kernel] Revert the API change of Attention.forward (#12038)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-14 20:59:32 +08:00 |
|
Cyrus Leung
|
bb354e6b2d
|
[Bugfix] Fix various bugs in multi-modal processor (#12031)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-14 12:16:11 +00:00 |
|
youkaichao
|
ff39141a49
|
[HPU][misc] add comments for explanation (#12034)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-14 19:24:06 +08:00 |
|
TJian
|
8a1f938e6f
|
[Doc] Update Quantization Hardware Support Documentation (#12025)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-01-14 04:37:52 +00:00 |
|
Konrad Zawora
|
078da31903
|
[HPU][Bugfix] set_forward_context and CI test execution (#12014)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2025-01-14 11:04:18 +08:00 |
|
Woosuk Kwon
|
1a401252b5
|
[Docs] Add Sky Computing Lab to project intro (#12019)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-13 17:24:36 -08:00 |
|
Steve Luo
|
f35ec461fc
|
[Bugfix] Fix deepseekv3 gate bias error (#12002)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-01-13 13:43:51 -07:00 |
|
Yikun Jiang
|
289b5191d5
|
[Doc] Fix build from source and installation link in README.md (#12013)
Signed-off-by: Yikun <yikunkero@gmail.com>
|
2025-01-13 17:23:59 +00:00 |
|
elijah
|
c6db21313c
|
bugfix: Fix signature mismatch in benchmark's get_tokenizer function (#11982)
Signed-off-by: elijah <f1renze.142857@gmail.com>
|
2025-01-13 15:22:07 +00:00 |
|
Shanshan Shen
|
a7d59688fb
|
[Platform] Move get_punica_wrapper() function to Platform (#11516)
Signed-off-by: Shanshan Shen <467638484@qq.com>
|
2025-01-13 13:12:10 +00:00 |
|
youkaichao
|
458e63a2c6
|
[platform] add device_control env var (#12009)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-13 20:59:09 +08:00 |
|
Harry Mellor
|
e8c23ff989
|
[Doc] Organise installation documentation into categories and tabs (#11935)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-13 12:27:36 +00:00 |
|
Roger Wang
|
cd8249903f
|
[Doc][V1] Update model implementation guide for V1 support (#11998)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-01-13 11:58:54 +00:00 |
|
Chen Zhang
|
0f8cafe2d1
|
[Kernel] unified_attention for Attention.forward (#11967)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-13 19:28:53 +08:00 |
|
Alex Brooks
|
5340a30d01
|
Fix Max Token ID for Qwen-VL-Chat (#11980)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
|
2025-01-13 08:37:48 +00:00 |
|
youkaichao
|
89ce62a316
|
[platform] add ray_device_key (#11948)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-13 16:20:52 +08:00 |
|
Chenguang Li
|
c3f05b09a0
|
[Misc]Minor Changes about Worker (#11555)
Signed-off-by: Chenguang Li <757486878@qq.com>
|
2025-01-13 15:47:05 +08:00 |
|
Concurrensee
|
cf6bbcb493
|
[Misc] Fix Deepseek V2 fp8 kv-scale remapping (#11947)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
|
2025-01-12 23:05:06 -08:00 |
|
Sungjae Lee
|
80ea3af1a0
|
[CI][Spec Decode] fix: broken test for EAGLE model (#11972)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
|
2025-01-13 06:50:35 +00:00 |
|
Siyuan Li
|
9dd02d85ca
|
[Bug] Fix usage of .transpose() and .view() consecutively. (#11979)
|
2025-01-13 06:24:10 +00:00 |
|
Yangcheng Li
|
f7b3ba82c3
|
[MISC] fix typo in kv transfer send recv test (#11983)
|
2025-01-13 05:07:48 +00:00 |
|
Robert Shaw
|
619ae268c3
|
[V1] [2/n] Logging and Metrics - OutputProcessor Abstraction (#11973)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-01-13 04:54:10 +00:00 |
|
Isotr0py
|
d14e98d924
|
[Model] Support GGUF models newly added in transformers 4.46.0 (#9685)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-01-13 00:13:44 +00:00 |
|
Robert Shaw
|
9597a095f2
|
[V1][Core][1/n] Logging and Metrics (#11962)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-01-12 21:02:02 +00:00 |
|
Avshalom Manevich
|
263a870ee1
|
[Hardware][TPU] workaround fix for MoE on TPU (#11764)
|
2025-01-12 10:53:51 -05:00 |
|
Akshat Tripathi
|
8bddb73512
|
[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-01-12 13:01:52 +00:00 |
|
Isotr0py
|
f967e51f38
|
[Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-01-12 00:17:24 -08:00 |
|
Rafael Vasquez
|
43f3d9e699
|
[CI/Build] Add markdown linter (#11857)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2025-01-12 00:17:13 -08:00 |
|
Roger Wang
|
b25cfab9a0
|
[V1] Avoid sending text prompt to core engine (#11963)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-01-12 06:36:38 +00:00 |
|
sixgod
|
4b657d3292
|
[Model] Add cogagent model support vLLM (#11742)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-01-11 19:05:56 +00:00 |
|
Nicolò Lucchesi
|
d697dc01b4
|
[Bugfix] Fix RobertaModel loading (#11940)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-01-11 14:05:09 +00:00 |
|
Cyrus Leung
|
a991f7d508
|
[Doc] Basic guide for writing unit tests for new models (#11951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-11 21:27:24 +08:00 |
|
Cyrus Leung
|
7a3a83e3b8
|
[CI/Build] Move model-specific multi-modal processing tests (#11934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-11 13:50:05 +08:00 |
|
shaochangxu
|
c32a7c7c0c
|
[Bugfix] fused_experts_impl wrong compute type for float32 (#11921)
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
|
2025-01-11 13:49:39 +08:00 |
|
Sungjae Lee
|
2118d0565c
|
[Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (#11672)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
|
2025-01-10 20:49:38 -08:00 |
|
youkaichao
|
899136b857
|
[ci] fix broken distributed-tests-4-gpus (#11937)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-11 09:07:24 +08:00 |
|
Fred Reiss
|
c9f09a4fe8
|
[mypy] Fix mypy warnings in api_server.py (#11941)
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
|
2025-01-11 01:04:58 +00:00 |
|