Cyrus Leung
4bdf7ac593
[Bugfix] Fix SHM cache initialization ( #26427 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 02:48:04 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Ben Browning
ea25a76c05
[BugFix] Use async Mistral Tokenizer in Chat Completions ( #26134 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-04 09:42:08 +08:00
Joe Runde
4db72e57f6
[Bugfix][Refactor] Unify model management in frontend ( #11660 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-01 02:21:51 +00:00
Jiaxin Shan
85362f028c
[Misc][LoRA] Ensure Lora Adapter requests return adapter name ( #11094 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-12 09:25:16 +00:00
Jiaxin Shan
260d40b5ea
[Core] Support Lora lineage and base model metadata management ( #6315 )
2024-09-20 06:20:56 +00:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH ( #8157 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Jiaxin Shan
db3bf7c991
[Core] Support load and unload LoRA in api server ( #6566 )
...
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00