17 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Janson Lau | b2253d1807 | Update model.py | 2025-07-27 23:42:47 +08:00 |
| Janson Lau | c21638c56c | Update model.py | 2025-07-27 23:36:35 +08:00 |
| Janson Lau | 292b8a34d8 | Create model.py | 2025-07-27 23:34:37 +08:00 |
| Janson Lau | b265f3795c | Delete inference/model.py | 2025-07-27 21:54:05 +08:00 |
| Janson Lau | 9fabdf8ae6 | Create model.py @greptile | 2025-07-27 21:32:16 +08:00 |
| Janson Lau | e1daf07be1 | Delete inference/model.py | 2025-07-27 21:31:51 +08:00 |
| Janson Lau | 55f36bafc7 | Create model.py | 2025-07-27 21:30:22 +08:00 |
| Janson Lau | e5f8de034b | Delete inference/model.py | 2025-07-27 21:30:04 +08:00 |
| huxuedan | d29a967601 | modify the explanation of MLA | 2025-02-26 17:07:39 +08:00 |
| Xingkai Yu | 1398800ebf | fix scores mask | 2025-02-14 20:26:45 +08:00 |
| Xingkai Yu | 5ee97a83f0 | fix comment | 2025-02-07 16:42:55 +08:00 |
| Xingkai Yu | 87a01053e4 | Merge pull request #556 from XxAlonexX/main: Fix Linear Layer Bias Initialization | 2025-02-05 16:23:02 +08:00 |
| XxAlonexX | 6a30b43249 | Fix Linear Layer Bias Initialization | 2025-02-04 10:38:45 +05:30 |
| Roman Fitzjalen | 2756e130c2 | clarify assertion error | 2025-01-28 13:16:54 +01:00 |
| enoch kan | bc77f22afc | Updated model.py docstrings | 2025-01-05 18:24:31 +00:00 |
| GeeeekExplorer | fd011c11aa | torch rmsnorm | 2025-01-05 14:33:48 +08:00 |
| stack-heap-overflow | 4c2fdb8f55 | Release DeepSeek-V3 | 2024-12-26 19:01:57 +08:00 |