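safety_rope_v20260514: DINOv3-S binary (cvat2 #10, +32 task increment)
Trained on cvat2 #10 person bboxes with the safety_rope_use attribute. The schema is reverted to the v9 binary form (wrong/correct), and unknown samples are filtered out.
The ppe-demo handler can hot-swap the weights directly, because the ckpt carries both labels and thr.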
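Changes in v20260514 (vs v20260513):
- head: Linear(256, 3) → Linear(256, 2) (v9 schema)
- unknown samples are filtered (the manifest already contains 0 of them, so this is a safe no-op)
- labels=["wrong","correct"] in the ckpt no longer includes unknown
- thr=0.2729 is stored in the ckpt (neither v9 nor v511 saved it, which made the handler fail on load)
- class weights wrong=1.5, correct=1.0 (the original v9 ratio)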
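
The metadata contract implied by the list above (the ckpt must carry `labels` and `thr`) can be checked before a hot-swap. This is a minimal sketch, not the ppe-demo handler: `validate_ckpt_meta` is a hypothetical helper, and the top-level `labels` / `thr` keys of a dict-like checkpoint are an assumed layout.

```python
from __future__ import annotations

from typing import Any, Mapping

# v9 binary schema as described in this release.
EXPECTED_LABELS = ["wrong", "correct"]


def validate_ckpt_meta(ckpt: Mapping[str, Any]) -> tuple[list[str], float]:
    """Return (labels, thr), or raise if the checkpoint metadata is incomplete.

    Hypothetical helper: assumes top-level "labels" and "thr" keys.
    """
    missing = [key for key in ("labels", "thr") if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing metadata keys: {missing}")

    labels = list(ckpt["labels"])
    if labels != EXPECTED_LABELS:
        raise ValueError(f"unexpected labels {labels}, expected {EXPECTED_LABELS}")

    thr = float(ckpt["thr"])
    if not 0.0 <= thr <= 1.0:
        raise ValueError(f"thr must lie in [0, 1], got {thr}")
    return labels, thr
```

Failing fast here would surface a checkpoint saved without `thr` (the v9 / v511 situation) as a clear error at load time rather than later during inference.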
Version comparison
| metric | v9 | v20260512 | v20260513 | v20260514 (this, +21 task) |
|---|---|---|---|---|
| test AP | 0.9336 | 0.9308 | 0.9144 | 0.9242 |
| test F1 | 0.8770 | 0.8567 | 0.8707 | 0.8650 |
| test accuracy | 0.9070 | 0.8821 | 0.9004 | 0.8910 |
| test thr | 0.79 | 0.054 | 0.893 | 0.273 |
| val thr (deployed) | n/a | 0.114 | 0.124 | 0.2729 |
| train rows (after unknown filter) | n/a | 7644 | 8147 | 10299 |
| test rows | n/a | 2630 | 2953 | 2953 |
Test metrics
| AP | F1 | P | R | accuracy | thr | TP/FP/FN/TN |
|---|---|---|---|---|---|---|
| 0.9242 | 0.8650 | 0.8350 | 0.8974 | 0.8910 | 0.273 | 1032/204/118/1599 |
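
The P, R, F1 and accuracy values above follow directly from the TP/FP/FN/TN counts, with `correct` as the positive class. A short sketch of that arithmetic (the function name `binary_metrics` is illustrative, not taken from the training code):

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, F1 and accuracy from binary confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 written as 2TP / (2TP + FP + FN), equivalent to the harmonic mean of P and R.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"p": precision, "r": recall, "f1": f1, "acc": accuracy}


# Counts reported for v20260514 on the test split.
metrics = binary_metrics(tp=1032, fp=204, fn=118, tn=1599)
print({name: round(value, 4) for name, value in metrics.items()})
```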
Per-class metrics
| class | P | R | F1 | support |
|---|---|---|---|---|
| wrong | 0.9313 | 0.8869 | 0.9085 | 1803 |
| correct | 0.8350 | 0.8974 | 0.8650 | 1150 |
Confusion matrix (rows=true, cols=pred)
| | wrong | correct | row total |
|---|---|---|---|
| wrong | 1599 | 204 | 1803 |
| correct | 118 | 1032 | 1150 |
| col total | 1717 | 1236 | 2953 |
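
The per-class table can be reproduced from the confusion matrix alone: for each class, precision divides the diagonal entry by its column total and recall divides it by its row total. A minimal sketch (`per_class_metrics` is an illustrative name):

```python
def per_class_metrics(matrix: list[list[int]], labels: list[str]) -> dict[str, dict[str, float]]:
    """Per-class precision / recall / F1 / support; rows = truth, cols = prediction."""
    out = {}
    for i, name in enumerate(labels):
        hit = matrix[i][i]
        row_total = sum(matrix[i])                 # all samples whose truth is this class
        col_total = sum(row[i] for row in matrix)  # all samples predicted as this class
        out[name] = {
            "precision": hit / col_total if col_total else 0.0,
            "recall": hit / row_total if row_total else 0.0,
            # 2*hit / (row_total + col_total) equals 2TP / (2TP + FP + FN).
            "f1": 2 * hit / (row_total + col_total) if (row_total + col_total) else 0.0,
            "support": row_total,
        }
    return out


report = per_class_metrics([[1599, 204], [118, 1032]], ["wrong", "correct"])
for name, stats in report.items():
    print(name, {key: round(value, 4) for key, value in stats.items()})
```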
Training history
| ep | train_loss | val_AP | val_F1 | val_acc | val_thr |
|---|---|---|---|---|---|
| 1 | 0.5288 | 0.6223 | 0.6663 | 0.7475 | 0.470 |
| 2 | 0.2704 | 0.9400 | 0.8861 | 0.9173 | 0.147 |
| 3 | 0.1490 | 0.9512 | 0.8865 | 0.9173 | 0.260 |
| 4 | 0.1129 | 0.9511 | 0.8857 | 0.9169 | 0.259 |
| 5 | 0.3221 | 0.6575 | 0.6856 | 0.7008 | 0.306 |
| 6 | 0.3478 | 0.9494 | 0.8899 | 0.9169 | 0.163 |
| 7 | 0.0894 | 0.9533 | 0.9017 | 0.9258 | 0.081 |
| 8 | 0.0726 | 0.9499 | 0.8835 | 0.9106 | 0.271 |
| 9 | 0.0631 | 0.9543 | 0.8879 | 0.9192 | 0.600 |
| 10 | 0.0598 | 0.9570 | 0.8954 | 0.9247 | 0.358 |
| 11 | 0.0516 | 0.9554 | 0.9002 | 0.9266 | 0.231 |
| 12 | 0.0403 | 0.9556 | 0.8972 | 0.9240 | 0.225 |
| 13 | 0.0316 | 0.9489 | 0.8955 | 0.9229 | 0.216 |
| 14 | 0.0285 | 0.9537 | 0.8947 | 0.9240 | 0.414 |
| 15 | 0.0262 | 0.9508 | 0.8905 | 0.9207 | 0.602 |
| 16 | 0.0231 | 0.9535 | 0.8915 | 0.9221 | 0.689 |
| 17 | 0.0186 | 0.9520 | 0.8953 | 0.9232 | 0.347 |
| 18 | 0.0198 | 0.9506 | 0.8962 | 0.9255 | 0.721 |
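
The history is consistent with early stopping on val_AP: the best value is at epoch 10, and the run ends at epoch 18 with patience 8 out of a 20-epoch budget. The sketch below replays that rule over the val_AP column. It assumes val_AP is the monitored metric and that any strict improvement resets the counter; neither is confirmed by the training code, and `early_stop_trace` is a hypothetical name.

```python
def early_stop_trace(val_scores: list[float], patience: int, max_epochs: int) -> tuple[int, int]:
    """Return (best_epoch, last_epoch_run), both 1-indexed.

    Assumed rule: stop once `patience` consecutive epochs pass without a
    strict improvement over the best score seen so far.
    """
    best_score = float("-inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        last_epoch = epoch
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, last_epoch


# val_AP column from the training history above.
VAL_AP = [
    0.6223, 0.9400, 0.9512, 0.9511, 0.6575, 0.9494, 0.9533, 0.9499, 0.9543,
    0.9570, 0.9554, 0.9556, 0.9489, 0.9537, 0.9508, 0.9535, 0.9520, 0.9506,
]
print(early_stop_trace(VAL_AP, patience=8, max_epochs=20))  # (10, 18)
```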
Sample inference (test, 4 per class; box color = truth)
truth=wrong pred=wrong (P(correct)=1.5%, thr=27.3%)

truth=wrong pred=wrong (P(correct)=0.0%, thr=27.3%)

truth=wrong pred=wrong (P(correct)=0.0%, thr=27.3%)

truth=wrong pred=wrong (P(correct)=8.0%, thr=27.3%)

truth=correct pred=correct (P(correct)=99.9%, thr=27.3%)

truth=correct pred=correct (P(correct)=97.8%, thr=27.3%)

truth=correct pred=correct (P(correct)=99.9%, thr=27.3%)

truth=correct pred=correct (P(correct)=99.9%, thr=27.3%)
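
The captions above are consistent with a single-threshold rule on P(correct): a sample is labelled `correct` when P(correct) reaches thr, and `wrong` otherwise. A minimal sketch of that rule and of the caption format follows; whether the boundary is inclusive (`>=`) is an assumption, and `decide` / `caption` are illustrative names.

```python
LABELS = ("wrong", "correct")
THR = 0.27294921875  # thr stored in the v20260514 ckpt


def decide(p_correct: float, thr: float = THR) -> str:
    """Map P(correct) to a label; the inclusive boundary is an assumption."""
    return LABELS[1] if p_correct >= thr else LABELS[0]


def caption(truth: str, p_correct: float, thr: float = THR) -> str:
    """Format one sample in the same style as the captions above."""
    pred = decide(p_correct, thr)
    return f"truth={truth} pred={pred} (P(correct)={p_correct:.1%}, thr={thr:.1%})"


print(caption("wrong", 0.080))
print(caption("correct", 0.978))
```

Because thr is well below 0.5, a sample with P(correct) of, say, 30% is still labelled `correct`; the threshold trades precision on `correct` for recall.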

FP audit: top-8 most confident misclassifications (from a sample of 300 test imgs)
truth=wrong → pred=correct (conf=99.9%)

truth=correct → pred=wrong (conf=99.8%)

truth=wrong → pred=correct (conf=99.7%)

truth=correct → pred=wrong (conf=99.4%)

truth=correct → pred=wrong (conf=98.7%)

truth=wrong → pred=correct (conf=96.8%)

truth=wrong → pred=correct (conf=94.2%)

truth=correct → pred=wrong (conf=88.3%)
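
A ranking like the one above can be produced by keeping only the misclassified samples and sorting them by the confidence of the predicted class. The sketch below is one way to do it, under two assumptions that the source does not confirm: predictions use the deployed thr on P(correct), and "conf" is the probability assigned to the predicted class. `top_confident_errors` is a hypothetical name.

```python
def top_confident_errors(
    records: list[tuple[str, float]], thr: float, k: int = 8
) -> list[tuple[float, str, str]]:
    """Return up to k misclassifications as (conf, truth, pred), most confident first.

    records: (truth_label, p_correct) pairs.
    conf:    probability of the predicted class (assumed definition).
    """
    errors = []
    for truth, p_correct in records:
        pred = "correct" if p_correct >= thr else "wrong"
        if pred == truth:
            continue  # correctly classified, not part of the audit
        conf = p_correct if pred == "correct" else 1.0 - p_correct
        errors.append((conf, truth, pred))
    errors.sort(key=lambda item: item[0], reverse=True)
    return errors[:k]
```

Reviewing the most confident errors first tends to surface label noise and systematic failure modes sooner than reviewing borderline cases.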

Config
{
  "version": "v20260514",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3-S + RoIAlign + MLP 2-cls (wrong/correct) + photometric + random_erase + camaug, v9 binary replica",
  "params_M": 22.47245,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "wrong": 1.5,
    "correct": 1.0
  },
  "labels": [
    "wrong",
    "correct"
  ],
  "thr": 0.27294921875,
  "best_val_AP": 0.9570321404906651,
  "best_epoch": 10,
  "epochs_run": 18,
  "total_train_time_s": 2504.8716027736664,
  "test_metrics": {
    "ap": 0.9241776957176995,
    "acc": 0.8909583474432781,
    "p": 0.8349514563106796,
    "r": 0.8973913043478261,
    "f1": 0.865046102263202,
    "thr": 0.27294921875,
    "tp": 1032,
    "fp": 204,
    "fn": 118,
    "tn": 1599,
    "n_pos": 1150,
    "n_total": 2953,
    "per_class": {
      "wrong": {
        "precision": 0.9312754804892254,
        "recall": 0.8868552412645591,
        "f1": 0.9085227272727273,
        "support": 1803
      },
      "correct": {
        "precision": 0.8349514563106796,
        "recall": 0.8973913043478261,
        "f1": 0.865046102263202,
        "support": 1150
      }
    },
    "confusion_matrix": [
      [
        1599,
        204
      ],
      [
        118,
        1032
      ]
    ]
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}
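
To make the effect of `class_weights` concrete, the sketch below computes a class-weighted cross-entropy in plain Python. The config does not name the loss function, so this is an assumption: it follows the weighted-mean convention of PyTorch's `CrossEntropyLoss` with `weight=` (the sum of weighted per-sample losses divided by the sum of the target weights). `weighted_cross_entropy` is an illustrative name.

```python
import math


def weighted_cross_entropy(
    logits: list[list[float]], targets: list[int], weights: list[float]
) -> float:
    """Class-weighted mean cross-entropy over a batch of logit rows."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(value - peak) for value in row))
        nll = log_z - row[target]
        weighted_sum += weights[target] * nll
        weight_total += weights[target]
    return weighted_sum / weight_total


# Index 0 = wrong (weight 1.5), index 1 = correct (weight 1.0), as in the config.
CLASS_WEIGHTS = [1.5, 1.0]
```

With these weights, an error on a `wrong` sample contributes 1.5 times as much to the batch loss as an equally large error on a `correct` sample.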