safety_rope_v20260511 — DINOv3-S + RoI Align HD1280 (3-class)
Data: cvat2 #10, person bboxes with the safety_rope_use attribute. Architecture: DINOv3-S backbone + RoIAlign (7×7) + MLP head with 3 output classes (unknown/correct/wrong). Class weights = [1.0, 1.5, 1.5]. The photometric + random_erase + camaug augmentation stack is carried over unchanged from v9.
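The RoIAlign (7×7) step can be illustrated with a minimal single-channel sketch. It assumes the ViT-S/16 patch tokens are reshaped into a stride-16 feature map, which gives 45×80 for a 1280×720 input, and it uses one bilinear sample per bin at the bin centre with the "aligned" half-pixel offset (torchvision-style semantics). The helper names `bilinear` and `roi_align` are hypothetical and do not come from the training code.

```python
def bilinear(feat, y, x):
    """Bilinearly sample a 2-D grid (list of rows) at continuous (y, x)."""
    h, w = len(feat), len(feat[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0  # sample falls completely outside the map
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feat[y0][x0] + (1 - ly) * lx * feat[y0][x1]
            + ly * (1 - lx) * feat[y1][x0] + ly * lx * feat[y1][x1])


def roi_align(feat, box, out_size=7, spatial_scale=1 / 16):
    """Pool an image-space box (x1, y1, x2, y2) to out_size x out_size.

    Assumption: stride-16 feature map, one sample at each bin centre,
    and a -0.5 offset as in torchvision's aligned=True mode.
    """
    x1, y1, x2, y2 = (c * spatial_scale - 0.5 for c in box)
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [[bilinear(feat, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]


# 45 x 80 map whose value equals the column index, so pooled values are easy to check.
feat = [[float(x) for x in range(80)] for _ in range(45)]
pooled = roi_align(feat, (0, 0, 1280, 720))
```

In the real model this runs per channel (384 for ViT-S), so each RoI becomes a 7×7×384 tensor before the MLP head.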
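Actual label distribution: train, val and test all contain 0 unknown samples, so this run is effectively v9's 2-class (correct/wrong) task. The model keeps the 3-class output for future extension, but the unknown logit never saw a positive example in this run.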
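The loss is not named in the report. The sketch below assumes a softmax cross-entropy weighted per target class and normalised by the summed target weights, which is PyTorch's `CrossEntropyLoss(weight=..., reduction="mean")` convention. Under that assumption, no sample ever has target unknown, so its 1.0 weight never enters the loss. Because correct and wrong share the same 1.5 weight, the weighting also cancels out, and the loss matches the unweighted mean. The function names are illustrative only.

```python
import math

CLASS_WEIGHTS = [1.0, 1.5, 1.5]  # unknown, correct, wrong (from the config)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_ce(batch_logits, targets, weights=CLASS_WEIGHTS):
    """Class-weighted CE, normalised by the sum of the targets' weights."""
    num = 0.0
    den = 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        num += -w * log_softmax(logits)[y]
        den += w
    return num / den


batch = [[0.2, 1.5, -0.3], [-1.0, 0.4, 2.2], [0.0, 0.0, 0.0], [0.5, -0.2, 0.9]]
targets = [1, 2, 2, 1]  # only correct (1) and wrong (2), as in this dataset
loss = weighted_ce(batch, targets)
```

Changing the unknown weight, or replacing all weights with 1.0, leaves `loss` unchanged for this label distribution. The weights would only matter once unknown samples appear, or if correct and wrong were given different weights.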
Comparison with v9 (for fairness, v20260511 is averaged over the two active classes only)
| metric | v9 (2-class, ViT-S DINOv3) | v20260511 (3-class arch) |
|---|---|---|
| test AP / mean active AP | 0.9336 | 0.9581 |
| test F1 / mean active F1 | 0.8770 | 0.8956 |
| test accuracy | 0.9070 | 0.9015 |
| macro AP (3-class, unknown included with 0 support) | n/a | 0.6387 |
| macro F1 (3-class, unknown included with 0 support) | n/a | 0.5971 |
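The two averaging conventions in the table use the same per-class numbers and differ only in the denominator. The 3-class macro average divides by 3 and counts the empty unknown class as 0. The "mean active" value divides only by the classes with non-zero support. The sketch below reproduces both rows from the per-class values in the config. The helper names are illustrative.

```python
PER_CLASS_AP = {"unknown": 0.0, "correct": 0.9460903759356688, "wrong": 0.9701120616895574}
PER_CLASS_F1 = {"unknown": 0.0, "correct": 0.8708229426433916, "wrong": 0.9204301075268817}
SUPPORT = {"unknown": 0, "correct": 1053, "wrong": 1577}


def macro(values):
    """Plain macro average over every class, including empty ones."""
    return sum(values.values()) / len(values)


def mean_active(values, support):
    """Average only over classes that actually occur in the evaluation set."""
    active = [v for k, v in values.items() if support[k] > 0]
    return sum(active) / len(active)


print(f"macro AP {macro(PER_CLASS_AP):.4f}  mean active AP {mean_active(PER_CLASS_AP, SUPPORT):.4f}")
print(f"macro F1 {macro(PER_CLASS_F1):.4f}  mean active F1 {mean_active(PER_CLASS_F1, SUPPORT):.4f}")
```

With one empty class, the 3-class macro value is exactly 2/3 of the mean-active value. That is why only the mean-active numbers are comparable with v9.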
Test metrics
| macro AP | macro F1 | accuracy | N |
|---|---|---|---|
| 0.6387 | 0.5971 | 0.9015 | 2630 |
Per-class metrics
| class | AP | P | R | F1 | support |
|---|---|---|---|---|---|
| unknown | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0 |
| correct | 0.9461 | 0.9170 | 0.8291 | 0.8708 | 1053 |
| wrong | 0.9701 | 0.8927 | 0.9499 | 0.9204 | 1577 |
Confusion matrix (rows = true, cols = predicted)
| true / pred | unknown | correct | wrong | row total |
|---|---|---|---|---|
| unknown | 0 | 0 | 0 | 0 |
| correct | 0 | 873 | 180 | 1053 |
| wrong | 0 | 79 | 1498 | 1577 |
| col total | 0 | 952 | 1678 | 2630 |
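The per-class P/R/F1 and the accuracy above can all be derived from the confusion matrix. The sketch below does this in plain Python. For the empty unknown class it returns 0 for every zero-denominator ratio, which mirrors scikit-learn's `zero_division=0` behaviour and matches the 0.0000 row in the per-class table. The helper names are illustrative.

```python
LABELS = ["unknown", "correct", "wrong"]
CM = [[0, 0, 0], [0, 873, 180], [0, 79, 1498]]  # rows = true, cols = predicted


def safe_div(a, b):
    """Return a / b, or 0.0 when the denominator is zero (empty class)."""
    return a / b if b else 0.0


def per_class_metrics(cm):
    n = len(cm)
    out = {}
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column total
        true_k = sum(cm[k])  # row total = support
        p = safe_div(tp, predicted_k)
        r = safe_div(tp, true_k)
        out[LABELS[k]] = {"precision": p, "recall": r,
                          "f1": safe_div(2 * p * r, p + r), "support": true_k}
    return out


def accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


metrics = per_class_metrics(CM)
```

For example, correct precision is 873 / 952 and wrong recall is 1498 / 1577. Accuracy is (873 + 1498) / 2630.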
Sample inference (4 per class; inner box = ground truth, outer box = expanded RoI used as model input; bracketed values are the unknown/correct/wrong scores in %)
truth=correct pred=correct (100%) [u=0 c=100 w=0]

truth=correct pred=correct (100%) [u=0 c=100 w=0]

truth=correct pred=correct (100%) [u=0 c=100 w=0]

truth=correct pred=correct (100%) [u=0 c=100 w=0]

truth=wrong pred=wrong (100%) [u=0 c=0 w=100]

truth=wrong pred=wrong (100%) [u=0 c=0 w=100]

truth=wrong pred=wrong (100%) [u=0 c=0 w=100]

truth=wrong pred=wrong (100%) [u=0 c=0 w=100]

FP audit: the 8 most confident misclassifications (drawn from a sample of 200 test examples)
Box color = true class; the caption shows the predicted class.
truth=correct → pred=wrong (100%)

truth=wrong → pred=correct (100%)

truth=correct → pred=wrong (100%)

truth=correct → pred=wrong (97%)

truth=correct → pred=wrong (92%)

truth=correct → pred=wrong (92%)

truth=wrong → pred=correct (90%)

truth=wrong → pred=correct (87%)
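The audit selection can be sketched as follows. The sketch draws a 200-example subset, keeps the examples whose arg-max prediction differs from the truth, and ranks them by the predicted class's score. It then formats captions like the ones above. The subset size and k come from the report. The seeded sampling, the use of the arg-max score as "confidence", and the helper names are assumptions.

```python
import random

LABELS = ("unknown", "correct", "wrong")


def score_caption(probs):
    """Format per-class scores like the sample captions, e.g. [u=0 c=100 w=0]."""
    u, c, w = (round(100 * p) for p in probs)
    return f"[u={u} c={c} w={w}]"


def top_confident_errors(samples, k=8, n_sample=200, seed=0):
    """samples: list of (truth_index, probs). Hypothetical audit helper."""
    rng = random.Random(seed)
    pool = rng.sample(samples, min(n_sample, len(samples)))
    errors = []
    for truth, probs in pool:
        pred = max(range(len(probs)), key=lambda i: probs[i])  # arg-max class
        if pred != truth:
            errors.append((probs[pred], truth, pred))
    errors.sort(key=lambda e: e[0], reverse=True)  # most confident first
    return [f"truth={LABELS[t]} → pred={LABELS[p]} ({round(100 * conf)}%)"
            for conf, t, p in errors[:k]]


# Synthetic example data (not from the report).
demo = [(1, [0.0, 0.03, 0.97]), (2, [0.0, 0.9, 0.1]),
        (1, [0.0, 0.8, 0.2]), (2, [0.0, 0.13, 0.87])]
audit = top_confident_errors(demo, k=2)
```

Ranking by the winning score surfaces errors the model is sure about. Those errors are usually label noise or systematic failure modes, not borderline cases.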

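Training history (val_macroAP and val_macroF1 average over all 3 classes; with no unknown samples in val, unknown scores 0, so neither can exceed 2/3)
| ep | train_loss | val_macroAP | val_macroF1 | val_acc |
|---|---|---|---|---|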
| 1 | 0.6700 | 0.5051 | 0.4500 | 0.6768 |
| 2 | 0.5897 | 0.5418 | 0.3952 | 0.6110 |
| 3 | 0.2450 | 0.6210 | 0.5519 | 0.8282 |
| 4 | 0.1439 | 0.6194 | 0.5571 | 0.8372 |
| 5 | 0.1088 | 0.5991 | 0.5499 | 0.8270 |
| 6 | 0.0856 | 0.6146 | 0.5545 | 0.8330 |
| 7 | 0.0769 | 0.6108 | 0.5572 | 0.8372 |
| 8 | 0.0638 | 0.6230 | 0.5909 | 0.8917 |
| 9 | 0.0604 | 0.6163 | 0.5567 | 0.8354 |
| 10 | 0.0442 | 0.6157 | 0.5779 | 0.8689 |
| 11 | 0.0422 | 0.6181 | 0.5853 | 0.8803 |
| 12 | 0.0340 | 0.6183 | 0.5881 | 0.8863 |
| 13 | 0.0305 | 0.6107 | 0.5709 | 0.8588 |
| 14 | 0.0231 | 0.6170 | 0.5794 | 0.8701 |
| 15 | 0.0214 | 0.6131 | 0.5748 | 0.8642 |
| 16 | 0.0179 | 0.6108 | 0.5809 | 0.8737 |
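The config reports best_epoch 8 and epochs_run 16 with patience 8 and a 20-epoch budget. Those numbers are consistent with the early-stopping rule sketched below, which stops once val_macroAP has failed to strictly improve for `patience` consecutive epochs. The exact rule used in training is an assumption. The sketch replays the logged val_macroAP column.

```python
VAL_MACRO_AP = [0.5051, 0.5418, 0.6210, 0.6194, 0.5991, 0.6146, 0.6108, 0.6230,
                0.6163, 0.6157, 0.6181, 0.6183, 0.6107, 0.6170, 0.6131, 0.6108]


def early_stop(scores, patience=8, max_epochs=20):
    """Return (best_epoch, last_epoch_run), using 1-based epochs."""
    best, best_ep, bad = float("-inf"), 0, 0
    for ep, s in enumerate(scores[:max_epochs], start=1):
        if s > best:  # strict improvement resets the counter
            best, best_ep, bad = s, ep, 0
        else:
            bad += 1
            if bad >= patience:
                return best_ep, ep
    return best_ep, min(len(scores), max_epochs)


best_epoch, epochs_run = early_stop(VAL_MACRO_AP)
```

Epochs 9 to 16 are the eight non-improving epochs after the epoch-8 peak (0.6230), so training stops at 16 of the 20 budgeted epochs.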
Config
```json
{
  "version": "v20260511",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3 + RoIAlign + MLP 3-cls (unknown/correct/wrong) + photometric + random_erase + camaug",
  "params_M": 22.472707,
  "img_size": [1280, 720],
  "feat_ch": 384,
  "expand": {"x": 1.0, "y_top": 0.2, "y_bot": 1.5},
  "jitter": {
    "center": 0.2,
    "size": [0.7, 1.4],
    "ex_x": [0.5, 1.5],
    "ex_yt": [0.5, 2.0],
    "ex_yb": [0.7, 1.3]
  },
  "class_weights": {"unknown": 1.0, "correct": 1.5, "wrong": 1.5},
  "labels": ["unknown", "correct", "wrong"],
  "best_val_macro_ap": 0.6230394742125506,
  "best_epoch": 8,
  "epochs_run": 16,
  "total_train_time_s": 1614.3719201087952,
  "test_metrics": {
    "macro_ap": 0.6387341458750754,
    "macro_f1": 0.5970843500567578,
    "acc": 0.9015209125475285,
    "per_class_ap": {
      "unknown": 0.0,
      "correct": 0.9460903759356688,
      "wrong": 0.9701120616895574
    },
    "per_class": {
      "unknown": {"precision": 0.0, "recall": 0.0, "f1": 0.0, "support": 0},
      "correct": {"precision": 0.917016806722689, "recall": 0.8290598290598291, "f1": 0.8708229426433916, "support": 1053},
      "wrong": {"precision": 0.8927294398092968, "recall": 0.9499048826886494, "f1": 0.9204301075268817, "support": 1577}
    },
    "confusion_matrix": [[0, 0, 0], [0, 873, 180], [0, 79, 1498]],
    "n": 2630
  },
  "hyperparams": {"batch": 8, "epochs": 20, "lr": 5e-05, "wd": 0.01, "patience": 8}
}
```
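The report does not define the `expand` and `jitter` fields precisely. The sketch below is one hypothetical reading, labelled as such. In this reading, `expand.x` is the total horizontal padding as a fraction of box width, split evenly between the two sides. `y_top` and `y_bot` pad the top and bottom by fractions of box height, and the result is clipped to the 1280×720 frame. For training-time jitter, the box centre shifts by up to ±`center` of the box size, one scale factor from `size` is applied to both sides, and each expand factor is multiplied by a draw from its `ex_*` range. None of these semantics are confirmed by the source.

```python
import random

IMG_W, IMG_H = 1280, 720
EXPAND = {"x": 1.0, "y_top": 0.2, "y_bot": 1.5}  # values from the config


def expand_box(box, ex_x, ex_yt, ex_yb, img_w=IMG_W, img_h=IMG_H):
    """Hypothetical RoI expansion: pad around a person box, then clip to the image."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (max(0.0, x1 - ex_x * w / 2), max(0.0, y1 - ex_yt * h),
            min(float(img_w), x2 + ex_x * w / 2), min(float(img_h), y2 + ex_yb * h))


def jitter_box(box, rng, center=0.2, size=(0.7, 1.4),
               ex_x=(0.5, 1.5), ex_yt=(0.5, 2.0), ex_yb=(0.7, 1.3)):
    """Hypothetical train-time jitter: shift and rescale the box, then expand it
    with randomly scaled expand factors."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx = (x1 + x2) / 2 + rng.uniform(-center, center) * w
    cy = (y1 + y2) / 2 + rng.uniform(-center, center) * h
    s = rng.uniform(*size)  # assumed: one scale factor for width and height
    w, h = w * s, h * s
    shifted = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return expand_box(shifted,
                      EXPAND["x"] * rng.uniform(*ex_x),
                      EXPAND["y_top"] * rng.uniform(*ex_yt),
                      EXPAND["y_bot"] * rng.uniform(*ex_yb))


eval_roi = expand_box((100, 100, 200, 300), EXPAND["x"], EXPAND["y_top"], EXPAND["y_bot"])
train_roi = jitter_box((600, 200, 700, 500), random.Random(0))
```

Under this reading, the large `y_bot` (1.5) extends the RoI well below the person box. That would plausibly bring in rope or anchor context below the worker, but this interpretation is not stated in the report.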