safety_rope_v20260512: DINOv3-S binary (v9 reproduction + thr added to ckpt)

Data: cvat2 #10, person bbox + safety_rope_use attr. The schema is reverted to the v9 binary one (wrong/correct). Unknown samples are filtered out.
The ppe-demo handler can hot-swap the weights directly (the ckpt carries both labels and thr).
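
As a minimal sketch of what the hot-swap relies on, the snippet below reads labels and thr from a checkpoint-like dict and turns P(correct) into a label. The dict layout, the name `decide`, and the `>=` comparison are assumptions for illustration only; the actual ppe-demo handler may differ.

```python
# Hypothetical stand-in for the metadata stored in the checkpoint.
# Values are copied from the Config section of this report.
ckpt_meta = {"labels": ["wrong", "correct"], "thr": 0.054290771484375}


def decide(p_correct: float, meta: dict) -> str:
    """Map P(correct) to a label using the threshold shipped in the ckpt.

    Assumption: index 1 ("correct") is the positive class, and a sample is
    labelled "correct" when P(correct) >= thr.
    """
    wrong_label, correct_label = meta["labels"]
    return correct_label if p_correct >= meta["thr"] else wrong_label


if __name__ == "__main__":
    # Probabilities taken from the sample-inference captions in this report.
    for p in (0.006, 0.017, 0.955, 0.989):
        print(f"P(correct)={p:.3f} -> {decide(p, ckpt_meta)}")
```

Under these assumptions the handler needs no hard-coded label order or threshold, which is why swapping the weight file alone would be enough.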

Changes in v20260512 (vs v20260511):

Three-version comparison

metric               v9 (2-class baseline)   v20260511 (3-class arch)        v20260512 (binary reproduction)
test AP              0.9336                  0.9581 (mean active)            0.9308
test F1              0.8770                  0.8956 (mean active)            0.8567
test accuracy        0.9070                  0.9015                          0.8821
test thr             0.79                    -                               0.054
val thr (deployed)   -                       -                               0.054
head                 Linear(256, 2)          Linear(256, 3)                  Linear(256, 2)
ckpt labels          ["wrong","correct"]     ["unknown","correct","wrong"]   ["wrong","correct"]
ckpt thr             missing                 missing                         0.0543 ✓

Test metrics

AP       F1       P        R        accuracy   thr     TP/FP/FN/TN
0.9308   0.8567   0.8344   0.8803   0.8821     0.054   927/184/126/1393
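
The P, R, F1 and accuracy values follow directly from the TP/FP/FN/TN counts. A minimal sketch of that arithmetic, with "correct" as the positive class (as n_pos = 1053 in the Config suggests); the function name `binary_metrics` is illustrative:

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary metrics computed from the four confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"p": precision, "r": recall, "f1": f1, "acc": accuracy}


# Counts from the test metrics row of this report.
m = binary_metrics(tp=927, fp=184, fn=126, tn=1393)

if __name__ == "__main__":
    for name, value in m.items():
        print(f"{name}: {value:.4f}")
```

AP is not recoverable this way, since it depends on the full ranking of scores rather than on counts at a single threshold.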

Per-class metrics

class     P        R        F1       support
wrong     0.9171   0.8833   0.8999   1577
correct   0.8344   0.8803   0.8567   1053

Confusion matrix (rows=true, cols=pred)

true \ pred   wrong   correct   row total
wrong         1393    184       1577
correct       126     927       1053
col total     1519    1111      2630
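
The per-class figures can be re-derived from this matrix: precision divides the diagonal cell by its column total, recall divides it by its row total. A short sketch under that reading (rows = true, cols = pred); `per_class` is an illustrative helper, not part of the training code:

```python
LABELS = ["wrong", "correct"]
CM = [[1393, 184],   # true = wrong
      [126, 927]]    # true = correct


def per_class(cm: list, labels: list) -> dict:
    """Per-class precision/recall/F1/support from a square confusion matrix."""
    out = {}
    for i, name in enumerate(labels):
        hit = cm[i][i]
        support = sum(cm[i])                    # row total: true count
        predicted = sum(row[i] for row in cm)   # column total: predicted count
        precision = hit / predicted
        recall = hit / support
        f1 = 2 * precision * recall / (precision + recall)
        out[name] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": support}
    return out


stats = per_class(CM, LABELS)

if __name__ == "__main__":
    for name, s in stats.items():
        print(name, {k: round(v, 4) for k, v in s.items()})
```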

Training history

ep   train_loss   val_AP   val_F1   val_acc   val_thr
1    0.4856       0.8960   0.8073   0.8414    0.255
2    0.1859       0.9318   0.8523   0.8707    0.114
3    0.1265       0.9245   0.8371   0.8474    0.118
4    0.1284       0.8976   0.8134   0.8330    0.239
5    0.0923       0.9298   0.8498   0.8642    0.350
6    0.0811       0.9200   0.8316   0.8510    0.325
7    0.0680       0.9196   0.8439   0.8630    0.258
8    0.0660       0.9004   0.8431   0.8665    0.245
9    0.0529       0.9302   0.8585   0.8707    0.342
10   0.0427       0.9202   0.8477   0.8594    0.140
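
The history is consistent with selecting the checkpoint by val_AP and stopping after 8 epochs without improvement (best_epoch = 2, epochs_run = 10, patience = 8 in the Config). The sketch below replays that logic on the val_AP column; the exact stopping rule is an assumption, and `replay_early_stopping` is a hypothetical helper rather than the training script's code.

```python
# val_AP per epoch, copied from the training history table.
VAL_AP = [0.8960, 0.9318, 0.9245, 0.8976, 0.9298,
          0.9200, 0.9196, 0.9004, 0.9302, 0.9202]


def replay_early_stopping(val_ap: list, patience: int) -> tuple:
    """Return (best_epoch, last_epoch_run), both 1-indexed.

    Assumed rule: stop once `patience` consecutive epochs fail to beat
    the best val_AP seen so far.
    """
    best_ap, best_epoch, stale = float("-inf"), 0, 0
    for epoch, ap in enumerate(val_ap, start=1):
        if ap > best_ap:
            best_ap, best_epoch, stale = ap, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_ap)


best_epoch, epochs_run = replay_early_stopping(VAL_AP, patience=8)

if __name__ == "__main__":
    print(f"best_epoch={best_epoch}, epochs_run={epochs_run}")
```

Under this rule the run ends at epoch 10 of the configured 20, because epochs 3 through 10 never exceed the epoch-2 val_AP of 0.9318.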

Sample inference (test, 4 per class; box color = truth)

truth=wrong pred=wrong (P(correct)=0.6%, thr=5.4%)
truth=wrong pred=wrong (P(correct)=0.0%, thr=5.4%)
truth=wrong pred=wrong (P(correct)=1.7%, thr=5.4%)
truth=wrong pred=wrong (P(correct)=0.1%, thr=5.4%)
truth=correct pred=correct (P(correct)=98.9%, thr=5.4%)
truth=correct pred=correct (P(correct)=95.5%, thr=5.4%)
truth=correct pred=correct (P(correct)=99.1%, thr=5.4%)
truth=correct pred=correct (P(correct)=99.6%, thr=5.4%)

FP audit: top-8 most confident misclassifications (sampled 300 test imgs)

truth=correct → pred=wrong (conf=99.9%)
truth=correct → pred=wrong (conf=99.7%)
truth=correct → pred=wrong (conf=99.6%)
truth=correct → pred=wrong (conf=99.1%)
truth=correct → pred=wrong (conf=98.9%)
truth=correct → pred=wrong (conf=98.7%)
truth=correct → pred=wrong (conf=97.7%)
truth=correct → pred=wrong (conf=97.7%)

Config

{
  "version": "v20260512",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3-S + RoIAlign + MLP 2-cls (wrong/correct) + photometric + random_erase + camaug — v9 binary復刻",
  "params_M": 22.47245,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "wrong": 1.5,
    "correct": 1.0
  },
  "labels": [
    "wrong",
    "correct"
  ],
  "thr": 0.054290771484375,
  "best_val_AP": 0.9318118478968604,
  "best_epoch": 2,
  "epochs_run": 10,
  "total_train_time_s": 1008.9956252574921,
  "test_metrics": {
    "ap": 0.9307922369815115,
    "acc": 0.8821292775665399,
    "p": 0.8343834383438344,
    "r": 0.8803418803418803,
    "f1": 0.8567467652495379,
    "thr": 0.054290771484375,
    "tp": 927,
    "fp": 184,
    "fn": 126,
    "tn": 1393,
    "n_pos": 1053,
    "n_total": 2630,
    "per_class": {
      "wrong": {
        "precision": 0.9170506912442397,
        "recall": 0.8833227647431833,
        "f1": 0.8998708010335917,
        "support": 1577
      },
      "correct": {
        "precision": 0.8343834383438344,
        "recall": 0.8803418803418803,
        "f1": 0.8567467652495379,
        "support": 1053
      }
    },
    "confusion_matrix": [
      [
        1393,
        184
      ],
      [
        126,
        927
      ]
    ]
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}