safety_rope_v20260511 — DINOv3-S + RoI Align HD1280 (3-class)

Data: cvat2 #10 person bboxes with the safety_rope_use attribute. Architecture: DINOv3-S backbone + RoIAlign (7×7) + MLP head, 3 classes (unknown/correct/wrong). Class weights = [1.0, 1.5, 1.5]. Augmentation carries over the v9 stack: photometric + random_erase + camaug.
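
For readers unfamiliar with RoIAlign, here is a minimal pure-Python sketch of what the 7×7 pooling step does on the backbone's patch grid. It assumes the ViT-S/16 features are used at stride 16, so `img_size` [1280, 720] (read as width × height) gives an 80×45 grid. It also assumes half-pixel alignment with 2×2 samples per bin. Those two settings mirror the `aligned` and `sampling_ratio` options of `torchvision.ops.roi_align`, but the values this model uses are not stated in the report. The function names are illustrative, and a real model would run a library op over all 384 channels at once.

```python
PATCH = 16                                        # ViT-S/16 patch size -> stride-16 grid (assumed)
IMG_W, IMG_H = 1280, 720                          # config img_size, read as (width, height)
FEAT_W, FEAT_H = IMG_W // PATCH, IMG_H // PATCH   # 80 x 45 patch tokens


def bilinear(feat, y, x):
    """Bilinear sample of a 2-D map; 0 far outside, clamped at the border."""
    h, w = len(feat), len(feat[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = int(y), int(x)
    if y0 >= h - 1:
        y0 = y1 = h - 1
        y = float(y0)
    else:
        y1 = y0 + 1
    if x0 >= w - 1:
        x0 = x1 = w - 1
        x = float(x0)
    else:
        x1 = x0 + 1
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx
    return (hy * hx * feat[y0][x0] + hy * lx * feat[y0][x1]
            + ly * hx * feat[y1][x0] + ly * lx * feat[y1][x1])


def roi_align(feat, box, out_size=7, spatial_scale=1.0 / PATCH,
              sampling_ratio=2, aligned=True):
    """Pool one (x1, y1, x2, y2) box in image pixels into an out_size x out_size grid."""
    offset = 0.5 if aligned else 0.0
    x1, y1, x2, y2 = (c * spatial_scale - offset for c in box)
    roi_w, roi_h = x2 - x1, y2 - y1
    if not aligned:                               # legacy mode forces at least one cell
        roi_w, roi_h = max(roi_w, 1.0), max(roi_h, 1.0)
    bin_w, bin_h = roi_w / out_size, roi_h / out_size
    n = sampling_ratio
    pooled = []
    for ph in range(out_size):
        row = []
        for pw in range(out_size):
            acc = 0.0
            for iy in range(n):                   # n x n regularly spaced samples per bin
                y = y1 + ph * bin_h + (iy + 0.5) * bin_h / n
                for ix in range(n):
                    x = x1 + pw * bin_w + (ix + 0.5) * bin_w / n
                    acc += bilinear(feat, y, x)
            row.append(acc / (n * n))             # average of the samples in this bin
        pooled.append(row)
    return pooled
```

On a feature map whose value equals the column index, each pooled cell returns the x-coordinate of its bin center. This gives a quick check that the box-to-grid mapping is right.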

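Actual data distribution: train, val and test all contain 0 unknown samples, so the task is effectively the same 2-class (correct/wrong) problem as v9. The architecture keeps a 3-class output for future extension, but the unknown logit never saw a positive example in this run and is effectively untrained.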
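
Because every training target is correct or wrong, the class weights [1.0, 1.5, 1.5] may have no net effect. The sketch below assumes the loss is PyTorch-style class-weighted cross-entropy with weighted-mean reduction, i.e. the sum of w[y]·NLL divided by the sum of w[y]. That is the documented default behaviour of `nn.CrossEntropyLoss(weight=...)`, but the report does not state which reduction was used. `weighted_ce` is a hypothetical helper written in plain Python.

```python
import math

CLASS_WEIGHTS = [1.0, 1.5, 1.5]   # unknown, correct, wrong (config class_weights)


def log_softmax(logits):
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_ce(batch_logits, targets, weights):
    """Class-weighted CE, weighted-mean reduction: sum w[y]*nll / sum w[y]."""
    num = den = 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        num += -w * log_softmax(logits)[y]
        den += w
    return num / den
```

Under that assumption the 1.5 factor cancels in every batch, so the run is equivalent to unweighted CE. The unknown logit is only ever pushed down through the softmax normalizer. With a sum reduction, the weights would instead scale the whole loss uniformly by 1.5.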

Comparison with v9 (for a fair comparison, v20260511 is scored as the mean over its 2 active classes, correct and wrong)

	v9 (2-class, ViT-S DINOv3)	v20260511 (3-class arch)
test AP (v9) / mean active-class AP (v20260511)	0.9336	0.9581
test F1 (v9) / mean active-class F1 (v20260511)	0.8770	0.8956
test accuracy	0.9070	0.9015
macro AP over 3 classes (unknown has 0 support, scored 0)	n/a	0.6387
macro F1 over 3 classes (unknown has 0 support, scored 0)	n/a	0.5971
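
The two aggregates for v20260511 differ only in which classes enter the average. The small sketch below uses the per-class test numbers from this report. Macro averaging divides by all 3 classes, while the "mean active" variant (hypothetical helper `mean_active`) skips classes with zero support. That is why 0.6387 and 0.9581 describe the same predictions.

```python
# Per-class test scores from this report; unknown has 0 support and is reported as 0.0
PER_CLASS_AP = {"unknown": 0.0, "correct": 0.9460903759356688, "wrong": 0.9701120616895574}
PER_CLASS_F1 = {"unknown": 0.0, "correct": 0.8708229426433916, "wrong": 0.9204301075268817}
SUPPORT = {"unknown": 0, "correct": 1053, "wrong": 1577}


def macro(scores):
    """Plain macro average over every class, including empty ones."""
    return sum(scores.values()) / len(scores)


def mean_active(scores, support):
    """Average only over classes that have at least one true sample."""
    active = [c for c in scores if support[c] > 0]
    return sum(scores[c] for c in active) / len(active)


print(f"{macro(PER_CLASS_AP):.4f}")                 # → 0.6387
print(f"{mean_active(PER_CLASS_AP, SUPPORT):.4f}")  # → 0.9581
print(f"{macro(PER_CLASS_F1):.4f}")                 # → 0.5971
print(f"{mean_active(PER_CLASS_F1, SUPPORT):.4f}")  # → 0.8956
```

With one empty class out of three, the macro numbers are exactly 2/3 of the active-class means, so they cannot exceed about 0.667 no matter how good the model is.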

Test metrics

macro AP	macro F1	accuracy	N
0.6387	0.5971	0.9015	2630

Per-class metrics

class	AP	P	R	F1	support
unknown	0.0000	0.0000	0.0000	0.0000	0
correct	0.9461	0.9170	0.8291	0.8708	1053
wrong	0.9701	0.8927	0.9499	0.9204	1577

Confusion matrix (rows=true, cols=pred)

	unknown	correct	wrong	row total
unknown	0	0	0	0
correct	0	873	180	1053
wrong	0	79	1498	1577
col total	0	952	1678	2630
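
The per-class precision, recall, F1 and accuracy reported above can be recomputed from the confusion matrix. The minimal plain-Python sketch below treats an undefined ratio as 0.0, which is how the table reports the unknown row (that class has no true and no predicted samples). This is similar to `zero_division=0` in scikit-learn. The helper names are illustrative.

```python
CM = [[0, 0, 0], [0, 873, 180], [0, 79, 1498]]   # rows = true, cols = pred
LABELS = ["unknown", "correct", "wrong"]


def per_class_metrics(cm):
    """Return {label: (precision, recall, f1, support)}; undefined ratios become 0.0."""
    k = len(cm)
    out = {}
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))   # column total
        support = sum(cm[c])                          # row total
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        out[LABELS[c]] = (p, r, f1, support)
    return out


def accuracy(cm):
    """Diagonal (correct predictions) over all samples."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
```

For example, precision for correct is 873 / 952 and recall is 873 / 1053. Accuracy is (873 + 1498) / 2630.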

Sample inference (4 per class; inner box = ground truth, outer box = expanded RoI used as model input; [u c w] = predicted probability in % for unknown/correct/wrong)

truth=correct pred=correct (100%) [u=0 c=100 w=0]

truth=wrong pred=wrong (100%) [u=0 c=0 w=100]
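
The outer box is the person box enlarged by the `expand` factors in the config. The exact semantics are not documented in this report, so the sketch below uses a hypothetical reading:
- `x` is extra width as a fraction of box width, split evenly left and right.
- `y_top` and `y_bot` are fractions of box height added above and below.
- The result is clipped to the 1280×720 frame.

`expand_box` is an illustrative name.

```python
EXPAND = {"x": 1.0, "y_top": 0.2, "y_bot": 1.5}   # values from the config
IMG_W, IMG_H = 1280, 720


def expand_box(box, expand=EXPAND, img_w=IMG_W, img_h=IMG_H):
    """Hypothetical RoI expansion: widen symmetrically, extend mostly downward, clip to frame."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    pad_x = expand["x"] * w / 2          # half of the extra width on each side (assumed)
    return (
        max(0.0, x1 - pad_x),
        max(0.0, y1 - expand["y_top"] * h),
        min(float(img_w), x2 + pad_x),
        min(float(img_h), y2 + expand["y_bot"] * h),
    )


print(expand_box((600, 200, 680, 400)))   # → (560.0, 160.0, 720.0, 700.0)
```

In this reading, the RoI is dominated by the region below the person (1.5 h) rather than above it (0.2 h). Clipping truncates it for people near the bottom edge of the frame.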

FP audit: top-8 most confident misclassifications (sampled from 200 test samples)

Box color = truth class; the title shows the predicted class.

truth=correct → pred=wrong (100%)

truth=wrong → pred=correct (100%)

truth=correct → pred=wrong (100%)

truth=correct → pred=wrong (97%)

truth=correct → pred=wrong (92%)

truth=wrong → pred=correct (90%)

truth=wrong → pred=correct (87%)
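
The audit ranks misclassified test samples by the probability the model assigned to its (wrong) predicted class. The minimal sketch below assumes each prediction is available as a hypothetical `(sample_id, true_index, probs)` record, with class order unknown/correct/wrong. The record layout and function name are illustrative, not the report's tooling.

```python
LABELS = ("unknown", "correct", "wrong")


def top_confident_errors(records, k=8):
    """Return up to k misclassified samples, most confident first."""
    errors = []
    for sample_id, true_idx, probs in records:
        pred_idx = max(range(len(probs)), key=probs.__getitem__)   # argmax over classes
        if pred_idx != true_idx:
            errors.append((probs[pred_idx], sample_id, LABELS[true_idx], LABELS[pred_idx]))
    errors.sort(key=lambda e: e[0], reverse=True)                 # highest confidence first
    return errors[:k]


records = [
    ("a", 1, [0.00, 0.03, 0.97]),   # truth=correct, pred=wrong
    ("b", 2, [0.00, 0.90, 0.10]),   # truth=wrong, pred=correct
    ("c", 1, [0.00, 0.80, 0.20]),   # classified correctly, ignored
    ("d", 1, [0.00, 0.08, 0.92]),   # truth=correct, pred=wrong
]
for conf, sid, truth, pred in top_confident_errors(records, k=2):
    print(f"{sid}: truth={truth} -> pred={pred} ({conf:.0%})")
```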

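Training history

Note: val_macroAP and val_macroF1 average over all 3 classes. This includes the unknown class, which has no val samples and (as on test) scores 0, so both metrics are capped at 2/3 ≈ 0.667.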

ep	train_loss	val_macroAP	val_macroF1	val_acc
1	0.6700	0.5051	0.4500	0.6768
2	0.5897	0.5418	0.3952	0.6110
3	0.2450	0.6210	0.5519	0.8282
4	0.1439	0.6194	0.5571	0.8372
5	0.1088	0.5991	0.5499	0.8270
6	0.0856	0.6146	0.5545	0.8330
7	0.0769	0.6108	0.5572	0.8372
8	0.0638	0.6230	0.5909	0.8917
9	0.0604	0.6163	0.5567	0.8354
10	0.0442	0.6157	0.5779	0.8689
11	0.0422	0.6181	0.5853	0.8803
12	0.0340	0.6183	0.5881	0.8863
13	0.0305	0.6107	0.5709	0.8588
14	0.0231	0.6170	0.5794	0.8701
15	0.0214	0.6131	0.5748	0.8642
16	0.0179	0.6108	0.5809	0.8737
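
The sketch below assumes early stopping on val_macroAP with strict improvement and `patience` = 8 consecutive non-improving epochs, taken from the config's `hyperparams`. Under that assumption, replaying the val_macroAP column reproduces the config's `best_epoch` = 8 and `epochs_run` = 16, out of a 20-epoch budget. `replay_early_stopping` is a hypothetical helper, not the training script.

```python
# val_macroAP per epoch, copied from the training-history table above
VAL_MACRO_AP = [0.5051, 0.5418, 0.6210, 0.6194, 0.5991, 0.6146, 0.6108, 0.6230,
                0.6163, 0.6157, 0.6181, 0.6183, 0.6107, 0.6170, 0.6131, 0.6108]


def replay_early_stopping(scores, max_epochs=20, patience=8):
    """Return (best_epoch, epochs_run) for strict-improvement early stopping."""
    best, best_epoch, stale = float("-inf"), 0, 0
    for epoch, score in enumerate(scores[:max_epochs], start=1):
        if score > best:
            best, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:          # `patience` epochs in a row without improvement
                return best_epoch, epoch
    return best_epoch, min(len(scores), max_epochs)


print(replay_early_stopping(VAL_MACRO_AP))   # → (8, 16)
```

Epochs 9 to 16 never beat 0.6230, so training halts at epoch 16. Meanwhile, train_loss keeps falling from 0.0638 to 0.0179.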

Config

{
  "version": "v20260511",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3 + RoIAlign + MLP 3-cls (unknown/correct/wrong) + photometric + random_erase + camaug",
  "params_M": 22.472707,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "unknown": 1.0,
    "correct": 1.5,
    "wrong": 1.5
  },
  "labels": [
    "unknown",
    "correct",
    "wrong"
  ],
  "best_val_macro_ap": 0.6230394742125506,
  "best_epoch": 8,
  "epochs_run": 16,
  "total_train_time_s": 1614.3719201087952,
  "test_metrics": {
    "macro_ap": 0.6387341458750754,
    "macro_f1": 0.5970843500567578,
    "acc": 0.9015209125475285,
    "per_class_ap": {
      "unknown": 0.0,
      "correct": 0.9460903759356688,
      "wrong": 0.9701120616895574
    },
    "per_class": {
      "unknown": {
        "precision": 0.0,
        "recall": 0.0,
        "f1": 0.0,
        "support": 0
      },
      "correct": {
        "precision": 0.917016806722689,
        "recall": 0.8290598290598291,
        "f1": 0.8708229426433916,
        "support": 1053
      },
      "wrong": {
        "precision": 0.8927294398092968,
        "recall": 0.9499048826886494,
        "f1": 0.9204301075268817,
        "support": 1577
      }
    },
    "confusion_matrix": [
      [
        0,
        0,
        0
      ],
      [
        0,
        873,
        180
      ],
      [
        0,
        79,
        1498
      ]
    ],
    "n": 2630
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}
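
The `jitter` block suggests the RoI is randomized at train time. The config does not spell out the semantics, so the sketch below uses one hypothetical reading:
- `center` shifts the box center by up to ±20% of its width/height.
- `size` rescales the box.
- `ex_x`, `ex_yt` and `ex_yb` multiply the corresponding `expand` factors before the expansion is applied and clipped to the frame.

`jittered_roi` and this interpretation are assumptions.

```python
import random

EXPAND = {"x": 1.0, "y_top": 0.2, "y_bot": 1.5}
JITTER = {"center": 0.2, "size": (0.7, 1.4), "ex_x": (0.5, 1.5),
          "ex_yt": (0.5, 2.0), "ex_yb": (0.7, 1.3)}
IMG_W, IMG_H = 1280, 720


def jittered_roi(box, rng, expand=EXPAND, jitter=JITTER, img_w=IMG_W, img_h=IMG_H):
    """Hypothetical train-time RoI: jitter the person box, then expand with jittered factors."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # Shift the center by a uniform fraction of the box size (assumed meaning of "center").
    cx = (x1 + x2) / 2 + rng.uniform(-jitter["center"], jitter["center"]) * w
    cy = (y1 + y2) / 2 + rng.uniform(-jitter["center"], jitter["center"]) * h
    # One isotropic scale factor for the box (assumed meaning of "size").
    s = rng.uniform(*jitter["size"])
    w, h = w * s, h * s
    # Jittered expansion factors (assumed to multiply the base "expand" values).
    ex = expand["x"] * rng.uniform(*jitter["ex_x"])
    eyt = expand["y_top"] * rng.uniform(*jitter["ex_yt"])
    eyb = expand["y_bot"] * rng.uniform(*jitter["ex_yb"])
    return (
        max(0.0, cx - w / 2 - ex * w / 2),
        max(0.0, cy - h / 2 - eyt * h),
        min(float(img_w), cx + w / 2 + ex * w / 2),
        min(float(img_h), cy + h / 2 + eyb * h),
    )


rng = random.Random(0)
print(jittered_roi((600, 200, 680, 400), rng))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible per seed. The same seed always yields the same RoI.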