safety_rope_v20260511 — DINOv3-S + RoI Align HD1280 (3-class)

Data: person bboxes with the safety_rope_use attribute from cvat2 #10. Architecture: DINOv3-S backbone + RoIAlign (7×7) + 3-class MLP head (unknown/correct/wrong). Class weights = [1.0, 1.5, 1.5]. Augmentations are carried over from the v9 stack: photometric + random_erase + camaug.
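The RoIAlign step can be illustrated with a minimal pure-Python sketch. It assumes that the DINOv3-S patch tokens are reshaped into a stride-16 grid (80 × 45 for a 1280 × 720 input, with img_size read as (W, H)). It pools one channel with a single bilinear sample per bin centre and the half-pixel offset of "aligned" RoIAlign. The helpers `bilinear` and `roi_align` are hypothetical and are not the project's code. A real pipeline would more likely call `torchvision.ops.roi_align` on all 384 channels at once.

```python
import math

PATCH = 16                # assumed ViT-S/16 patch stride
IMG_W, IMG_H = 1280, 720  # assumed (W, H) order of img_size


def bilinear(fm, y, x):
    """Bilinearly sample a 2D list `fm` at continuous (y, x), clamped to the grid."""
    h, w = len(fm), len(fm[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return (fm[y0][x0] * (1 - ly) * (1 - lx) + fm[y0][x1] * (1 - ly) * lx
            + fm[y1][x0] * ly * (1 - lx) + fm[y1][x1] * ly * lx)


def roi_align(fm, box, out_size=7, spatial_scale=1.0 / PATCH):
    """Pool one image-space box (x1, y1, x2, y2) to out_size x out_size.

    Simplified: one bilinear sample at each bin centre (sampling_ratio=1),
    shifted by -0.5 as in aligned=True RoIAlign.
    """
    x1, y1, x2, y2 = (c * spatial_scale - 0.5 for c in box)
    bin_w, bin_h = (x2 - x1) / out_size, (y2 - y1) / out_size
    return [[bilinear(fm, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]


# A 1280x720 input with stride 16 gives an 80 x 45 patch grid.
grid_w, grid_h = IMG_W // PATCH, IMG_H // PATCH
# Toy single-channel "feature map" that is linear in x and y.
fm = [[x + 10 * y for x in range(grid_w)] for y in range(grid_h)]
pooled = roi_align(fm, (160, 160, 480, 480))
```

Bilinear interpolation reproduces a linear map exactly, so each pooled value equals x + 10·y evaluated at its bin centre in feature-grid coordinates.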

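Actual data distribution: train, val and test all contain 0 unknown samples, so this run is equivalent to v9's 2-class (correct/wrong) task. The architecture keeps the 3-class output for future extension, but the unknown logit was not trained in this run because it never appears as a target.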
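The class weights also have no practical effect in this run. The report does not name the loss, so the following assumes PyTorch-style weighted cross-entropy with mean reduction, sum(w[y]·loss) / sum(w[y]). Every target is correct or wrong, and both carry weight 1.5, so the weighted mean equals the unweighted mean. The unknown weight of 1.0 is never used. The functions below are an illustrative sketch, not the training code.

```python
import math

CLASS_WEIGHTS = [1.0, 1.5, 1.5]  # unknown, correct, wrong (from the config)


def cross_entropy(logits, target):
    """Negative log-softmax of the target class (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def weighted_mean_ce(batch, weights=None):
    """Weighted mean in the style of reduction='mean': sum(w[y]*loss) / sum(w[y])."""
    w = weights if weights is not None else [1.0] * len(batch[0][0])
    num = sum(w[y] * cross_entropy(z, y) for z, y in batch)
    den = sum(w[y] for _, y in batch)
    return num / den


# Toy batch with only correct (1) / wrong (2) targets, as in this dataset.
batch = [([0.1, 2.0, -1.0], 1), ([0.3, -0.5, 1.2], 2), ([-0.2, 0.4, 0.9], 1)]
weighted = weighted_mean_ce(batch, CLASS_WEIGHTS)
unweighted = weighted_mean_ce(batch)
```

Under softmax cross-entropy, the gradient with respect to a non-target logit equals that class's softmax probability, which is always positive. Training therefore only ever pushes the unknown logit down and never teaches it what an unknown sample looks like.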

Comparison with v9 (for a fair comparison, v20260511 is averaged over its 2 active classes only)

| metric | v9 (2-class, ViT-S DINOv3) | v20260511 (3-class arch) |
|---|---|---|
| test AP / mean active AP | 0.9336 | 0.9581 |
| test F1 / mean active F1 | 0.8770 | 0.8956 |
| test accuracy | 0.9070 | 0.9015 |
| macro AP (including unknown, support = 0) | n/a | 0.6387 |
| macro F1 (including unknown, support = 0) | n/a | 0.5971 |
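The gap between the two macro rows and the "mean active" rows comes only from averaging. The unknown class contributes a score of 0 with zero support, so the 3-class macro is 2/3 of the active-class mean. The short check below uses the per-class values from the config. The `macro` helper is illustrative.

```python
AP = {"unknown": 0.0, "correct": 0.9460903759356688, "wrong": 0.9701120616895574}
F1 = {"unknown": 0.0, "correct": 0.8708229426433916, "wrong": 0.9204301075268817}
SUPPORT = {"unknown": 0, "correct": 1053, "wrong": 1577}


def macro(scores, support, active_only):
    """Unweighted mean over classes; optionally skip classes with zero support."""
    keys = [k for k in scores if support[k] > 0 or not active_only]
    return sum(scores[k] for k in keys) / len(keys)


for name, scores in (("AP", AP), ("F1", F1)):
    print(f"macro {name}: all={macro(scores, SUPPORT, False):.4f} "
          f"active={macro(scores, SUPPORT, True):.4f}")
```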

Test metrics

| macro AP | macro F1 | accuracy | N |
|---|---|---|---|
| 0.6387 | 0.5971 | 0.9015 | 2630 |

Per-class metrics (unknown has no test samples, so its 0.0000 entries are placeholders rather than measured scores)

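| class | AP | precision | recall | F1 | support |
|---|---|---|---|---|---|
| unknown | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0 |
| correct | 0.9461 | 0.9170 | 0.8291 | 0.8708 | 1053 |
| wrong | 0.9701 | 0.8927 | 0.9499 | 0.9204 | 1577 |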

Confusion matrix (rows=true, cols=pred)

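| true \ pred | unknown | correct | wrong | row total |
|---|---|---|---|---|
| unknown | 0 | 0 | 0 | 0 |
| correct | 0 | 873 | 180 | 1053 |
| wrong | 0 | 79 | 1498 | 1577 |
| col total | 0 | 952 | 1678 | 2630 |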
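The per-class precision, recall, F1 and accuracy figures above follow directly from the confusion matrix. Precision divides by the column total, and recall divides by the row total (the support). The sketch below reproduces them and reports 0.0 wherever a denominator is zero, which is how the unknown row ends up as all zeros. The helper name is illustrative.

```python
LABELS = ["unknown", "correct", "wrong"]
CM = [[0, 0, 0],       # rows = true class
      [0, 873, 180],   # cols = predicted class
      [0, 79, 1498]]


def per_class_metrics(cm):
    """(precision, recall, F1, support) per class; 0.0 where a denominator is 0."""
    n = len(cm)
    out = []
    for k in range(n):
        tp = cm[k][k]
        pred_k = sum(cm[r][k] for r in range(n))  # column total
        true_k = sum(cm[k])                       # row total = support
        p = tp / pred_k if pred_k else 0.0
        r = tp / true_k if true_k else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        out.append((p, r, f1, true_k))
    return out


metrics = per_class_metrics(CM)
accuracy = sum(CM[k][k] for k in range(len(CM))) / sum(map(sum, CM))
for name, (p, r, f1, s) in zip(LABELS, metrics):
    print(f"{name:8s} P={p:.4f} R={r:.4f} F1={f1:.4f} n={s}")
print(f"accuracy={accuracy:.4f}")
```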

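Sample inference (4 per active class; inner box = ground-truth box, outer box = expanded RoI used as model input; bracketed values are per-class scores in %, with u = unknown, c = correct, w = wrong)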

truth=correct pred=correct (100%) [u=0 c=100 w=0]
truth=correct pred=correct (100%) [u=0 c=100 w=0]
truth=correct pred=correct (100%) [u=0 c=100 w=0]
truth=correct pred=correct (100%) [u=0 c=100 w=0]
truth=wrong pred=wrong (100%) [u=0 c=0 w=100]
truth=wrong pred=wrong (100%) [u=0 c=0 w=100]
truth=wrong pred=wrong (100%) [u=0 c=0 w=100]
truth=wrong pred=wrong (100%) [u=0 c=0 w=100]
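The "outer box" is derived from the person box through the config's `expand` block (x = 1.0, y_top = 0.2, y_bot = 1.5). The report does not define these ratios. The sketch below therefore uses one plausible reading, labelled as a hypothesis: it pads left and right by x·width, the top by y_top·height and the bottom by y_bot·height, then clips the result to the 1280 × 720 frame. `expand_box` is a hypothetical helper.

```python
def expand_box(box, ex_x=1.0, ex_yt=0.2, ex_yb=1.5, img_w=1280, img_h=720):
    """HYPOTHETICAL reading of `expand`: pad each side by a fraction of the box
    size (x for left/right, y_top for the top, y_bot for the bottom), then clip."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (max(0.0, x1 - ex_x * w),
            max(0.0, y1 - ex_yt * h),
            min(float(img_w), x2 + ex_x * w),
            min(float(img_h), y2 + ex_yb * h))


roi = expand_box((500, 100, 600, 300))       # a 100 x 200 person box
clipped = expand_box((10, 10, 110, 400))     # expansion runs off the frame
```

Under this reading, the large y_bot margin would give the model context below the person, where a rope or anchor point might be visible. That rationale is a guess and is not stated in the report.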

FP audit: top-8 most confident misclassifications (drawn from a 200-sample subset of the test set)

Box color = truth class; the title shows the predicted class.

truth=correct → pred=wrong (100%)
truth=wrong → pred=correct (100%)
truth=correct → pred=wrong (100%)
truth=correct → pred=wrong (97%)
truth=correct → pred=wrong (92%)
truth=correct → pred=wrong (92%)
truth=wrong → pred=correct (90%)
truth=wrong → pred=correct (87%)
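The audit list appears to rank errors by the confidence of the predicted class. This is an assumption based on the descending percentages. A minimal version of that selection keeps the samples where the prediction differs from the truth, sorts them by confidence from highest to lowest, and takes the first k. `Pred` and `top_confident_errors` are hypothetical names.

```python
from typing import NamedTuple


class Pred(NamedTuple):
    idx: int      # sample index within the audited subset
    truth: str
    pred: str
    conf: float   # probability assigned to the predicted class


def top_confident_errors(preds, k=8):
    """Misclassified samples, most confident first, truncated to k."""
    errors = [p for p in preds if p.pred != p.truth]
    return sorted(errors, key=lambda p: p.conf, reverse=True)[:k]


preds = [Pred(0, "correct", "correct", 0.99), Pred(1, "correct", "wrong", 0.92),
         Pred(2, "wrong", "correct", 1.00), Pred(3, "wrong", "wrong", 0.80),
         Pred(4, "correct", "wrong", 0.97)]
audit = top_confident_errors(preds, k=2)
```

With k=2, this toy input yields samples 2 and 4.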

Training history

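| ep | train_loss | val_macroAP | val_macroF1 | val_acc |
|---|---|---|---|---|
| 1 | 0.6700 | 0.5051 | 0.4500 | 0.6768 |
| 2 | 0.5897 | 0.5418 | 0.3952 | 0.6110 |
| 3 | 0.2450 | 0.6210 | 0.5519 | 0.8282 |
| 4 | 0.1439 | 0.6194 | 0.5571 | 0.8372 |
| 5 | 0.1088 | 0.5991 | 0.5499 | 0.8270 |
| 6 | 0.0856 | 0.6146 | 0.5545 | 0.8330 |
| 7 | 0.0769 | 0.6108 | 0.5572 | 0.8372 |
| 8 | 0.0638 | 0.6230 | 0.5909 | 0.8917 |
| 9 | 0.0604 | 0.6163 | 0.5567 | 0.8354 |
| 10 | 0.0442 | 0.6157 | 0.5779 | 0.8689 |
| 11 | 0.0422 | 0.6181 | 0.5853 | 0.8803 |
| 12 | 0.0340 | 0.6183 | 0.5881 | 0.8863 |
| 13 | 0.0305 | 0.6107 | 0.5709 | 0.8588 |
| 14 | 0.0231 | 0.6170 | 0.5794 | 0.8701 |
| 15 | 0.0214 | 0.6131 | 0.5748 | 0.8642 |
| 16 | 0.0179 | 0.6108 | 0.5809 | 0.8737 |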
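The history is consistent with early stopping on val_macroAP using patience = 8. Epoch 8 is the last strict improvement, and epochs 9 to 16 are eight epochs without one. This matches best_epoch = 8 and epochs_run = 16 in the config, although the max is 20. The replay below uses an assumed stopping rule (strict improvement, 1-indexed epochs). It also rescales the best val macro AP by 3/2, on the presumption that val, like test, includes the zero-support unknown class as a 0 term.

```python
VAL_MACRO_AP = [0.5051, 0.5418, 0.6210, 0.6194, 0.5991, 0.6146, 0.6108, 0.6230,
                0.6163, 0.6157, 0.6181, 0.6183, 0.6107, 0.6170, 0.6131, 0.6108]


def replay_early_stopping(scores, patience=8, max_epochs=20):
    """Return (best_epoch, epochs_run) for 'stop after `patience` consecutive
    epochs without a strict improvement' (assumed rule, 1-indexed epochs)."""
    best, best_ep, bad, ep = float("-inf"), 0, 0, 0
    for ep, s in enumerate(scores[:max_epochs], start=1):
        if s > best:
            best, best_ep, bad = s, ep, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_ep, ep


best_epoch, epochs_run = replay_early_stopping(VAL_MACRO_AP)
# Presumably val also has no unknown samples, so the 3-class macro AP is capped
# at 2/3; rescaling by 3/2 gives the mean over the two active classes.
best_active_ap = max(VAL_MACRO_AP) * 3 / 2
```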

Config

{
  "version": "v20260511",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3 + RoIAlign + MLP 3-cls (unknown/correct/wrong) + photometric + random_erase + camaug",
  "params_M": 22.472707,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "unknown": 1.0,
    "correct": 1.5,
    "wrong": 1.5
  },
  "labels": [
    "unknown",
    "correct",
    "wrong"
  ],
  "best_val_macro_ap": 0.6230394742125506,
  "best_epoch": 8,
  "epochs_run": 16,
  "total_train_time_s": 1614.3719201087952,
  "test_metrics": {
    "macro_ap": 0.6387341458750754,
    "macro_f1": 0.5970843500567578,
    "acc": 0.9015209125475285,
    "per_class_ap": {
      "unknown": 0.0,
      "correct": 0.9460903759356688,
      "wrong": 0.9701120616895574
    },
    "per_class": {
      "unknown": {
        "precision": 0.0,
        "recall": 0.0,
        "f1": 0.0,
        "support": 0
      },
      "correct": {
        "precision": 0.917016806722689,
        "recall": 0.8290598290598291,
        "f1": 0.8708229426433916,
        "support": 1053
      },
      "wrong": {
        "precision": 0.8927294398092968,
        "recall": 0.9499048826886494,
        "f1": 0.9204301075268817,
        "support": 1577
      }
    },
    "confusion_matrix": [
      [
        0,
        0,
        0
      ],
      [
        0,
        873,
        180
      ],
      [
        0,
        79,
        1498
      ]
    ],
    "n": 2630
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}
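The `jitter` block appears to describe training-time perturbation of the person box and the expansion ratios. Its exact semantics are not documented. The sketch below assumes uniform sampling, with these choices: a centre shift of ±0.2 of the box size, a size scale in [0.7, 1.4], and each `ex_*` range applied as a multiplier on the matching `expand` ratio. `sample_jitter` and `jitter_box` are hypothetical helpers for illustration only.

```python
import random

JITTER = {"center": 0.2, "size": (0.7, 1.4), "ex_x": (0.5, 1.5),
          "ex_yt": (0.5, 2.0), "ex_yb": (0.7, 1.3)}
EXPAND = {"x": 1.0, "y_top": 0.2, "y_bot": 1.5}


def sample_jitter(rng, jitter=JITTER, expand=EXPAND):
    """Draw one HYPOTHETICAL perturbation (uniform sampling is an assumption)."""
    c = jitter["center"]
    return {
        "dx": rng.uniform(-c, c),               # shift, fraction of box width
        "dy": rng.uniform(-c, c),               # shift, fraction of box height
        "scale": rng.uniform(*jitter["size"]),  # box size multiplier
        "ex_x": expand["x"] * rng.uniform(*jitter["ex_x"]),
        "ex_yt": expand["y_top"] * rng.uniform(*jitter["ex_yt"]),
        "ex_yb": expand["y_bot"] * rng.uniform(*jitter["ex_yb"]),
    }


def jitter_box(box, j):
    """Shift the box centre by (dx*w, dy*h) and rescale both sides by j['scale']."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx = (x1 + x2) / 2 + j["dx"] * w
    cy = (y1 + y2) / 2 + j["dy"] * h
    w, h = w * j["scale"], h * j["scale"]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


rng = random.Random(0)
j = sample_jitter(rng)
box = jitter_box((500, 100, 600, 300), j)
```

The jittered box and the sampled `ex_*` ratios would then feed the RoI expansion step, so the model sees slightly different context windows around the same person across epochs.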