safety_rope_v20260513: DINOv3-S binary classifier (cvat2 #10, +32 tasks, incremental)

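Data: person bboxes from cvat2 #10 with the safety_rope_use attribute. The schema is reverted to the v9 binary scheme (wrong/correct), and unknown labels are filtered out.
The ppe-demo handler can hot-swap the weights directly, because the ckpt already contains the labels and thr.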
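Below is a minimal sketch of how a handler might apply the stored labels and thr. It rests on three assumptions: the MLP head emits two logits in labels order (wrong, correct), the score compared against thr is P(correct) (as in the sample-inference captions), and the rule is P(correct) >= thr → correct. The function names and the >= comparison are hypothetical and are not taken from ppe-demo.

```python
import json
import math

# Hypothetical: in practice these two fields would be read from the ckpt; here they
# mirror the "labels" and "thr" entries of the Config section of this card.
CFG = json.loads('{"labels": ["wrong", "correct"], "thr": 0.892578125}')

def p_correct_from_logits(logits):
    """Softmax over the two head logits; assumes index order == labels order."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / sum(exps)

def decide(p_correct, cfg):
    """Assumed rule: labels[1] ('correct') if P(correct) >= thr, else labels[0] ('wrong')."""
    labels = cfg["labels"]
    return labels[1] if p_correct >= cfg["thr"] else labels[0]

print(decide(0.101, CFG))                               # wrong
print(decide(p_correct_from_logits([-3.0, 4.0]), CFG))  # correct
```

The handler should read thr from the ckpt rather than hard-code it, because the operating point differs sharply between versions (test thr 0.054 for v20260512 vs 0.893 for this version).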

Changes in v20260513 (vs v20260512): 32 additional cvat2 #10 tasks added incrementally to the data.

Three-version comparison

metric                             v9 (2-class baseline)   v20260512 (binary replica)   v20260513 (this, +32 tasks)
test AP                            0.9336                  0.9308                       0.9144
test F1                            0.8770                  0.8567                       0.8707
test accuracy                      0.9070                  0.8821                       0.9004
test thr                           0.79                    0.054                        0.893
val thr (deployed)                 -                       0.114                        0.8926
train rows (after unknown filter)  -                       -                            7644
test rows                          -                       2630                         2953

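Test metrics

AP      F1      P       R       accuracy  thr    TP/FP/FN/TN
0.9144  0.8707  0.8808  0.8609  0.9004    0.893  990/134/160/1669

P, R and F1 treat correct as the positive class (n_pos = 1150), and thr is applied to P(correct).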
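As a consistency check, the P, R, F1 and accuracy above follow directly from the TP/FP/FN/TN counts, with correct as the positive class. The helper name `binary_metrics` is hypothetical. AP cannot be recovered this way, because it needs the full ranking of scores.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision/recall/F1 for the positive class plus overall accuracy."""
    p = tp / (tp + fp)             # of everything predicted correct, how much was correct
    r = tp / (tp + fn)             # of everything truly correct, how much was found
    f1 = 2 * p * r / (p + r)       # harmonic mean of P and R
    acc = (tp + tn) / (tp + fp + fn + tn)
    return {"p": p, "r": r, "f1": f1, "acc": acc}

m = binary_metrics(tp=990, fp=134, fn=160, tn=1669)
print({k: round(v, 4) for k, v in m.items()})
# {'p': 0.8808, 'r': 0.8609, 'f1': 0.8707, 'acc': 0.9004}
```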

Per-class metrics

class    P       R       F1      support
wrong    0.9125  0.9257  0.9191  1803
correct  0.8808  0.8609  0.8707  1150

Confusion matrix (rows=true, cols=pred)

true \ pred  wrong  correct  row total
wrong        1669   134      1803
correct      160    990      1150
col total    1829   1124     2953
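Both rows of the per-class table can be read off this matrix. A class's precision divides its diagonal cell by its column total (all predictions of that class), and its recall divides the same cell by its row total (its support). The sketch below uses the same row/column convention as the confusion_matrix field in the Config. The function name is hypothetical.

```python
def per_class_metrics(cm, labels):
    """cm[i][j] = number of samples with true label i predicted as label j."""
    n = len(labels)
    out = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        pred_k = sum(cm[i][k] for i in range(n))  # column total: predicted as k
        true_k = sum(cm[k])                       # row total: support of k
        p = tp / pred_k if pred_k else 0.0
        r = tp / true_k if true_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        out[name] = {"precision": p, "recall": r, "f1": f1, "support": true_k}
    return out

CM = [[1669, 134], [160, 990]]  # rows = true, cols = pred, order (wrong, correct)
metrics = per_class_metrics(CM, ["wrong", "correct"])
for name, m in metrics.items():
    print(name, round(m["precision"], 4), round(m["recall"], 4), round(m["f1"], 4), m["support"])
# wrong 0.9125 0.9257 0.9191 1803
# correct 0.8808 0.8609 0.8707 1150
```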

Training history

ep  train_loss  val_AP  val_F1  val_acc  val_thr
1   0.5171      0.5598  0.6558  0.5823   0.372
2   0.3147      0.9283  0.8415  0.8701   0.433
3   0.1352      0.9299  0.8459  0.8665   0.130
4   0.1047      0.9256  0.8464  0.8606   0.202
5   0.0938      0.9411  0.8554  0.8695   0.352
6   0.0752      0.9594  0.9019  0.9156   0.314
7   0.0692      0.9567  0.9087  0.9216   0.286
8   0.0566      0.9537  0.8913  0.9084   0.364
9   0.0510      0.9539  0.8898  0.9060   0.282
10  0.0546      0.9519  0.8840  0.9019   0.493
11  0.0358      0.9489  0.8835  0.8989   0.370
12  0.0344      0.9599  0.9060  0.9204   0.124
13  0.0269      0.9616  0.9164  0.9294   0.124
14  0.0264      0.9573  0.8892  0.9072   0.165
15  0.0170      0.9474  0.8728  0.8941   0.764
16  0.0171      0.9593  0.8999  0.9144   0.087
17  0.0149      0.9582  0.9078  0.9210   0.133
18  0.0142      0.9570  0.9056  0.9198   0.211
19  0.0144      0.9571  0.9049  0.9186   0.154
20  0.0114      0.9571  0.9049  0.9186   0.159
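The Config reports best_epoch 13, patience 8 and epochs_run 20. The trainer's actual stopping logic is not shown in this card. The sketch below assumes one rule that reproduces those numbers from the val_AP column: select on val_AP with strict improvement, and stop after `patience` consecutive epochs without improvement. The function name is hypothetical.

```python
from typing import List, Tuple

# val_AP per epoch, copied from the training-history table.
VAL_AP = [0.5598, 0.9283, 0.9299, 0.9256, 0.9411, 0.9594, 0.9567, 0.9537, 0.9539, 0.9519,
          0.9489, 0.9599, 0.9616, 0.9573, 0.9474, 0.9593, 0.9582, 0.9570, 0.9571, 0.9571]

def simulate_early_stopping(val_ap: List[float], patience: int,
                            max_epochs: int) -> Tuple[int, float, int]:
    """Return (best_epoch, best_val_ap, epochs_run) under the assumed patience rule."""
    best_ap, best_epoch, stale, epochs_run = float("-inf"), 0, 0, 0
    for epoch, ap in enumerate(val_ap[:max_epochs], start=1):
        epochs_run = epoch
        if ap > best_ap:            # strict improvement: new best, reset counter
            best_ap, best_epoch, stale = ap, epoch, 0
        else:
            stale += 1
            if stale >= patience:   # `patience` non-improving epochs in a row
                break
    return best_epoch, best_ap, epochs_run

print(simulate_early_stopping(VAL_AP, patience=8, max_epochs=20))  # (13, 0.9616, 20)
```

Under this rule, epochs 14 to 20 are only 7 non-improving epochs, so patience 8 never fires and all 20 epochs run. With patience=5, the same rule would have stopped at epoch 11 and kept epoch 6.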

Sample inference (test set, 4 per class; box color = ground truth)

truth=wrong pred=wrong (P(correct)=0.0%, thr=89.3%)
truth=wrong pred=wrong (P(correct)=0.0%, thr=89.3%)
truth=wrong pred=wrong (P(correct)=0.0%, thr=89.3%)
truth=wrong pred=wrong (P(correct)=10.1%, thr=89.3%)
truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)
truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)
truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)
truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)

FP audit: top-8 most confident misclassifications (from 300 sampled test images)

truth=correct → pred=wrong (conf=100.0%)
truth=wrong → pred=correct (conf=100.0%)
truth=correct → pred=wrong (conf=100.0%)
truth=wrong → pred=correct (conf=100.0%)
truth=correct → pred=wrong (conf=99.9%)
truth=correct → pred=wrong (conf=99.9%)
truth=wrong → pred=correct (conf=99.8%)
truth=wrong → pred=correct (conf=99.1%)
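The audit can be reproduced roughly as follows. The sketch assumes that conf is the probability assigned to the predicted class and that the 300 images are drawn uniformly without replacement. `Pred`, `top_confident_errors`, `audit` and the seed are hypothetical names, not the report's code.

```python
import random
from typing import List, NamedTuple

class Pred(NamedTuple):
    image_id: str
    truth: str   # ground-truth label: "wrong" or "correct"
    pred: str    # predicted label after applying thr
    conf: float  # assumed: probability assigned to the predicted class

def top_confident_errors(preds: List[Pred], k: int = 8) -> List[Pred]:
    """Keep misclassified items only, sorted by confidence (highest first)."""
    errors = [p for p in preds if p.truth != p.pred]
    return sorted(errors, key=lambda p: p.conf, reverse=True)[:k]

def audit(preds: List[Pred], n_sample: int = 300, k: int = 8, seed: int = 0) -> List[Pred]:
    """Sample n_sample predictions uniformly without replacement, then rank their errors."""
    rng = random.Random(seed)
    sample = rng.sample(preds, min(n_sample, len(preds)))
    return top_confident_errors(sample, k)
```

Errors made at near-100% confidence, like the eight listed above, are worth checking first for label noise in the cvat2 annotations.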

Config

{
  "version": "v20260513",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3-S + RoIAlign + MLP 2-cls (wrong/correct) + photometric + random_erase + camaug — v9 binary復刻",
  "params_M": 22.47245,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "wrong": 1.5,
    "correct": 1.0
  },
  "labels": [
    "wrong",
    "correct"
  ],
  "thr": 0.892578125,
  "best_val_AP": 0.961629893247703,
  "best_epoch": 13,
  "epochs_run": 20,
  "total_train_time_s": 2130.6874334812164,
  "test_metrics": {
    "ap": 0.9144094126954356,
    "acc": 0.9004402302742973,
    "p": 0.8807829181494662,
    "r": 0.8608695652173913,
    "f1": 0.870712401055409,
    "thr": 0.892578125,
    "tp": 990,
    "fp": 134,
    "fn": 160,
    "tn": 1669,
    "n_pos": 1150,
    "n_total": 2953,
    "per_class": {
      "wrong": {
        "precision": 0.9125205030071077,
        "recall": 0.9256794231835829,
        "f1": 0.9190528634361234,
        "support": 1803
      },
      "correct": {
        "precision": 0.8807829181494662,
        "recall": 0.8608695652173913,
        "f1": 0.870712401055409,
        "support": 1150
      }
    },
    "confusion_matrix": [
      [
        1669,
        134
      ],
      [
        160,
        990
      ]
    ]
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}
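This card does not document the semantics of the expand and jitter blocks. The sketch below is purely one plausible reading, and every interpretation in it is an assumption:
- expand grows the left/right sides by x times the box width and the top/bottom by y_top/y_bot times the box height.
- jitter shifts the box centre by up to center times its size and rescales it by a single factor drawn from size.
- The ex_* ranges multiply the expand factors.
- img_size is (W, H).

All function names are hypothetical.

```python
import random

# Subset of the Config JSON above; img_size is assumed to be (W, H).
CFG = {
    "img_size": [1280, 720],
    "expand": {"x": 1.0, "y_top": 0.2, "y_bot": 1.5},
    "jitter": {"center": 0.2, "size": [0.7, 1.4],
               "ex_x": [0.5, 1.5], "ex_yt": [0.5, 2.0], "ex_yb": [0.7, 1.3]},
}

def expand_box(box, expand, img_w, img_h):
    """Hypothetical: grow each side by factor x box size, then clip to the image."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (max(0.0, x1 - expand["x"] * w),
            max(0.0, y1 - expand["y_top"] * h),
            min(float(img_w), x2 + expand["x"] * w),
            min(float(img_h), y2 + expand["y_bot"] * h))

def jitter_box(box, jitter, rng):
    """Hypothetical: shift centre by up to +/- center x size, rescale by one factor."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    c = jitter["center"]
    cx = (x1 + x2) / 2 + rng.uniform(-c, c) * w
    cy = (y1 + y2) / 2 + rng.uniform(-c, c) * h
    s = rng.uniform(*jitter["size"])
    half_w, half_h = w * s / 2, h * s / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def jitter_expand(expand, jitter, rng):
    """Hypothetical: scale each expand factor by a random value from its ex_* range."""
    return {"x": expand["x"] * rng.uniform(*jitter["ex_x"]),
            "y_top": expand["y_top"] * rng.uniform(*jitter["ex_yt"]),
            "y_bot": expand["y_bot"] * rng.uniform(*jitter["ex_yb"])}

def crop_region(box, cfg, rng=None):
    """Eval (rng=None): fixed expand. Train: jitter the box and expand factors first."""
    img_w, img_h = cfg["img_size"]
    if rng is None:
        return expand_box(box, cfg["expand"], img_w, img_h)
    jb = jitter_box(box, cfg["jitter"], rng)
    ex = jitter_expand(cfg["expand"], cfg["jitter"], rng)
    return expand_box(jb, ex, img_w, img_h)

print(crop_region((100, 100, 200, 300), CFG))  # (0.0, 60.0, 300.0, 600.0)
```

Under this reading, the large y_bot factor relative to y_top means the region extends mostly below the person box.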