safety_rope_v20260513: DINOv3-S binary (cvat2 #10, +32-task increment)

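Data: cvat2 #10 person bboxes with the safety_rope_use attribute. The schema reverts to the v9 binary form (wrong/correct). Unknown samples are filtered out.
The ppe-demo handler can hot-swap the weights directly, because the ckpt carries complete labels and thr.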
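
As a rough sketch of what "complete labels and thr" could mean at hot-swap time, the check below validates checkpoint metadata before it is accepted. The function name `validate_ckpt_meta` and the idea that the metadata is a plain dict with `labels` and `thr` keys are assumptions for illustration; the actual ppe-demo handler code is not part of this report.

```python
def validate_ckpt_meta(meta: dict) -> tuple[list[str], float]:
    """Return (labels, thr), or raise ValueError if the metadata is incomplete.

    Hypothetical helper: the real handler's loading logic is not shown in the report.
    """
    labels = meta.get("labels")
    thr = meta.get("thr")
    if not isinstance(labels, list) or len(labels) != 2:
        raise ValueError(f"expected 2 labels, got {labels!r}")
    if "unknown" in labels:
        raise ValueError("binary schema must not contain 'unknown'")
    if not isinstance(thr, (int, float)) or not 0.0 < thr < 1.0:
        raise ValueError(f"missing or invalid thr: {thr!r}")
    return labels, float(thr)


# Metadata values as reported for v20260513.
labels, thr = validate_ckpt_meta({"labels": ["wrong", "correct"], "thr": 0.892578125})

# A checkpoint that lacks thr is rejected instead of failing later at inference time.
try:
    validate_ckpt_meta({"labels": ["wrong", "correct"]})
    rejected = False
except ValueError:
    rejected = True
```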

v20260513 changes (vs v20260512):

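head: Linear(256, 3) → Linear(256, 2) (v9 schema)
filter unknown samples (the manifest already contains 0 such samples, so this is a safe no-op)
labels=["wrong","correct"] in the ckpt no longer includes unknown
thr=0.8926 is stored in the ckpt (neither v9 nor v511 saved it, which made the handler fail on load)
class weights wrong=1.5, correct=1.0 (the original v9 ratio)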
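
To illustrate how the class weights above would act during training, here is a small pure-Python sketch of class-weighted cross-entropy. The report does not name the loss function, so cross-entropy is an assumption; the normalisation by the sum of target weights follows the convention of PyTorch's `CrossEntropyLoss` with `reduction='mean'`. The example logits are made up.

```python
import math

# Index order follows labels=["wrong", "correct"].
CLASS_WEIGHTS = [1.5, 1.0]


def weighted_cross_entropy(logits, targets, weights):
    """Class-weighted cross-entropy, normalised by the sum of the target weights."""
    total, norm = 0.0, 0.0
    for row, t in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += weights[t] * (log_z - row[t])
        norm += weights[t]
    return total / norm


# Two made-up samples that both score high for "correct":
# the first is truly "wrong" (a costly miss), the second is truly "correct".
logits = [[-1.0, 1.0], [-1.0, 1.0]]
targets = [0, 1]

unweighted = weighted_cross_entropy(logits, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, targets, CLASS_WEIGHTS)
print(f"unweighted={unweighted:.4f} weighted={weighted:.4f}")
```

With these inputs the weighted loss is larger than the unweighted one, because the missed "wrong" sample counts 1.5 times as much as the "correct" one.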

Three-version comparison

	v9 (2-class baseline)	v20260512 (binary replica)	v20260513 (this, +32 tasks)
test AP	0.9336	0.9308	0.9144
test F1	0.8770	0.8567	0.8707
test accuracy	0.9070	0.8821	0.9004
test thr	0.79	0.054	0.893
val thr (deployed)	—	0.114	0.8926
train rows (after unknown filter)	—	7644	—
test rows	—	2630	2953

Test metrics

AP	F1	P	R	accuracy	thr	TP/FP/FN/TN
0.9144	0.8707	0.8808	0.8609	0.9004	0.893	990/134/160/1669
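
The row above can be reproduced from the four counts. The worked arithmetic below treats "correct" as the positive class, which is inferred from TP=990 matching the correct-class diagonal entry of the confusion matrix; N is the total of 2953 test rows.

```latex
\begin{aligned}
P &= \frac{TP}{TP + FP} = \frac{990}{990 + 134} = \frac{990}{1124} \approx 0.8808 \\
R &= \frac{TP}{TP + FN} = \frac{990}{990 + 160} = \frac{990}{1150} \approx 0.8609 \\
F_1 &= \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN} = \frac{1980}{2274} \approx 0.8707 \\
\text{accuracy} &= \frac{TP + TN}{N} = \frac{990 + 1669}{2953} = \frac{2659}{2953} \approx 0.9004
\end{aligned}
```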

Per-class metrics

class	P	R	F1	support
wrong	0.9125	0.9257	0.9191	1803
correct	0.8808	0.8609	0.8707	1150

Confusion matrix (rows=true, cols=pred)

	wrong	correct	row total
wrong	1669	134	1803
correct	160	990	1150
col total	1829	1124	2953
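
The per-class table can be derived from this matrix alone. The following sketch shows one way to do it; the helper name `per_class_metrics` is illustrative and not taken from the training code.

```python
LABELS = ["wrong", "correct"]
# Rows are true classes, columns are predicted classes (as in the table above).
CM = [
    [1669, 134],  # true wrong
    [160, 990],   # true correct
]


def per_class_metrics(cm, labels):
    """Compute precision, recall, F1 and support for each class of a square matrix."""
    out = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        support = sum(cm[k])                    # row total: true samples of class k
        predicted = sum(row[k] for row in cm)   # column total: samples predicted as k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        out[name] = {"precision": p, "recall": r, "f1": f1, "support": support}
    return out


metrics = per_class_metrics(CM, LABELS)
for name, m in metrics.items():
    print(f"{name}\t{m['precision']:.4f}\t{m['recall']:.4f}\t{m['f1']:.4f}\t{m['support']}")
```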

Training history

ep	train_loss	val_AP	val_F1	val_acc	val_thr
1	0.5171	0.5598	0.6558	0.5823	0.372
2	0.3147	0.9283	0.8415	0.8701	0.433
3	0.1352	0.9299	0.8459	0.8665	0.130
4	0.1047	0.9256	0.8464	0.8606	0.202
5	0.0938	0.9411	0.8554	0.8695	0.352
6	0.0752	0.9594	0.9019	0.9156	0.314
7	0.0692	0.9567	0.9087	0.9216	0.286
8	0.0566	0.9537	0.8913	0.9084	0.364
9	0.0510	0.9539	0.8898	0.9060	0.282
10	0.0546	0.9519	0.8840	0.9019	0.493
11	0.0358	0.9489	0.8835	0.8989	0.370
12	0.0344	0.9599	0.9060	0.9204	0.124
13	0.0269	0.9616	0.9164	0.9294	0.124
14	0.0264	0.9573	0.8892	0.9072	0.165
15	0.0170	0.9474	0.8728	0.8941	0.764
16	0.0171	0.9593	0.8999	0.9144	0.087
17	0.0149	0.9582	0.9078	0.9210	0.133
18	0.0142	0.9570	0.9056	0.9198	0.211
19	0.0144	0.9571	0.9049	0.9186	0.154
20	0.0114	0.9571	0.9049	0.9186	0.159
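
The history can be replayed to see which epoch would be kept and whether early stopping would have fired. This is only a sketch: it assumes the checkpoint is selected by highest val_AP and that patience (8 in the config) counts consecutive epochs without a new val_AP best. Neither rule is spelled out in the report.

```python
# val_AP per epoch, copied from the table above (epochs 1..20).
VAL_AP = [
    0.5598, 0.9283, 0.9299, 0.9256, 0.9411, 0.9594, 0.9567, 0.9537, 0.9539, 0.9519,
    0.9489, 0.9599, 0.9616, 0.9573, 0.9474, 0.9593, 0.9582, 0.9570, 0.9571, 0.9571,
]
PATIENCE = 8


def replay(val_ap, patience):
    """Return (best_epoch, best_ap, stop_epoch); stop_epoch is None if never triggered."""
    best_ap, best_epoch, stale = float("-inf"), 0, 0
    for epoch, ap in enumerate(val_ap, start=1):
        if ap > best_ap:
            best_ap, best_epoch, stale = ap, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, best_ap, epoch
    return best_epoch, best_ap, None


best_epoch, best_ap, stop_epoch = replay(VAL_AP, PATIENCE)
print(best_epoch, best_ap, stop_epoch)
```

Under these assumptions the best epoch is 13 (val_AP 0.9616), and the longest run without improvement is 7 epochs (14 to 20), so the stop condition is never reached and all 20 epochs run. That matches best_epoch=13 and epochs_run=20 in the config.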

Sample inference (test, 4 per class; box color = truth)

truth=wrong pred=wrong (P(correct)=0.0%, thr=89.3%)

truth=wrong pred=wrong (P(correct)=10.1%, thr=89.3%)

truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)
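
The captions above are consistent with a simple thresholding rule on P(correct). A minimal sketch follows, assuming the prediction is "correct" when P(correct) is at least thr; whether the handler uses `>=` or `>` is not stated in the report.

```python
THR = 0.892578125  # thr stored in the ckpt


def predict(p_correct: float, thr: float = THR) -> str:
    """Map P(correct) to a label; anything below thr falls back to 'wrong'."""
    return "correct" if p_correct >= thr else "wrong"


# P(correct) values from the three sample captions above.
preds = [predict(p) for p in (0.0, 0.101, 1.0)]
print(preds)

# With a threshold this high, a fairly confident 85% still yields 'wrong'.
borderline = predict(0.85)
```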

FP audit: top-8 most confident misclassifications (sampled from 300 test images)

truth=correct → pred=wrong (conf=100.0%)

truth=wrong → pred=correct (conf=100.0%)

truth=correct → pred=wrong (conf=100.0%)

truth=wrong → pred=correct (conf=100.0%)

truth=correct → pred=wrong (conf=99.9%)

truth=wrong → pred=correct (conf=99.8%)

truth=wrong → pred=correct (conf=99.1%)
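
An audit list like this could be produced by ranking misclassified samples by the confidence of the predicted class. The sketch below assumes conf is P(correct) when the prediction is "correct" and 1 - P(correct) otherwise; the sample tuples are made up for illustration and are not test data from this report.

```python
import heapq

THR = 0.892578125


def top_confident_errors(samples, k=8, thr=THR):
    """samples: iterable of (sample_id, truth, p_correct). Returns the k most confident errors."""
    errors = []
    for sample_id, truth, p_correct in samples:
        pred = "correct" if p_correct >= thr else "wrong"
        if pred != truth:
            # Confidence of the (wrong) predicted class.
            conf = p_correct if pred == "correct" else 1.0 - p_correct
            errors.append((conf, sample_id, truth, pred))
    return heapq.nlargest(k, errors)


# Hypothetical samples, not taken from the test set.
samples = [
    ("a", "correct", 0.001),  # predicted wrong with high confidence: error
    ("b", "wrong", 0.998),    # predicted correct with high confidence: error
    ("c", "correct", 0.95),   # predicted correct: no error
    ("d", "wrong", 0.40),     # predicted wrong: no error
    ("e", "correct", 0.70),   # below thr, predicted wrong with low confidence: error
]
audit = top_confident_errors(samples, k=2)
for conf, sample_id, truth, pred in audit:
    print(f"{sample_id}: truth={truth} -> pred={pred} (conf={conf:.1%})")
```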

Config

{
  "version": "v20260513",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3-S + RoIAlign + MLP 2-cls (wrong/correct) + photometric + random_erase + camaug — v9 binary復刻",
  "params_M": 22.47245,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "wrong": 1.5,
    "correct": 1.0
  },
  "labels": [
    "wrong",
    "correct"
  ],
  "thr": 0.892578125,
  "best_val_AP": 0.961629893247703,
  "best_epoch": 13,
  "epochs_run": 20,
  "total_train_time_s": 2130.6874334812164,
  "test_metrics": {
    "ap": 0.9144094126954356,
    "acc": 0.9004402302742973,
    "p": 0.8807829181494662,
    "r": 0.8608695652173913,
    "f1": 0.870712401055409,
    "thr": 0.892578125,
    "tp": 990,
    "fp": 134,
    "fn": 160,
    "tn": 1669,
    "n_pos": 1150,
    "n_total": 2953,
    "per_class": {
      "wrong": {
        "precision": 0.9125205030071077,
        "recall": 0.9256794231835829,
        "f1": 0.9190528634361234,
        "support": 1803
      },
      "correct": {
        "precision": 0.8807829181494662,
        "recall": 0.8608695652173913,
        "f1": 0.870712401055409,
        "support": 1150
      }
    },
    "confusion_matrix": [
      [
        1669,
        134
      ],
      [
        160,
        990
      ]
    ]
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}
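
One possible reading of the `expand` block is that the person box is enlarged before RoIAlign by a fraction of its own width or height on each side. The sketch below follows that reading only as an assumption: the report does not define the semantics of `x`, `y_top` and `y_bot`, nor whether `img_size` is [width, height]. The function name `expand_box` is hypothetical.

```python
IMG_W, IMG_H = 1280, 720  # assumes img_size is [width, height]
EXPAND = {"x": 1.0, "y_top": 0.2, "y_bot": 1.5}


def expand_box(box, expand=EXPAND, img_w=IMG_W, img_h=IMG_H):
    """Enlarge an (x1, y1, x2, y2) box by fractions of its size, clipped to the image.

    Assumed semantics: 'x' pads both left and right by x * width,
    'y_top' pads upward by y_top * height, 'y_bot' pads downward by y_bot * height.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    nx1 = max(0.0, x1 - expand["x"] * w)
    nx2 = min(float(img_w), x2 + expand["x"] * w)
    ny1 = max(0.0, y1 - expand["y_top"] * h)
    ny2 = min(float(img_h), y2 + expand["y_bot"] * h)
    return nx1, ny1, nx2, ny2


# A made-up 100x200 person box; the bottom edge is clipped at the image height.
roi = expand_box((600, 300, 700, 500))
print(roi)
```

Under the same reading, the `jitter` ranges (`ex_x`, `ex_yt`, `ex_yb`) would scale these fractions randomly during training, but that is likewise not confirmed by the report.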