safety_rope_v20260514: DINOv3-S binary classifier (cvat2 #10, +32 tasks incremental)

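Data: cvat2 #10 person bboxes with the safety_rope_use attribute. The schema is reverted to the v9 binary scheme (wrong/correct), and unknown samples are filtered out.
The ppe-demo handler can hot-swap the weights directly, since the ckpt now carries complete labels and thr.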
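A minimal sketch of the load-time check this enables. It assumes the checkpoint loads (e.g. via torch.load) as a plain dict with top-level "labels" and "thr" keys next to the weights; those key names and the read_head_meta helper are hypothetical, not the ppe-demo handler's actual code.

```python
from typing import Any, Mapping

# v9 binary schema; order = head output index (hypothetical constant name)
EXPECTED_LABELS = ("wrong", "correct")


def read_head_meta(ckpt: Mapping[str, Any]) -> tuple[list[str], float]:
    """Return (labels, thr) from a loaded checkpoint dict.

    Hypothetical layout: "labels" and "thr" stored at the top level.
    Raises instead of silently falling back, so a ckpt without thr
    (as with v9 / v511) fails loudly at hot-swap time.
    """
    missing = [k for k in ("labels", "thr") if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint missing metadata: {missing}")
    labels = list(ckpt["labels"])
    if tuple(labels) != EXPECTED_LABELS:
        raise ValueError(f"unexpected labels {labels}, expected {list(EXPECTED_LABELS)}")
    thr = float(ckpt["thr"])
    if not 0.0 < thr < 1.0:
        raise ValueError(f"thr out of range: {thr}")
    return labels, thr
```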

Changes in v20260514 (vs v20260513):

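head: Linear(256, 3) → Linear(256, 2) (v9 schema)
Filter out unknown samples (the manifest already contains 0 of them, so this is a safe no-op)
ckpt stores labels=["wrong","correct"]; unknown is no longer included
ckpt stores thr=0.2729 (neither v9 nor v511 saved it, which made the handler fail on load)
class weights: wrong=1.5, correct=1.0 (the original v9 ratio)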
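The class weights act as per-class multipliers in the loss. The sketch below assumes they are applied as in PyTorch's CrossEntropyLoss(weight=...) with the default mean reduction, which divides by the sum of the target weights rather than by the batch size. It is written in plain Python so the arithmetic is visible; the helper names are hypothetical.

```python
import math
from typing import Sequence

LABELS = ["wrong", "correct"]   # index 0 = wrong, 1 = correct
CLASS_WEIGHTS = [1.5, 1.0]      # wrong=1.5, correct=1.0 (v9 ratio)


def log_softmax(logits: Sequence[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_ce(batch_logits: Sequence[Sequence[float]],
                targets: Sequence[int],
                weights: Sequence[float] = CLASS_WEIGHTS) -> float:
    """Class-weighted cross-entropy: sum(w_y * nll) / sum(w_y)."""
    num = 0.0
    den = 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        num += -w * log_softmax(logits)[y]
        den += w
    return num / den
```

With this weighting, a misclassified wrong sample pulls the batch loss up 1.5 times as hard as an equally misclassified correct sample.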

Version comparison

	v9	v20260512	v20260513	v20260514 (this release, +21 tasks)
test AP	0.9336	0.9308	0.9144	0.9242
test F1	0.8770	0.8567	0.8707	0.8650
test accuracy	0.9070	0.8821	0.9004	0.8910
test thr	0.79	0.054	0.893	0.273
val thr (deployed)	—	0.114	0.124	0.2729
train rows (after unknown filter)	—	7644	8147	10299
test rows	—	2630	2953	2953
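The report does not say how the deployed val thr is chosen. One common rule, assumed here purely for illustration, is to sweep candidate thresholds on the validation set and keep the one with the highest F1 for the positive class (correct). The helper name is hypothetical, and the quadratic loop favors readability over speed.

```python
from typing import Sequence


def best_f1_threshold(scores: Sequence[float], labels: Sequence[int]) -> tuple[float, float]:
    """Return (thr, f1) maximizing F1 for the rule `score >= thr -> correct`.

    scores: P(correct) per sample; labels: 1 = correct, 0 = wrong.
    Candidate thresholds are the distinct observed scores.
    """
    n_pos = sum(labels)
    best_thr, best_f1 = 1.0, 0.0
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        fn = n_pos - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1
```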

Test metrics

AP	F1	P	R	accuracy	thr	TP/FP/FN/TN
0.9242	0.8650	0.8350	0.8974	0.8910	0.273	1032/204/118/1599
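P, R, F1 and accuracy in this row follow directly from the TP/FP/FN/TN counts with correct as the positive class (AP needs the raw scores and cannot be recovered this way). The hypothetical helper below reproduces them.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision/recall/F1/accuracy with `correct` as the positive class."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # algebraically equal to 2PR/(P+R)
    acc = (tp + tn) / (tp + fp + fn + tn)
    return {"p": p, "r": r, "f1": f1, "acc": acc}


m = binary_metrics(tp=1032, fp=204, fn=118, tn=1599)
print({k: round(v, 4) for k, v in m.items()})
# {'p': 0.835, 'r': 0.8974, 'f1': 0.865, 'acc': 0.891}
```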

Per-class metrics

class	P	R	F1	support
wrong	0.9313	0.8869	0.9085	1803
correct	0.8350	0.8974	0.8650	1150

Confusion matrix (rows=true, cols=pred)

	wrong	correct	row total
wrong	1599	204	1803
correct	118	1032	1150
col total	1717	1236	2953
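Both per-class tables can be recomputed from this matrix: for class i, precision divides the diagonal entry by the column total and recall divides it by the row total (the support). The correct row then reproduces the headline test metrics, while the wrong row is the same computation with the positive class swapped. A sketch with a hypothetical helper name:

```python
def per_class_report(cm: list[list[int]], labels: list[str]) -> dict[str, dict[str, float]]:
    """Per-class precision/recall/F1/support from a confusion matrix
    with rows = true class and cols = predicted class."""
    k = len(labels)
    report = {}
    for i, name in enumerate(labels):
        tp = cm[i][i]
        pred_i = sum(cm[row][i] for row in range(k))  # column total
        true_i = sum(cm[i])                            # row total = support
        p = tp / pred_i if pred_i else 0.0
        r = tp / true_i if true_i else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        report[name] = {"precision": p, "recall": r, "f1": f1, "support": true_i}
    return report


rep = per_class_report([[1599, 204], [118, 1032]], ["wrong", "correct"])
```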

Training history

ep	train_loss	val_AP	val_F1	val_acc	val_thr
1	0.5288	0.6223	0.6663	0.7475	0.470
2	0.2704	0.9400	0.8861	0.9173	0.147
3	0.1490	0.9512	0.8865	0.9173	0.260
4	0.1129	0.9511	0.8857	0.9169	0.259
5	0.3221	0.6575	0.6856	0.7008	0.306
6	0.3478	0.9494	0.8899	0.9169	0.163
7	0.0894	0.9533	0.9017	0.9258	0.081
8	0.0726	0.9499	0.8835	0.9106	0.271
9	0.0631	0.9543	0.8879	0.9192	0.600
10	0.0598	0.9570	0.8954	0.9247	0.358
11	0.0516	0.9554	0.9002	0.9266	0.231
12	0.0403	0.9556	0.8972	0.9240	0.225
13	0.0316	0.9489	0.8955	0.9229	0.216
14	0.0285	0.9537	0.8947	0.9240	0.414
15	0.0262	0.9508	0.8905	0.9207	0.602
16	0.0231	0.9535	0.8915	0.9221	0.689
17	0.0186	0.9520	0.8953	0.9232	0.347
18	0.0198	0.9506	0.8962	0.9255	0.721
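The history is consistent with early stopping on val_AP with patience 8 (config: patience=8, epochs=20): the best val_AP is at epoch 10, and eight further epochs without improvement end the run at epoch 18. The replay below assumes strict improvement with no min_delta; that criterion is an assumption, not read from the training code.

```python
def early_stop_epochs(val_ap: list[float], patience: int, max_epochs: int) -> tuple[int, int]:
    """Replay patience-based early stopping on a val_AP history.

    Returns (best_epoch, epochs_run), both 1-indexed. An epoch counts as an
    improvement only if val_AP strictly exceeds the best so far.
    """
    best, best_ep, bad = float("-inf"), 0, 0
    for ep, ap in enumerate(val_ap[:max_epochs], start=1):
        if ap > best:
            best, best_ep, bad = ap, ep, 0
        else:
            bad += 1
            if bad >= patience:
                return best_ep, ep
    return best_ep, min(len(val_ap), max_epochs)


# val_AP column of the training history above
VAL_AP = [0.6223, 0.9400, 0.9512, 0.9511, 0.6575, 0.9494, 0.9533, 0.9499, 0.9543,
          0.9570, 0.9554, 0.9556, 0.9489, 0.9537, 0.9508, 0.9535, 0.9520, 0.9506]
print(early_stop_epochs(VAL_AP, patience=8, max_epochs=20))  # (10, 18)
```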

Sample inference (test set, 4 per class; box color = ground truth)

truth=wrong pred=wrong (P(correct)=1.5%, thr=27.3%)

truth=wrong pred=wrong (P(correct)=0.0%, thr=27.3%)

truth=wrong pred=wrong (P(correct)=8.0%, thr=27.3%)

truth=correct pred=correct (P(correct)=99.9%, thr=27.3%)

truth=correct pred=correct (P(correct)=97.8%, thr=27.3%)

truth=correct pred=correct (P(correct)=99.9%, thr=27.3%)
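The captions apply the stored threshold to P(correct) rather than taking the argmax. Assuming P(correct) is the softmax probability of head output index 1 (labels order [wrong, correct]) and that a score equal to thr counts as correct, the rule looks like the sketch below; the predict helper is hypothetical.

```python
import math

LABELS = ["wrong", "correct"]  # head output order stored in the ckpt
THR = 0.27294921875            # deployed thr stored in the ckpt


def predict(logits: list[float], thr: float = THR) -> tuple[str, float]:
    """Map 2-class logits to (label, P(correct)) using the stored threshold."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    p_correct = exps[1] / sum(exps)  # softmax probability of index 1 = "correct"
    label = LABELS[1] if p_correct >= thr else LABELS[0]
    return label, p_correct
```

Because thr is about 0.273, logits of [0.8, 0.0] (P(correct) of roughly 0.31) are already labeled correct, whereas an argmax would call them wrong.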

FP audit: top-8 most confident misclassifications (from 300 sampled test images)

truth=wrong → pred=correct (conf=99.9%)

truth=correct → pred=wrong (conf=99.8%)

truth=wrong → pred=correct (conf=99.7%)

truth=correct → pred=wrong (conf=99.4%)

truth=correct → pred=wrong (conf=98.7%)

truth=wrong → pred=correct (conf=96.8%)

truth=wrong → pred=correct (conf=94.2%)

truth=correct → pred=wrong (conf=88.3%)
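Assuming conf is the model's probability for the class it predicted (P(correct) for pred=correct, 1 - P(correct) for pred=wrong), this audit list can be produced by keeping the misclassified samples and sorting them by that confidence. The Pred record, its fields, and the helper name are hypothetical.

```python
from typing import NamedTuple


class Pred(NamedTuple):
    img: str          # hypothetical image id
    truth: str        # "wrong" or "correct"
    p_correct: float  # model P(correct)


def top_confident_errors(preds: list[Pred], thr: float, k: int = 8) -> list[tuple[Pred, str, float]]:
    """Misclassified samples, most confident (in the predicted class) first."""
    errors = []
    for p in preds:
        pred = "correct" if p.p_correct >= thr else "wrong"
        if pred != p.truth:
            conf = p.p_correct if pred == "correct" else 1.0 - p.p_correct
            errors.append((p, pred, conf))
    errors.sort(key=lambda e: e[2], reverse=True)
    return errors[:k]
```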

Config

{
  "version": "v20260514",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3-S + RoIAlign + MLP 2-cls (wrong/correct) + photometric + random_erase + camaug - v9 binary replica",
  "params_M": 22.47245,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "wrong": 1.5,
    "correct": 1.0
  },
  "labels": [
    "wrong",
    "correct"
  ],
  "thr": 0.27294921875,
  "best_val_AP": 0.9570321404906651,
  "best_epoch": 10,
  "epochs_run": 18,
  "total_train_time_s": 2504.8716027736664,
  "test_metrics": {
    "ap": 0.9241776957176995,
    "acc": 0.8909583474432781,
    "p": 0.8349514563106796,
    "r": 0.8973913043478261,
    "f1": 0.865046102263202,
    "thr": 0.27294921875,
    "tp": 1032,
    "fp": 204,
    "fn": 118,
    "tn": 1599,
    "n_pos": 1150,
    "n_total": 2953,
    "per_class": {
      "wrong": {
        "precision": 0.9312754804892254,
        "recall": 0.8868552412645591,
        "f1": 0.9085227272727273,
        "support": 1803
      },
      "correct": {
        "precision": 0.8349514563106796,
        "recall": 0.8973913043478261,
        "f1": 0.865046102263202,
        "support": 1150
      }
    },
    "confusion_matrix": [
      [
        1599,
        204
      ],
      [
        118,
        1032
      ]
    ]
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}
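Several fields in this config are redundant (counts vs confusion matrix, supports vs totals, stored thr vs test thr), which makes a cheap consistency check possible before a ckpt is hot-swapped. A sketch, assuming the layout shown above with correct as the positive class; check_report and _require are hypothetical helpers.

```python
def _require(cond: bool, msg: str) -> None:
    if not cond:
        raise ValueError(msg)


def check_report(cfg: dict) -> None:
    """Cross-check the redundant fields of an exported training config.

    Assumes confusion_matrix rows = true class and cols = predicted class,
    both ordered as cfg["labels"] = [wrong, correct].
    """
    tm = cfg["test_metrics"]
    (tn, fp), (fn, tp) = tm["confusion_matrix"]
    _require((tp, fp, fn, tn) == (tm["tp"], tm["fp"], tm["fn"], tm["tn"]),
             "tp/fp/fn/tn disagree with confusion_matrix")
    _require(tp + fp + fn + tn == tm["n_total"], "n_total mismatch")
    _require(tp + fn == tm["n_pos"] == tm["per_class"]["correct"]["support"], "n_pos mismatch")
    _require(tn + fp == tm["per_class"]["wrong"]["support"], "wrong support mismatch")
    _require(tm["thr"] == cfg["thr"], "test thr differs from stored thr")
    _require(set(cfg["class_weights"]) == set(cfg["labels"]), "class_weights keys != labels")
    _require(abs(tm["f1"] - 2 * tp / (2 * tp + fp + fn)) < 1e-9, "f1 inconsistent with counts")
    _require(cfg["epochs_run"] <= cfg["hyperparams"]["epochs"], "ran more epochs than configured")
```

Calling check_report(json.load(f)) on the exported file would raise ValueError at the first mismatch.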