safety_rope_v20260513: DINOv3-S binary (cvat2 #10, +32-task increment)
cvat2 #10 person bbox + safety_rope_use attr. The schema reverts to the v9 binary one (wrong/correct). Unknown samples are filtered out.
The ppe-demo handler only needs a weight hot-swap (the ckpt carries complete labels + thr).
v20260513 changes (vs v20260512):
- head: Linear(256, 3) → Linear(256, 2) (v9 schema)
- filter out unknown samples (the manifest already has 0 of them, so this is a safe no-op)
- ckpt labels=["wrong","correct"], no longer includes unknown
- ckpt stores thr=0.8926 (neither v9 nor v511 stored it, which made the handler fail on load)
- class weights wrong=1.5, correct=1.0 (original v9 ratio)
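Because labels and thr now travel with the ckpt, a handler can validate them at load time and turn P(correct) into a label without extra configuration. The following is a minimal sketch of that idea, not the ppe-demo handler itself: `read_decision_meta` and `decide` are hypothetical names, the plain dict stands in for the loaded checkpoint, and the `>=` comparison is an assumption (the report only shows that P(correct) far below thr yields wrong).

```python
from typing import Any, Dict, List, Tuple


def read_decision_meta(ckpt: Dict[str, Any]) -> Tuple[List[str], float]:
    """Return (labels, thr), or raise if the checkpoint lacks either field."""
    missing = [key for key in ("labels", "thr") if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing: {', '.join(missing)}")
    labels = list(ckpt["labels"])
    thr = float(ckpt["thr"])
    if labels != ["wrong", "correct"]:
        raise ValueError(f"expected binary labels, got {labels}")
    if not 0.0 <= thr <= 1.0:
        raise ValueError(f"thr out of range: {thr}")
    return labels, thr


def decide(p_correct: float, labels: List[str], thr: float) -> str:
    """Assumed rule: predict 'correct' only when P(correct) reaches thr."""
    return labels[1] if p_correct >= thr else labels[0]


# Stand-in for the metadata stored in the v20260513 checkpoint.
ckpt_meta = {"labels": ["wrong", "correct"], "thr": 0.892578125}
labels, thr = read_decision_meta(ckpt_meta)
print(decide(0.101, labels, thr))  # wrong
print(decide(1.0, labels, thr))    # correct
```

A checkpoint without thr, as described for the earlier versions, would fail fast in `read_decision_meta` with a message naming the missing field.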
Three-version comparison
| metric | v9 (2-class baseline) | v20260512 (binary replica) | v20260513 (this, +32 task) |
|---|---|---|---|
| test AP | 0.9336 | 0.9308 | 0.9144 |
| test F1 | 0.8770 | 0.8567 | 0.8707 |
| test accuracy | 0.9070 | 0.8821 | 0.9004 |
| test thr | 0.79 | 0.054 | 0.893 |
| val thr (deployed) | — | 0.114 | 0.8926 |
| train rows (after unknown filter) | — | 7644 | — |
| test rows | — | 2630 | 2953 |
Test metrics
| AP | F1 | P | R | accuracy | thr | TP/FP/FN/TN |
|---|---|---|---|---|---|---|
| 0.9144 | 0.8707 | 0.8808 | 0.8609 | 0.9004 | 0.893 | 990/134/160/1669 |
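The P, R, F1 and accuracy figures above follow directly from the TP/FP/FN/TN counts, with correct as the positive class. A short sketch to reproduce them (`binary_metrics` is a name chosen here for illustration):

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1 and accuracy from binary confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"p": precision, "r": recall, "f1": f1, "acc": accuracy}


metrics = binary_metrics(tp=990, fp=134, fn=160, tn=1669)
print({name: round(value, 4) for name, value in metrics.items()})
# {'p': 0.8808, 'r': 0.8609, 'f1': 0.8707, 'acc': 0.9004}
```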
Per-class metrics
| class | P | R | F1 | support |
|---|---|---|---|---|
| wrong | 0.9125 | 0.9257 | 0.9191 | 1803 |
| correct | 0.8808 | 0.8609 | 0.8707 | 1150 |
Confusion matrix (rows=true, cols=pred)
| true \ pred | wrong | correct | row total |
|---|---|---|---|
| wrong | 1669 | 134 | 1803 |
| correct | 160 | 990 | 1150 |
| col total | 1829 | 1124 | 2953 |
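The per-class P/R/F1 and support values can be read off this matrix: the diagonal gives the hits, row totals give support, and column totals give the number of predictions per class. A small sketch (`per_class_report` is a hypothetical helper, not part of the training code):

```python
from typing import Dict, List


def per_class_report(labels: List[str], matrix: List[List[int]]) -> Dict[str, dict]:
    """Per-class precision/recall/F1/support; rows are truth, columns are predictions."""
    report = {}
    for i, name in enumerate(labels):
        hits = matrix[i][i]
        support = sum(matrix[i])                   # row total
        predicted = sum(row[i] for row in matrix)  # column total
        precision = hits / predicted if predicted else 0.0
        recall = hits / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return report


report = per_class_report(["wrong", "correct"], [[1669, 134], [160, 990]])
for name, row in report.items():
    print(name, round(row["precision"], 4), round(row["recall"], 4),
          round(row["f1"], 4), row["support"])
# wrong 0.9125 0.9257 0.9191 1803
# correct 0.8808 0.8609 0.8707 1150
```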
Training history
| ep | train_loss | val_AP | val_F1 | val_acc | val_thr |
|---|---|---|---|---|---|
| 1 | 0.5171 | 0.5598 | 0.6558 | 0.5823 | 0.372 |
| 2 | 0.3147 | 0.9283 | 0.8415 | 0.8701 | 0.433 |
| 3 | 0.1352 | 0.9299 | 0.8459 | 0.8665 | 0.130 |
| 4 | 0.1047 | 0.9256 | 0.8464 | 0.8606 | 0.202 |
| 5 | 0.0938 | 0.9411 | 0.8554 | 0.8695 | 0.352 |
| 6 | 0.0752 | 0.9594 | 0.9019 | 0.9156 | 0.314 |
| 7 | 0.0692 | 0.9567 | 0.9087 | 0.9216 | 0.286 |
| 8 | 0.0566 | 0.9537 | 0.8913 | 0.9084 | 0.364 |
| 9 | 0.0510 | 0.9539 | 0.8898 | 0.9060 | 0.282 |
| 10 | 0.0546 | 0.9519 | 0.8840 | 0.9019 | 0.493 |
| 11 | 0.0358 | 0.9489 | 0.8835 | 0.8989 | 0.370 |
| 12 | 0.0344 | 0.9599 | 0.9060 | 0.9204 | 0.124 |
| 13 | 0.0269 | 0.9616 | 0.9164 | 0.9294 | 0.124 |
| 14 | 0.0264 | 0.9573 | 0.8892 | 0.9072 | 0.165 |
| 15 | 0.0170 | 0.9474 | 0.8728 | 0.8941 | 0.764 |
| 16 | 0.0171 | 0.9593 | 0.8999 | 0.9144 | 0.087 |
| 17 | 0.0149 | 0.9582 | 0.9078 | 0.9210 | 0.133 |
| 18 | 0.0142 | 0.9570 | 0.9056 | 0.9198 | 0.211 |
| 19 | 0.0144 | 0.9571 | 0.9049 | 0.9186 | 0.154 |
| 20 | 0.0114 | 0.9571 | 0.9049 | 0.9186 | 0.159 |
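The history can be cross-checked against best_epoch, epochs_run and patience in the config. The sketch below picks the epoch with the highest val_AP and simulates a patience counter. It assumes the checkpoint is selected on val_AP and that patience counts consecutive epochs without a new best; neither rule is spelled out in the report, and the val_AP values are the rounded ones from the table.

```python
from typing import List, Optional, Tuple

# val_AP per epoch (1..20), copied from the training history table.
VAL_AP = [
    0.5598, 0.9283, 0.9299, 0.9256, 0.9411, 0.9594, 0.9567, 0.9537, 0.9539, 0.9519,
    0.9489, 0.9599, 0.9616, 0.9573, 0.9474, 0.9593, 0.9582, 0.9570, 0.9571, 0.9571,
]


def best_epoch(val_ap: List[float]) -> Tuple[int, float]:
    """1-based epoch with the highest val_AP (first one on ties)."""
    idx = max(range(len(val_ap)), key=val_ap.__getitem__)
    return idx + 1, val_ap[idx]


def early_stop_epoch(val_ap: List[float], patience: int) -> Optional[int]:
    """Epoch at which training would stop, or None if patience never runs out."""
    best = float("-inf")
    stale = 0
    for epoch, ap in enumerate(val_ap, start=1):
        if ap > best:
            best, stale = ap, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


print(best_epoch(VAL_AP))                    # (13, 0.9616)
print(early_stop_epoch(VAL_AP, patience=8))  # None
```

Under these assumptions the run never hits patience=8 (only 7 epochs follow the best one at epoch 13), which matches all 20 epochs being run.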
Sample inference (test, 4 per class; box color = truth)
truth=wrong pred=wrong (P(correct)=0.0%, thr=89.3%)

truth=wrong pred=wrong (P(correct)=0.0%, thr=89.3%)

truth=wrong pred=wrong (P(correct)=0.0%, thr=89.3%)

truth=wrong pred=wrong (P(correct)=10.1%, thr=89.3%)

truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)

truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)

truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)

truth=correct pred=correct (P(correct)=100.0%, thr=89.3%)

FP audit: top-8 most confident misclassifications (sampled 300 test imgs)
truth=correct → pred=wrong (conf=100.0%)

truth=wrong → pred=correct (conf=100.0%)

truth=correct → pred=wrong (conf=100.0%)

truth=wrong → pred=correct (conf=100.0%)

truth=correct → pred=wrong (conf=99.9%)

truth=correct → pred=wrong (conf=99.9%)

truth=wrong → pred=correct (conf=99.8%)

truth=wrong → pred=correct (conf=99.1%)
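A ranking like the one above can be produced by keeping only misclassified samples and sorting them by confidence. The sketch below is one way to do it, under two assumptions that the report does not confirm: conf is the probability of the predicted class, and the prediction uses the `>=` thr rule. The sample values are made up for illustration and are not taken from the test set.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    truth: str        # "wrong" or "correct"
    p_correct: float  # model probability for the "correct" class


def top_confident_errors(
    samples: List[Sample], thr: float, k: int = 8
) -> List[Tuple[float, str, str]]:
    """Return up to k (conf, truth, pred) tuples for errors, most confident first."""
    errors = []
    for sample in samples:
        pred = "correct" if sample.p_correct >= thr else "wrong"
        if pred == sample.truth:
            continue
        # Assumed definition: confidence is the probability of the predicted class.
        conf = sample.p_correct if pred == "correct" else 1.0 - sample.p_correct
        errors.append((conf, sample.truth, pred))
    errors.sort(key=lambda item: item[0], reverse=True)
    return errors[:k]


# Illustrative values only.
toy = [
    Sample("correct", 0.001),
    Sample("wrong", 0.998),
    Sample("wrong", 0.05),
    Sample("correct", 0.95),
    Sample("correct", 0.60),
]
for conf, truth, pred in top_confident_errors(toy, thr=0.892578125, k=2):
    print(f"truth={truth} -> pred={pred} (conf={conf:.1%})")
```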

Config
{
  "version": "v20260513",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3-S + RoIAlign + MLP 2-cls (wrong/correct) + photometric + random_erase + camaug — v9 binary復刻",
  "params_M": 22.47245,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "wrong": 1.5,
    "correct": 1.0
  },
  "labels": [
    "wrong",
    "correct"
  ],
  "thr": 0.892578125,
  "best_val_AP": 0.961629893247703,
  "best_epoch": 13,
  "epochs_run": 20,
  "total_train_time_s": 2130.6874334812164,
  "test_metrics": {
    "ap": 0.9144094126954356,
    "acc": 0.9004402302742973,
    "p": 0.8807829181494662,
    "r": 0.8608695652173913,
    "f1": 0.870712401055409,
    "thr": 0.892578125,
    "tp": 990,
    "fp": 134,
    "fn": 160,
    "tn": 1669,
    "n_pos": 1150,
    "n_total": 2953,
    "per_class": {
      "wrong": {
        "precision": 0.9125205030071077,
        "recall": 0.9256794231835829,
        "f1": 0.9190528634361234,
        "support": 1803
      },
      "correct": {
        "precision": 0.8807829181494662,
        "recall": 0.8608695652173913,
        "f1": 0.870712401055409,
        "support": 1150
      }
    },
    "confusion_matrix": [
      [
        1669,
        134
      ],
      [
        160,
        990
      ]
    ]
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}
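The class_weights entry in the config (wrong=1.5, correct=1.0) makes errors on wrong samples cost more during training. The report does not name the loss function; the sketch below assumes a class-weighted cross-entropy over the 2-class logits with a weighted-mean reduction (the convention PyTorch's `CrossEntropyLoss` uses when given `weight`), written in plain Python so the arithmetic is visible.

```python
import math
from typing import List, Sequence

LABELS = ["wrong", "correct"]
CLASS_WEIGHTS = {"wrong": 1.5, "correct": 1.0}  # values from the config


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one sample's logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def weighted_cross_entropy(
    batch_logits: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Weighted mean of per-sample losses: sum(w_i * loss_i) / sum(w_i)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, target in zip(batch_logits, targets):
        weight = CLASS_WEIGHTS[LABELS[target]]
        weighted_sum += -weight * log_softmax(logits)[target]
        weight_total += weight
    return weighted_sum / weight_total


# One "wrong" sample (index 0) and one "correct" sample (index 1); logits are illustrative.
loss = weighted_cross_entropy([[0.0, 0.0], [2.0, 0.0]], targets=[0, 1])
print(round(loss, 4))
```

With this reduction, a misclassified wrong sample pulls the batch loss 1.5 times as hard as a correct sample with the same per-sample loss.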