safety_rope_v20260514: DINOv3-S binary (cvat2 #10, +32 tasks incremental)

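Data: cvat2 #10 person bboxes with the safety_rope_use attribute. The schema is reverted to the v9 binary form (wrong/correct). Unknown samples have been filtered out.
The ppe-demo handler can hot-swap the weights directly (the ckpt contains both labels and thr).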
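As a rough illustration of why the labels and thr stored in the ckpt are enough for a hot-swap, the sketch below applies them to a P(correct) score. The `CKPT_META` dict layout and the `classify` helper are hypothetical, and the `>=` comparison is an assumption; only the label names and the threshold value come from the Config section.

```python
# Hypothetical checkpoint metadata: the values are copied from the Config
# section, but the real ckpt key layout is an assumption.
CKPT_META = {"labels": ["wrong", "correct"], "thr": 0.27294921875}


def classify(p_correct: float, meta: dict) -> str:
    """Map P(correct) for one person box to a label using the stored thr.

    Assumes "correct" is the positive class and that a score at or above
    thr is predicted as "correct".
    """
    wrong_label, correct_label = meta["labels"]
    return correct_label if p_correct >= meta["thr"] else wrong_label


if __name__ == "__main__":
    # Scores taken from the sample inference captions in this report.
    for p in (0.015, 0.080, 0.978, 0.999):
        print(f"P(correct)={p:.1%} -> {classify(p, CKPT_META)}")
```

Because the threshold travels with the weights, a handler written this way would not need a code change when a new version ships with a different thr.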

v20260514 changes (vs v20260513):

Three-version comparison

metric                             v9      v20260512  v20260513  v20260514 (this, +21 task)
test AP                            0.9336  0.9308     0.9144     0.9242
test F1                            0.8770  0.8567     0.8707     0.8650
test accuracy                      0.9070  0.8821     0.9004     0.8910
test thr                           0.79    0.054      0.893      0.273
val thr (deployed)                 -       0.114      0.124      0.2729
train rows (after unknown filter)  -       7644       8147       10299
test rows                          -       2630       2953       2953

Test metrics

AP      F1      P       R       accuracy  thr    TP/FP/FN/TN
0.9242  0.8650  0.8350  0.8974  0.8910    0.273  1032/204/118/1599
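The P, R, F1 and accuracy values above follow from the TP/FP/FN/TN counts, with "correct" as the positive class. A minimal sketch of that arithmetic (the `binary_metrics` helper is illustrative, not part of the training code):

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive precision, recall, F1 and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"p": precision, "r": recall, "f1": f1, "acc": accuracy}


if __name__ == "__main__":
    # Counts from the test metrics row: TP/FP/FN/TN = 1032/204/118/1599.
    for name, value in binary_metrics(1032, 204, 118, 1599).items():
        print(f"{name}: {value:.4f}")
```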

Per-class metrics

class    P       R       F1      support
wrong    0.9313  0.8869  0.9085  1803
correct  0.8350  0.8974  0.8650  1150

Confusion matrix (rows=true, cols=pred)

true \ pred  wrong  correct  row total
wrong        1599   204      1803
correct      118    1032     1150
col total    1717   1236     2953
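The per-class metrics can be read straight off this matrix: precision divides the diagonal cell by its column total, recall divides it by its row total. A short sketch, with `per_class` as an illustrative helper name:

```python
LABELS = ["wrong", "correct"]
# Confusion matrix from this report: rows = truth, cols = prediction.
CM = [[1599, 204],
      [118, 1032]]


def per_class(cm: list, labels: list) -> dict:
    """Compute precision, recall, F1 and support for every class."""
    out = {}
    for i, name in enumerate(labels):
        hit = cm[i][i]
        col_total = sum(row[i] for row in cm)  # everything predicted as class i
        row_total = sum(cm[i])                 # everything truly in class i
        precision = hit / col_total
        recall = hit / row_total
        f1 = 2 * precision * recall / (precision + recall)
        out[name] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": row_total}
    return out


if __name__ == "__main__":
    for name, m in per_class(CM, LABELS).items():
        print(name, {k: round(v, 4) for k, v in m.items()})
```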

Training history

ep  train_loss  val_AP  val_F1  val_acc  val_thr
1   0.5288      0.6223  0.6663  0.7475   0.470
2   0.2704      0.9400  0.8861  0.9173   0.147
3   0.1490      0.9512  0.8865  0.9173   0.260
4   0.1129      0.9511  0.8857  0.9169   0.259
5   0.3221      0.6575  0.6856  0.7008   0.306
6   0.3478      0.9494  0.8899  0.9169   0.163
7   0.0894      0.9533  0.9017  0.9258   0.081
8   0.0726      0.9499  0.8835  0.9106   0.271
9   0.0631      0.9543  0.8879  0.9192   0.600
10  0.0598      0.9570  0.8954  0.9247   0.358
11  0.0516      0.9554  0.9002  0.9266   0.231
12  0.0403      0.9556  0.8972  0.9240   0.225
13  0.0316      0.9489  0.8955  0.9229   0.216
14  0.0285      0.9537  0.8947  0.9240   0.414
15  0.0262      0.9508  0.8905  0.9207   0.602
16  0.0231      0.9535  0.8915  0.9221   0.689
17  0.0186      0.9520  0.8953  0.9232   0.347
18  0.0198      0.9506  0.8962  0.9255   0.721
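The run stopped at epoch 18 of a configured 20, which is consistent with early stopping on val_AP at the configured patience of 8 (best epoch 10, then 8 epochs without improvement). The sketch below replays the val_AP column under that assumption; the `early_stop` helper and the choice of val_AP as the monitored quantity are assumptions, not the actual training loop.

```python
# val_AP per epoch, copied from the training history (epochs 1..18).
VAL_AP = [0.6223, 0.9400, 0.9512, 0.9511, 0.6575, 0.9494,
          0.9533, 0.9499, 0.9543, 0.9570, 0.9554, 0.9556,
          0.9489, 0.9537, 0.9508, 0.9535, 0.9520, 0.9506]


def early_stop(history: list, patience: int) -> tuple:
    """Return (best_epoch, last_epoch_run) for patience-based stopping.

    Assumes the monitored value must strictly improve to reset the counter.
    """
    best, best_epoch, bad = float("-inf"), 0, 0
    for epoch, value in enumerate(history, start=1):
        if value > best:
            best, best_epoch, bad = value, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                return best_epoch, epoch
    return best_epoch, len(history)


if __name__ == "__main__":
    print(early_stop(VAL_AP, patience=8))  # (10, 18)
```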

Sample inference (test, 4 per class, box color = truth)

truth=wrong pred=wrong (P(correct)=1.5%, thr=27.3%)
truth=wrong pred=wrong (P(correct)=0.0%, thr=27.3%)
truth=wrong pred=wrong (P(correct)=0.0%, thr=27.3%)
truth=wrong pred=wrong (P(correct)=8.0%, thr=27.3%)
truth=correct pred=correct (P(correct)=99.9%, thr=27.3%)
truth=correct pred=correct (P(correct)=97.8%, thr=27.3%)
truth=correct pred=correct (P(correct)=99.9%, thr=27.3%)
truth=correct pred=correct (P(correct)=99.9%, thr=27.3%)

FP audit: top-8 most confident misclassifications (sampled from 300 test imgs)

truth=wrong → pred=correct (conf=99.9%)
truth=correct → pred=wrong (conf=99.8%)
truth=wrong → pred=correct (conf=99.7%)
truth=correct → pred=wrong (conf=99.4%)
truth=correct → pred=wrong (conf=98.7%)
truth=wrong → pred=correct (conf=96.8%)
truth=wrong → pred=correct (conf=94.2%)
truth=correct → pred=wrong (conf=88.3%)
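An audit list like the one above could be produced by ranking misclassified samples by the probability assigned to the predicted class. The sketch below is one way to do that; the `top_confident_errors` helper, the definition of conf, and the toy scores are all assumptions for illustration, not the report's actual audit code or data.

```python
def top_confident_errors(samples: list, thr: float, k: int = 8) -> list:
    """Return the k misclassified samples with the highest confidence.

    samples: (truth, p_correct) pairs, truth being "wrong" or "correct".
    conf is assumed to be the probability of the predicted class.
    """
    errors = []
    for truth, p_correct in samples:
        pred = "correct" if p_correct >= thr else "wrong"
        if pred != truth:
            conf = p_correct if pred == "correct" else 1.0 - p_correct
            errors.append((conf, truth, pred))
    errors.sort(key=lambda e: e[0], reverse=True)
    return errors[:k]


if __name__ == "__main__":
    # Toy scores, made up for this example only.
    toy = [("wrong", 0.999), ("correct", 0.002), ("wrong", 0.10),
           ("correct", 0.90), ("wrong", 0.30)]
    for conf, truth, pred in top_confident_errors(toy, thr=0.273, k=2):
        print(f"truth={truth} -> pred={pred} (conf={conf:.1%})")
```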

Config

{
  "version": "v20260514",
  "backbone_name": "vit_small_patch16_dinov3",
  "arch": "DINOv3-S + RoIAlign + MLP 2-cls (wrong/correct) + photometric + random_erase + camaug — v9 binary復刻",
  "params_M": 22.47245,
  "img_size": [
    1280,
    720
  ],
  "feat_ch": 384,
  "expand": {
    "x": 1.0,
    "y_top": 0.2,
    "y_bot": 1.5
  },
  "jitter": {
    "center": 0.2,
    "size": [
      0.7,
      1.4
    ],
    "ex_x": [
      0.5,
      1.5
    ],
    "ex_yt": [
      0.5,
      2.0
    ],
    "ex_yb": [
      0.7,
      1.3
    ]
  },
  "class_weights": {
    "wrong": 1.5,
    "correct": 1.0
  },
  "labels": [
    "wrong",
    "correct"
  ],
  "thr": 0.27294921875,
  "best_val_AP": 0.9570321404906651,
  "best_epoch": 10,
  "epochs_run": 18,
  "total_train_time_s": 2504.8716027736664,
  "test_metrics": {
    "ap": 0.9241776957176995,
    "acc": 0.8909583474432781,
    "p": 0.8349514563106796,
    "r": 0.8973913043478261,
    "f1": 0.865046102263202,
    "thr": 0.27294921875,
    "tp": 1032,
    "fp": 204,
    "fn": 118,
    "tn": 1599,
    "n_pos": 1150,
    "n_total": 2953,
    "per_class": {
      "wrong": {
        "precision": 0.9312754804892254,
        "recall": 0.8868552412645591,
        "f1": 0.9085227272727273,
        "support": 1803
      },
      "correct": {
        "precision": 0.8349514563106796,
        "recall": 0.8973913043478261,
        "f1": 0.865046102263202,
        "support": 1150
      }
    },
    "confusion_matrix": [
      [
        1599,
        204
      ],
      [
        118,
        1032
      ]
    ]
  },
  "hyperparams": {
    "batch": 8,
    "epochs": 20,
    "lr": 5e-05,
    "wd": 0.01,
    "patience": 8
  }
}