🦺 Safety Rope v7b Training Report (new champion)

version: v20260503_p10_dinov3_re_v7b_clean · training date 2026-05-03 · backbone vit_small_patch16_dinov3 @ 1280×720
arch: ViT-S patch16 + RoIAlign + MLP 2-cls, RoI expansion 1.0 / 0.2 / 1.5 (X / Y_top / Y_bot)
data: manifest_v3_clean.csv (fixes the path bug caused by CVAT task_id != data_id; 226 tasks / 12291 train / 2953 val / 4863 test)
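To make the RoI expansion concrete, here is a minimal sketch. It assumes the X ratio is added on each side as a fraction of box width, Y_top and Y_bot are fractions of box height, and the result is clipped to the 1280×720 frame. The helper names `expand_box` and `to_token_grid` are hypothetical. The last step maps the expanded box onto the 80×45 patch grid (1280/16 × 720/16), which matches the scaling a RoIAlign with `spatial_scale = 1/16` applies.

```python
# Hypothetical sketch of the RoI expansion; not the training code itself.
# Assumption: X ratio applies to each side (fraction of box width),
# Y_top / Y_bot apply to top / bottom (fraction of box height).
IMG_W, IMG_H = 1280, 720
PATCH = 16  # ViT-S patch16 -> 80 x 45 token grid at 1280x720


def expand_box(box, rx=1.0, ry_top=0.2, ry_bot=1.5, img_w=IMG_W, img_h=IMG_H):
    """Expand an (x1, y1, x2, y2) pixel box and clip it to the frame."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    ex1 = max(0.0, x1 - rx * w)
    ex2 = min(float(img_w), x2 + rx * w)
    ey1 = max(0.0, y1 - ry_top * h)
    ey2 = min(float(img_h), y2 + ry_bot * h)
    return (ex1, ey1, ex2, ey2)


def to_token_grid(box, patch=PATCH):
    # Same coordinate scaling as RoIAlign with spatial_scale = 1 / patch.
    return tuple(c / patch for c in box)


person = (600.0, 200.0, 680.0, 400.0)  # 80 px wide, 200 px tall
roi = expand_box(person)
print(roi)                 # (520.0, 160.0, 760.0, 700.0)
print(to_token_grid(roi))  # (32.5, 10.0, 47.5, 43.75)
```

The large bottom ratio (1.5) extends the crop well below the person box, while the top grows only slightly.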

TL;DR: across-the-board gains in one run

| Metric | v7b | vs v6 |
|---|---|---|
| test AP | 0.9283 | +4.0pp |
| test F1 | 0.8574 | +3.1pp |
| Precision | 0.863 | +2.4pp |
| Recall | 0.851 | +3.7pp |
| FP (false alarms) | 251 | -32 (-11%) |
| FN (misses) | 277 | -59 (-18%) |
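Three ingredients stack up to produce the jump. (1) The manifest path bug is fixed: on cvat2, task_id != data_id in about 30% of cases, so the old export fed frames from other tasks into training. After the fix, train grows from 11082 to 12291 rows, with ~14% more clean data. (2) All 226 tasks are used (v1 used a 178-task subset). (3) Rotation + blur camaug adds augmentation that simulates small differences in camera placement. v6 already had (2) and (3), so the v6 → v7b gain comes entirely from (1).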
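To make the path bug concrete: if frames are stored per CVAT data_id but the old export built paths from task_id, every row where the two IDs differ silently points at another task's frame. The sketch below audits a manifest for this. It assumes hypothetical columns `task_id`, `data_id`, `frame` and a hypothetical `data/<data_id>/frame_<n>.jpg` layout. Neither is the actual manifest_v3 schema.

```python
import csv
import io


def frame_path(data_id, frame):
    # Assumed storage layout: frames are keyed by data_id, never by task_id.
    return f"data/{data_id}/frame_{int(frame):06d}.jpg"


def audit(manifest_file):
    """Return rows whose task_id != data_id, with the buggy and fixed paths."""
    bad = []
    for row in csv.DictReader(manifest_file):
        if row["task_id"] != row["data_id"]:
            bad.append({
                "task_id": row["task_id"],
                "buggy": frame_path(row["task_id"], row["frame"]),  # old export behaviour
                "fixed": frame_path(row["data_id"], row["frame"]),
            })
    return bad


demo = io.StringIO("task_id,data_id,frame\n101,101,5\n102,117,5\n")
for r in audit(demo):
    print(r)
# {'task_id': '102', 'buggy': 'data/102/frame_000005.jpg', 'fixed': 'data/117/frame_000005.jpg'}
```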

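Five-version comparison (test set)

| Version | Data | Manifest | test_AP | F1 | P | R | TP | FP | FN | TN | test_n |
|---|---|---|---|---|---|---|---|---|---|---|---|
| v1 | 178 tasks | path bug | 0.9167 | 0.8449 | 0.854 | 0.836 | 1414 | 241 | 278 | 2627 | 4560 |
| v4 | 226 tasks, no aug | path bug | 0.8651 | 0.8102 | 0.780 | 0.843 | 1528 | 432 | 284 | 2531 | 4775 |
| v6 | 226 tasks, +camaug | path bug | 0.8884 | 0.8267 | 0.839 | 0.815 | 1476 | 283 | 336 | 2680 | 4775 |
| v7a | clean 226, no aug | path fix | 0.9165 | 0.8584 | 0.855 | 0.862 | 1606 | 272 | 258 | 2727 | 4863 |
| v7b | clean 226, +camaug | path fix | 0.9283 | 0.8574 | 0.863 | 0.851 | 1587 | 251 | 277 | 2748 | 4863 |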
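The P, R and F1 columns follow directly from the confusion counts, which makes them a quick sanity check for each row (F1 = 2TP / (2TP + FP + FN)). This small sketch recomputes v6 and v7b from the table and reproduces the "vs v6" deltas in the TL;DR:

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from confusion counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return p, r, f1


rows = {  # version: (TP, FP, FN, TN), copied from the test-set table
    "v6": (1476, 283, 336, 2680),
    "v7b": (1587, 251, 277, 2748),
}
for name, (tp, fp, fn, tn) in rows.items():
    p, r, f1 = prf1(tp, fp, fn)
    print(f"{name}: P={p:.4f} R={r:.4f} F1={f1:.4f} n={tp + fp + fn + tn}")
# v6: P=0.8391 R=0.8146 F1=0.8267 n=4775
# v7b: P=0.8634 R=0.8514 F1=0.8574 n=4863

p6, r6, _ = prf1(*rows["v6"][:3])
p7, r7, _ = prf1(*rows["v7b"][:3])
print(f"ΔP={100 * (p7 - p6):+.1f}pp ΔR={100 * (r7 - r6):+.1f}pp")
# ΔP=+2.4pp ΔR=+3.7pp
```

The n column also shows that v6 and v7b are scored on test splits of different sizes (4775 vs 4863).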

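v6 → v7b contribution breakdown

| Step | test_AP | FP | FN | Notes |
|---|---|---|---|---|
| v6 baseline | 0.8884 | 283 | 336 | Old manifest (path bug) + camaug |
| + path fix → v7b | 0.9283 | 251 | 277 | Same setting, only the manifest swapped for the clean one (14% more data; 1955 rows that were paired with the wrong image are fixed) |
| Path fix alone | +4.0pp | -32 | -59 | No model change, no added aug, data fix only: AP +4pp, FP -11%, FN -18%. The test split also changes (4775 → 4863 samples). |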

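v7a vs v7b (isolated effect of camaug on the clean manifest)

| Metric | v7a (no camaug) | v7b (+camaug) | Δ |
|---|---|---|---|
| test_AP | 0.9165 | 0.9283 | +1.18pp |
| F1 | 0.8584 | 0.8574 | -0.10pp |
| P | 0.8552 | 0.8634 | +0.83pp |
| R | 0.8616 | 0.8514 | -1.02pp |
| TP | 1606 | 1587 | -19 |
| FP | 272 | 251 | -21 |
| FN | 258 | 277 | +19 |
| TN | 2727 | 2748 | +21 |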

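On clean data, camaug still adds +1.2pp AP and cuts FP by 21. The model becomes more conservative: R dips slightly while P rises, leaving F1 flat (-0.10pp). Viewpoint augmentation keeps paying off for false-alarm reduction, but its effect is smaller than the path fix.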
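Blur leaves geometry untouched, but a rotation must also move the person boxes that feed RoIAlign. The sketch below shows one way to keep boxes consistent under rotation. It assumes rotation about the image centre, counter-clockwise as displayed (image y axis points down), and takes the axis-aligned box that encloses the rotated corners. The angle range and blur radii in `sample_camaug` are placeholders, not the actual v7b camaug settings, and the pixel-level rotate/blur is not shown.

```python
import math
import random

IMG_W, IMG_H = 1280, 720


def rotate_box(box, angle_deg, cx=IMG_W / 2, cy=IMG_H / 2):
    """Rotate an (x1, y1, x2, y2) box about (cx, cy), counter-clockwise on screen,
    and return the enclosing axis-aligned box clipped to the frame."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    x1, y1, x2, y2 = box
    xs, ys = [], []
    for x, y in ((x1, y1), (x2, y1), (x2, y2), (x1, y2)):
        dx, dy = x - cx, y - cy
        # y points down, so a visual CCW turn maps (dx, dy) -> (dx*c + dy*s, -dx*s + dy*c)
        xs.append(cx + dx * c + dy * s)
        ys.append(cy - dx * s + dy * c)
    return (max(0.0, min(xs)), max(0.0, min(ys)),
            min(float(IMG_W), max(xs)), min(float(IMG_H), max(ys)))


def sample_camaug(rng, max_angle=5.0, blur_radii=(0.0, 0.5, 1.0, 1.5)):
    # Hypothetical ranges: a small camera tilt plus an optional mild blur.
    # The blur radius only affects pixels; boxes are unchanged by blur.
    return {"angle": rng.uniform(-max_angle, max_angle),
            "blur_radius": rng.choice(blur_radii)}


rng = random.Random(0)
params = sample_camaug(rng)
box = rotate_box((600.0, 340.0, 680.0, 380.0), params["angle"])
print(params, box)
```

The enclosing box slightly over-covers the rotated person for larger angles, which is one reason to keep the tilt small.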

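🎯 Inference-time Temporal Smoothing (zero training cost, deployed)

Validated by the research agent on 5/3: temporal smoothing on v6 inference probabilities. The same person is matched across frames by IoU ≥ 0.7 within a ±10-frame window, and the probability is replaced by the median of up to N=7 matched values. The deployed wrapper in the quick-start expresses the window in seconds (win_s=0.5) and, being online, only looks at past frames.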

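| Setting | test_AP | F1 | FP | FN |
|---|---|---|---|---|
| v6 single-frame | 0.8884 | 0.8267 | 283 | 336 |
| v6 + temporal median | 0.8965 | 0.8364 | 283 | 306 |
| Smoothing alone | +0.81pp | +0.97pp | unchanged | -30 (-8.9%) |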

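FP does not rise at all, and FN drops by 30 (8.9% fewer misses). Smoothing is already part of the Mac model_viewer inference loop and is applied automatically to the main safety_rope_* models.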
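A toy illustration of why a median can cut misses without adding false alarms. A single-frame dip inside a run of confident frames is outvoted, and an isolated spike inside a run of low scores is suppressed the same way. It tracks one person, uses the 0.461 threshold from the quick-start, and takes a trailing window of up to N=7 frames. The probability sequences are made up.

```python
import statistics

THR = 0.461  # v7b decision threshold from the quick-start
N = 7        # median pool size


def trailing_median(probs, n=N):
    """Median of the current frame and up to n-1 previous frames."""
    out = []
    for i in range(len(probs)):
        window = probs[max(0, i - n + 1): i + 1]
        # With a single sample there is nothing to pool, so keep the raw value.
        out.append(statistics.median(window) if len(window) >= 2 else window[0])
    return out


dip = [0.80, 0.82, 0.79, 0.30, 0.81, 0.80]    # one bad frame in a positive run
spike = [0.10, 0.12, 0.09, 0.70, 0.11, 0.10]  # one noisy frame in a negative run

for name, seq in (("dip", dip), ("spike", spike)):
    raw = [p >= THR for p in seq]
    smoothed = [p >= THR for p in trailing_median(seq)]
    print(name, raw.count(True), "->", smoothed.count(True), "frames above threshold")
# dip 5 -> 6 frames above threshold
# spike 1 -> 0 frames above threshold
```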

Expected v7b + smoothing: test_AP ~0.93+ / FN ~250 (pushing into the 0.93 tier). It is deployed, but the test-set numbers have not been run yet; waiting on field testing.

Augmentation config (v7b)
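The ~0.93+ / ~250 projection can be reproduced as a simple extrapolation. It assumes the smoothing gains measured on v6 carry over to v7b unchanged: AP as an additive +0.81pp, FN as a relative -8.9%. This is an estimate, not a measured result.

```python
V6_FN, V6_FN_SMOOTHED = 336, 306
AP_GAIN = 0.8965 - 0.8884           # +0.81pp measured on v6
FN_RATIO = V6_FN_SMOOTHED / V6_FN   # about 0.911, i.e. -8.9%

V7B_AP, V7B_FN = 0.9283, 277        # v7b single-frame test results
proj_ap = V7B_AP + AP_GAIN
proj_fn = V7B_FN * FN_RATIO
print(f"projected v7b+smoothing: test_AP≈{proj_ap:.4f}, FN≈{proj_fn:.0f}")
# projected v7b+smoothing: test_AP≈0.9364, FN≈252
```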

Training history

| ep | train_loss | val_AP | val_F1 |
|---|---|---|---|
| 1 | 0.3676 | 0.8769 | 0.7925 |
| 2 | 0.1746 | 0.8981 | 0.8075 |
| 3 | 0.1711 | 0.9183 | 0.8443 |
| 4 | 0.1180 | 0.8905 | 0.8148 |
| 5 | 0.1257 | 0.9199 | 0.8414 |
| 6 | 0.1037 | 0.9217 | 0.8368 |
| 7 | 0.0895 | 0.9167 | 0.8262 |
| 8 | 0.0724 | 0.9232 | 0.8494 |
| 9 | 0.0683 | 0.9231 | 0.8295 |
| 10 | 0.0571 | 0.9267 | 0.8533 |
| 11 | 0.0512 | 0.9295 | 0.8598 |
| 12 | 0.0433 | 0.9322 | 0.8638 |
| 13 | 0.0416 | 0.9317 | 0.8703 |
| 14 | 0.0344 | 0.9268 | 0.8574 |
| 15 ⭐ | 0.0270 | 0.9351 | 0.8652 |
| 16 | 0.0240 | 0.9284 | 0.8517 |
| 17 | 0.0202 | 0.9239 | 0.8674 |
| 18 | 0.0173 | 0.9031 | 0.8579 |
| 19 | 0.0170 | 0.9022 | 0.8492 |
| 20 | 0.0114 | 0.9135 | 0.8574 |
| 21 | 0.0133 | 0.9201 | 0.8574 |
| 22 | 0.0082 | 0.9169 | 0.8647 |
| 23 | 0.0095 | 0.9158 | 0.8697 |

best_epoch=15, val_AP=0.9351 (early stopping with patience=8 triggered at ep23)
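The val_AP column can be replayed under one common early-stopping rule: stop once val_AP has not improved for `patience` consecutive epochs. Under that rule the replay reproduces the reported best epoch and stop point. The rule itself is an assumption about the training script.

```python
VAL_AP = [0.8769, 0.8981, 0.9183, 0.8905, 0.9199, 0.9217, 0.9167, 0.9232,
          0.9231, 0.9267, 0.9295, 0.9322, 0.9317, 0.9268, 0.9351, 0.9284,
          0.9239, 0.9031, 0.9022, 0.9135, 0.9201, 0.9169, 0.9158]


def replay_early_stop(history, patience=8):
    """Return (best_epoch, best_score, stop_epoch), epochs 1-based."""
    best_ep, best, stale = 0, float("-inf"), 0
    for ep, score in enumerate(history, start=1):
        if score > best:
            best_ep, best, stale = ep, score, 0   # new best: reset the counter
        else:
            stale += 1
            if stale >= patience:                 # patience exhausted: stop here
                return best_ep, best, ep
    return best_ep, best, len(history)


print(replay_early_stop(VAL_AP))  # (15, 0.9351, 23)
```

Epochs 16 to 23 are the eight non-improving epochs after the ep15 peak.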

📦 Model downloads

| File | Size | Purpose | Download |
|---|---|---|---|
| safety_rope_v20260503_v7b_clean/best.pt | 86 MB | fp32 full checkpoint | R2 link |
| safety_rope_v20260503_v7b_clean/best_fp16.pt | 43 MB | fp16, for inference deployment | R2 link |
| safety_rope_v20260503_v7b_clean/summary.json | 5 KB | training metadata | R2 link |
| person_yolo11n_v20260501/best.pt | 5.5 MB | YOLO person detector | R2 link |

🧪 Inference quick-start (with temporal smoothing)

pip install torch torchvision timm ultralytics opencv-python pillow numpy

curl -L -o best_fp16.pt https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/safety_rope_v20260503_v7b_clean/best_fp16.pt
curl -L -o person_yolo11n_v20260501.pt https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/person_yolo11n_v20260501/best.pt

The model class and single-frame inference are the same as in the v6 report (v6_camaug_report.html). Below is the new temporal smoothing wrapper, integrated at deployment:

import statistics


class TemporalSmoother:
    """Online temporal smoothing: IoU>=0.7 matching + 0.5 s window + N=7 median pool.

    Research validation (v6 + smoothing): test_AP +0.81pp / FN -8.9% / FP unchanged.
    Zero training cost. Because this runs online, only past frames are buffered,
    so the window is effectively [ts - win_s, ts] rather than a centred one.
    """

    def __init__(self, win_s=0.5, iou_thr=0.7, top_n=7):
        self.buf = []  # [{"ts": float, "items": [(bbox, raw_prob), ...]}]
        self.win_s = win_s
        self.iou_thr = iou_thr
        self.top_n = top_n

    @staticmethod
    def _iou(a, b):
        """IoU of two (x1, y1, x2, y2) boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / max(union, 1e-9)

    def smooth(self, persons, ts):
        """persons: list of {"bbox", "prob"} dicts. Replaces prob in place with the
        smoothed value and stores the single-frame value in prob_raw."""
        # Drop past frames that fell out of the window.
        self.buf = [e for e in self.buf if ts - e["ts"] <= self.win_s]
        for p in persons:
            cur_bb, cur_prob = p["bbox"], p["prob"]
            matched = []
            for entry in self.buf:
                # Best-IoU match of this person within each past frame.
                best_iou, best_prob = 0.0, None
                for past_bb, past_prob in entry["items"]:
                    v = self._iou(cur_bb, past_bb)
                    if v >= self.iou_thr and v > best_iou:
                        best_iou, best_prob = v, past_prob
                if best_prob is not None:
                    matched.append((entry["ts"], best_prob))
            matched.append((ts, cur_prob))
            # Keep the N most recent probabilities (current frame included).
            matched.sort(key=lambda x: x[0], reverse=True)
            recent = [pr for _, pr in matched[: self.top_n]]
            p["prob_raw"] = cur_prob
            if len(recent) >= 2:
                p["prob"] = float(statistics.median(recent))
        # Buffer this frame's raw (unsmoothed) probabilities.
        self.buf.append({"ts": ts, "items": [(p["bbox"], p["prob_raw"]) for p in persons]})
        return persons

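# Usage: stream, single_frame_infer and alert are placeholders for your own pipeline.
import time

smoother = TemporalSmoother()
for frame in stream:
    persons = single_frame_infer(frame)   # your v7b single-frame inference
    # Wall-clock time works for live streams; for recorded video pass the frame timestamp in seconds.
    persons = smoother.smooth(persons, time.time())   # in-place smoothing
    for p in persons:
        if p["prob"] >= 0.461:   # v7b threshold
            alert(p)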

The Mac model_viewer already applies smoothing automatically to every safety_rope_* handler. In the field, compare safety_rope_correct (smoothed) against safety_rope_correct_raw (single-frame prob) in attrs.

Next steps

  1. Field-test v7b + smoothing on scenes that a human can only judge by watching consecutive frames (single-frame prob jitters; the median suppresses it)
  2. The 5090-2 research agent starts v8 clip-aware fine-tuning (T=4 clip + TokShift + consistency aux loss), target test_AP ≥ 0.91 (v7b already reaches 0.9283)
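  3. Re-run the e2e audit on the clean manifest (the earlier v4 audit conclusions are void; only with the path bug fixed can YOLO's real missed-box / false-box counts be measured)
  4. Try YOLO conf 0.35→0.5 / imgsz 640→1280 (retrain the person YOLO after cleaning up field hard negatives)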

Raw artifacts: 5090-2:~/runs_new/safety_rope_v20260503_p10_dinov3_re_v7b_clean/
R2 bucket: rai-models
Previous v6 report for comparison: v6_camaug_report.html