🦺 Safety Rope v7b Training Report (New Champion)

version: v20260503_p10_dinov3_re_v7b_clean · trained 2026-05-03 · backbone vit_small_patch16_dinov3 @ 1280×720
arch: ViT-S patch16 + RoIAlign + MLP 2-cls, ROI expansion 1.0 / 0.2 / 1.5 (X / Y_top / Y_bot)
data: manifest_v3_clean.csv (fixes the path bug caused by cvat task_id != data_id; 226 tasks / 12291 train / 2953 val / 4863 test)
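
The ROI expansion in the arch line can be read as widening the person box before it is passed to RoIAlign. The following is a minimal sketch of that idea, not the training code: it assumes the three ratios are fractions of the person box width (X, applied to each side) and height (Y_top above, Y_bot below), and that the result is clipped to the 1280×720 frame. The function name `expand_roi` and this interpretation of the ratios are assumptions.

```python
def expand_roi(box, x_ratio=1.0, y_top_ratio=0.2, y_bot_ratio=1.5,
               img_w=1280, img_h=720):
    """Expand a person box (x1, y1, x2, y2) and clip it to the image.

    Assumption: ratios are relative to the box's own width/height.
    """
    x1, y1, x2, y2 = (float(v) for v in box)
    w, h = x2 - x1, y2 - y1
    nx1 = max(0.0, x1 - x_ratio * w)             # widen to the left
    ny1 = max(0.0, y1 - y_top_ratio * h)         # small margin above the head
    nx2 = min(float(img_w), x2 + x_ratio * w)    # widen to the right
    ny2 = min(float(img_h), y2 + y_bot_ratio * h)  # large margin below the feet
    return (nx1, ny1, nx2, ny2)


roi = expand_roi((600, 200, 700, 400))
clipped = expand_roi((0, 0, 100, 700))
print(roi, clipped)
```

Under this reading, the larger bottom ratio extends the crop well below the person while the top margin stays small.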

TL;DR: one release, gains across the board

metric	v7b	vs v6
test AP	0.9283	+4.0pp
test F1	0.8574	+3.1pp
Precision	0.863	+2.4pp
Recall	0.851	+3.7pp
FP (false alarms)	251	-32 (-11%)
FN (misses)	277	-59

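Three things stack up to produce the jump: (1) the manifest path fix (on cvat2, task_id != data_id in 30% of cases, so the old export pulled frames from other tasks into training; after the fix, train grows from 11082 to 12291 rows, reported as 14% more clean data); (2) the full 226-task dataset (v1 used a 178-task subset); (3) rotation + blur camaug (augmentation for small differences in camera placement).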

Five-version comparison (test set)

version	data	test_AP	F1	P	R	TP	FP	FN	TN	test_n
v1	178 task (path bug)	0.9167	0.8449	0.854	0.836	1414	241	278	2627	4560
v4	226 task, no aug (path bug)	0.8651	0.8102	0.780	0.843	1528	432	284	2531	4775
v6	226 task + camaug (path bug)	0.8884	0.8267	0.839	0.815	1476	283	336	2680	4775
v7a	clean 226, no aug (path fix)	0.9165	0.8584	0.855	0.862	1606	272	258	2727	4863
v7b ⭐	clean 226 + camaug (path fix)	0.9283	0.8574	0.863	0.851	1587	251	277	2748	4863
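
The P, R and F1 columns follow directly from the TP / FP / FN counts, which gives a quick way to cross-check the table. A small sketch using the standard definitions; the helper name `prf` is illustrative and the counts are copied from the rows above.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1


# (TP, FP, FN, TN) per version, copied from the comparison table
rows = {
    "v1":  (1414, 241, 278, 2627),
    "v4":  (1528, 432, 284, 2531),
    "v6":  (1476, 283, 336, 2680),
    "v7a": (1606, 272, 258, 2727),
    "v7b": (1587, 251, 277, 2748),
}

for name, (tp, fp, fn, tn) in rows.items():
    p, r, f1 = prf(tp, fp, fn)
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f1:.4f} test_n={tp + fp + fn + tn}")
```

test_AP cannot be recomputed this way, since it depends on the full ranking of scores rather than on counts at a single threshold.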

v6 → v7b contribution breakdown

step	test_AP	FP	FN	notes
v6 baseline	0.8884	283	336	old manifest (path bug) + camaug
+ path fix → v7b	0.9283	251	277	same settings, only the clean manifest swapped in (14% more data + 1955 rows with mismatched images fixed)
path fix alone	+4.0pp	-32	-59	no model change, no new aug, data fix only → AP +4pp, FP -11%, FN -18%

v7a vs v7b (pure effect of camaug on the clean manifest)

	v7a (no camaug)	v7b (+camaug)	Δ
test_AP	0.9165	0.9283	+1.18pp
F1	0.8584	0.8574	-0.10pp
P	0.8552	0.8634	+0.83pp
R	0.8616	0.8514	-1.02pp
TP	1606	1587	-19
FP	272	251	-21
FN	258	277	+19
TN	2727	2748	+21

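On clean data camaug still adds +1.2pp AP and cuts FP by 21 (the model becomes more conservative: R dips slightly while P rises). Viewpoint augmentation keeps reducing false alarms, but its effect is smaller than that of the path fix.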

🎯 Inference-time Temporal Smoothing (zero training cost, already deployed)

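On 5/3 the research agent validated temporal smoothing on the v6 inference probabilities: match boxes with IoU≥0.7 within a window of ±10 frames, then take the median of the N=7 most recent matched probabilities: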

	test_AP	F1	FP	FN
v6 single-frame	0.8884	0.8267	283	336
v6 + temporal median	0.8965	0.8364	283	306
smoothing alone	+0.81pp	+0.97pp	unchanged	-30 (-8.9%)

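FP does not rise at all, and FN drops by 30 (8.9% fewer misses). The smoothing has been added to the inference loop of the mac model_viewer (applied automatically to the safety_rope_* main versions).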
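
To illustrate why a median over recent frames can recover misses without adding false alarms, here is a toy sketch on a single tracked person. The probability sequence is invented for illustration, the helper `trailing_median` is hypothetical, and the 0.461 threshold is the v7b value used in the quick-start below; it is not the evaluated pipeline, which also does IoU matching across frames.

```python
import statistics


def trailing_median(probs, n=7):
    """Median of the last n values (including the current one) at each step."""
    out = []
    for i in range(len(probs)):
        window = probs[max(0, i - n + 1): i + 1]
        out.append(statistics.median(window))
    return out


THR = 0.461  # v7b threshold from the quick-start section

# Hypothetical per-frame probabilities; frame index 3 dips for a single frame
raw = [0.80, 0.82, 0.79, 0.30, 0.81, 0.83, 0.80]
smoothed = trailing_median(raw)

for i, (r, s) in enumerate(zip(raw, smoothed)):
    print(f"frame {i}: raw={r:.2f} ({r >= THR})  smoothed={s:.3f} ({s >= THR})")
```

The single-frame dip falls below the threshold on its own but stays above it after pooling, which matches the FN reduction reported above; a one-frame spike on a true negative would be suppressed in the same way.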

v7b + smoothing is estimated at test_AP ~0.93+ / FN ~250. It is already deployed, but the test numbers have not been run yet; field measurements are pending.

Augmentation config (v7b)

Photometric: brightness ±0.4 / contrast ±0.3 / saturation ±0.4
ROI bbox jitter: center ±20% / size 0.7-1.4× / expansion ratio 0.5-2.0×
Random erasing: upper 60% of the person bbox, prob 0.4 / area 5-20%
Horizontal flip: prob 0.5
Rotation ±5° (new in v6): prob 0.5, simulates a tilted camera
Gaussian blur σ=0.5-1.5 (new in v6): prob 0.2, simulates small differences in camera focus
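
As a rough illustration of the ROI bbox jitter entry, the sketch below samples a jittered box. It assumes the center offset is uniform within ±20% of the box width/height and that a single uniform scale in [0.7, 1.4] is applied to both sides; the expansion-ratio jitter is left out. The name `jitter_box` and these sampling choices are assumptions, not the actual training code.

```python
import random


def jitter_box(box, rng, center_frac=0.2, scale_range=(0.7, 1.4)):
    """Randomly shift and rescale a box (x1, y1, x2, y2).

    Assumption: uniform sampling, one shared scale for width and height.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # Shift the center by up to +/- center_frac of the box size
    cx = (x1 + x2) / 2 + rng.uniform(-center_frac, center_frac) * w
    cy = (y1 + y2) / 2 + rng.uniform(-center_frac, center_frac) * h
    # Rescale the box around the shifted center
    s = rng.uniform(*scale_range)
    nw, nh = w * s, h * s
    return (cx - nw / 2, cy - nh / 2, cx + nw / 2, cy + nh / 2)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
jittered = jitter_box((600, 200, 700, 400), rng)
print(jittered)
```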

Training history

ep	train_loss	val_AP	val_F1
1	0.3676	0.8769	0.7925
2	0.1746	0.8981	0.8075
3	0.1711	0.9183	0.8443
4	0.1180	0.8905	0.8148
5	0.1257	0.9199	0.8414
6	0.1037	0.9217	0.8368
7	0.0895	0.9167	0.8262
8	0.0724	0.9232	0.8494
9	0.0683	0.9231	0.8295
10	0.0571	0.9267	0.8533
11	0.0512	0.9295	0.8598
12	0.0433	0.9322	0.8638
13	0.0416	0.9317	0.8703
14	0.0344	0.9268	0.8574
15 ⭐	0.0270	0.9351	0.8652
16	0.0240	0.9284	0.8517
17	0.0202	0.9239	0.8674
18	0.0173	0.9031	0.8579
19	0.0170	0.9022	0.8492
20	0.0114	0.9135	0.8574
21	0.0133	0.9201	0.8574
22	0.0082	0.9169	0.8647
23	0.0095	0.9158	0.8697

best_epoch=15, val_AP=0.9351 (early stop triggered at ep23 with patience=8)
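
The early-stop point can be replayed from the val_AP column. This sketch assumes that patience counts epochs since the last strict improvement in val_AP; the actual trainer's rule is not shown in this report, so treat the function `early_stop` as illustrative.

```python
# val_AP per epoch, copied from the training history table (epochs 1..23)
val_ap = [
    0.8769, 0.8981, 0.9183, 0.8905, 0.9199, 0.9217, 0.9167, 0.9232,
    0.9231, 0.9267, 0.9295, 0.9322, 0.9317, 0.9268, 0.9351, 0.9284,
    0.9239, 0.9031, 0.9022, 0.9135, 0.9201, 0.9169, 0.9158,
]


def early_stop(history, patience=8):
    """Return (best_epoch, best_value, stop_epoch); stop_epoch is None if never triggered."""
    best_ep, best = 0, float("-inf")
    for ep, ap in enumerate(history, start=1):
        if ap > best:
            best_ep, best = ep, ap          # new best: reset the patience counter
        elif ep - best_ep >= patience:
            return best_ep, best, ep        # no improvement for `patience` epochs
    return best_ep, best, None


best_ep, best_ap, stop_ep = early_stop(val_ap)
print(best_ep, best_ap, stop_ep)
```

Under that assumption the replay reproduces the reported best epoch and stop epoch.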

📦 Model downloads

file	size	purpose	download
`safety_rope_v20260503_v7b_clean/best.pt`	86 MB	fp32 full ckpt	R2 link
`safety_rope_v20260503_v7b_clean/best_fp16.pt`	43 MB	fp16, for inference deployment	R2 link
`safety_rope_v20260503_v7b_clean/summary.json`	5 KB	training metadata	R2 link
`person_yolo11n_v20260501/best.pt`	5.5 MB	YOLO person detector	R2 link

🧪 Inference quick-start (with temporal smoothing)

pip install torch torchvision timm ultralytics opencv-python pillow numpy

curl -L -o best_fp16.pt https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/safety_rope_v20260503_v7b_clean/best_fp16.pt
curl -L -o person_yolo11n_v20260501.pt https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/person_yolo11n_v20260501/best.pt

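The model class and single-image inference are the same as in the v6 report (v6_camaug_report.html). Below is the newly added temporal smoothing wrapper, to be integrated at deployment: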

import statistics
import time

class TemporalSmoother:
    """Median-pool each person's prob over recent frames.

    Matching: IoU>=0.7 within a ±0.5s window, median of the N=7 most recent probs.
    Research validation: v6 + smoothing gives test_AP +0.81pp / FN -8.9% / FP unchanged, at zero training cost.
    """
    def __init__(self, win_s=0.5, iou_thr=0.7, top_n=7):
        self.buf = []   # [{"ts": float, "items": [(bbox, raw_prob), ...]}]
        self.win_s = win_s
        self.iou_thr = iou_thr
        self.top_n = top_n

    @staticmethod
    def _iou(a, b):
        # Boxes are (x1, y1, x2, y2)
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
        inter = iw * ih
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / max(union, 1e-9)

    def smooth(self, persons, ts):
        """persons: list of {"bbox", "prob"} dicts. Sets prob in place to the smoothed value and stores the single-frame value in prob_raw."""
        # Drop buffered frames that are far outside the window
        self.buf = [e for e in self.buf if ts - e["ts"] < self.win_s * 2]
        for p in persons:
            cur_bb, cur_prob = p["bbox"], p["prob"]
            p["prob_raw"] = cur_prob   # always keep the single-frame prob
            matched = []
            for entry in self.buf:
                if abs(ts - entry["ts"]) > self.win_s:
                    continue
                # Best-overlapping box in that past frame, if any passes the IoU threshold
                best_iou, best_prob = 0, None
                for past_bb, past_prob in entry["items"]:
                    v = self._iou(cur_bb, past_bb)
                    if v >= self.iou_thr and v > best_iou:
                        best_iou, best_prob = v, past_prob
                if best_prob is not None:
                    matched.append((entry["ts"], best_prob))
            matched.append((ts, cur_prob))
            matched.sort(key=lambda x: -x[0])   # newest first
            recent = [pr for _, pr in matched[:self.top_n]]
            if len(recent) >= 2:
                p["prob"] = float(statistics.median(recent))
        # Add the current frame to the buffer, storing raw (unsmoothed) probs
        self.buf.append({"ts": ts, "items": [(p["bbox"], p["prob_raw"]) for p in persons]})
        return persons

# Usage
smoother = TemporalSmoother()
for frame in stream:
    persons = single_frame_infer(frame)   # your v7b inference function
    persons = smoother.smooth(persons, time.time())   # in-place smoothing
    for p in persons:
        if p["prob"] >= 0.461:   # v7b threshold
            alert(p)

The Mac model_viewer already applies smoothing automatically to every safety_rope_* handler. In the field, compare safety_rope_correct (smoothed) with safety_rope_correct_raw (single-frame prob) in attrs to see the difference.

Next steps

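Field-test v7b + smoothing on scenes that a human can only judge by watching continuously (single-frame prob jitter → suppressed by the median)
Research agent on 5090-2 starts the v8 clip-aware fine-tune (T=4 clip + TokShift + consistency aux loss), target test_AP ≥ 0.91
Rerun the e2e audit on the clean manifest (the earlier v4 audit conclusions are void; the real YOLO missed-box / false-box counts are only known after the path bug fix)
Trial YOLO conf 0.35→0.5 / imgsz 640→1280 (retrain the person YOLO on cleaned field hard negatives)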

Raw artifacts: 5090-2:~/runs_new/safety_rope_v20260503_p10_dinov3_re_v7b_clean/
R2 bucket: rai-models
Previous v6 report for comparison: v6_camaug_report.html