version: v20260503_p10_dinov3_re_v7b_clean · trained 2026-05-03 · backbone vit_small_patch16_dinov3 @ 1280×720
arch: ViT-S patch16 + RoIAlign + MLP 2-cls, box expansion 1.0 / 0.2 / 1.5 (X / Y_top / Y_bot)
data: manifest_v3_clean.csv (fixes the CVAT task_id != data_id path bug; 226 tasks / 12291 train / 2953 val / 4863 test)
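The `arch` line lists box-expansion factors of 1.0 / 0.2 / 1.5 for X / Y_top / Y_bot but does not say how they are applied. The sketch below assumes each factor is a fraction of the person box's width (X) or height (Y) added outward on that side, with X applied to both left and right, and the result clipped to the 1280×720 frame. `expand_box` is a hypothetical helper, not the training code. The expanded box would then presumably be pooled by RoIAlign from the patch16 feature map, which is an 80×45 grid at 1280×720.

```python
def expand_box(bbox, img_w=1280, img_h=720, ex=1.0, ey_top=0.2, ey_bot=1.5):
    """Expand a person box (x1, y1, x2, y2) before RoIAlign.

    ASSUMPTION: each factor is a fraction of the box width (X) or height (Y)
    added outward on that side; ex is applied to both left and right.
    The result is clipped to the image bounds.
    """
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    return (
        max(0.0, x1 - ex * w),
        max(0.0, y1 - ey_top * h),
        min(float(img_w), x2 + ex * w),
        min(float(img_h), y2 + ey_bot * h),
    )


print(expand_box((100, 100, 200, 300)))  # → (0.0, 60.0, 300.0, 600.0)
```

If the real code treats X = 1.0 as the total widening (0.5 per side) instead, only the `ex * w` terms change.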
| Version | Data | test_AP | F1 | P | R | TP | FP | FN | TN | test_n |
|---|---|---|---|---|---|---|---|---|---|---|
| v1 | 178 tasks, path bug | 0.9167 | 0.8449 | 0.854 | 0.836 | 1414 | 241 | 278 | 2627 | 4560 |
| v4 | 226 tasks, no aug, path bug | 0.8651 | 0.8102 | 0.780 | 0.843 | 1528 | 432 | 284 | 2531 | 4775 |
| v6 | 226 tasks + camaug, path bug | 0.8884 | 0.8267 | 0.839 | 0.815 | 1476 | 283 | 336 | 2680 | 4775 |
| v7a | clean 226, no aug, path fix | 0.9165 | 0.8584 | 0.855 | 0.862 | 1606 | 272 | 258 | 2727 | 4863 |
| v7b ⭐ | clean 226 + camaug, path fix ⭐ | 0.9283 | 0.8574 | 0.863 | 0.851 | 1587 | 251 | 277 | 2748 | 4863 |
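The P / R / F1 columns follow directly from the confusion-matrix counts, so they can be re-derived as a consistency check; a minimal sketch is below. test_AP cannot be recomputed this way because it depends on the full score ranking, not on counts at one threshold. Note also that test_n differs between rows (4560 / 4775 / 4863), so the versions were not scored on identical test sets.

```python
# (tp, fp, fn, tn) per version, copied from the table above
ROWS = {
    "v1": (1414, 241, 278, 2627),
    "v4": (1528, 432, 284, 2531),
    "v6": (1476, 283, 336, 2680),
    "v7a": (1606, 272, 258, 2727),
    "v7b": (1587, 251, 277, 2748),
}


def prf1(tp, fp, fn):
    """Precision, recall and F1 at the fixed decision threshold."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)  # equivalently 2TP / (2TP + FP + FN)


for name, (tp, fp, fn, tn) in ROWS.items():
    p, r, f1 = prf1(tp, fp, fn)
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f1:.4f} test_n={tp + fp + fn + tn}")
```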
| Step | test_AP | FP | FN | Notes |
|---|---|---|---|---|
| v6 baseline | 0.8884 | 283 | 336 | old manifest (path bug) + camaug |
| + path fix → v7b | 0.9283 | 251 | 277 | same setting, only swapped in the clean manifest (14% more data + 1955 rows with mismatched images fixed) |
| Path-fix contribution | +4.0pp | -32 | -59 | no model change, no added aug, data fix only → AP +4pp, FP -11%, FN -18% |
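The percentages in the last row are the v6 → v7b changes relative to the v6 counts. Since the clean manifest also adds about 14% more data, this delta bundles the label fix with the extra data rather than isolating the fix alone.

```latex
\begin{aligned}
\Delta\text{AP} &= 0.9283 - 0.8884 = +0.0399 \approx +4.0\,\text{pp} \\
\Delta\text{FP} / \text{FP}_{v6} &= -32 / 283 \approx -11.3\% \\
\Delta\text{FN} / \text{FN}_{v6} &= -59 / 336 \approx -17.6\%
\end{aligned}
```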
| Metric | v7a (no camaug) | v7b (+camaug) | Δ |
|---|---|---|---|
| test_AP | 0.9165 | 0.9283 | +1.18pp |
| F1 | 0.8584 | 0.8574 | -0.10pp |
| P | 0.8552 | 0.8634 | +0.83pp |
| R | 0.8616 | 0.8514 | -1.02pp |
| TP | 1606 | 1587 | -19 |
| FP | 272 | 251 | -21 |
| FN | 258 | 277 | +19 |
| TN | 2727 | 2748 | +21 |
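On clean data, camaug still adds +1.2pp AP and cuts FP by 21 (more conservative: R dips slightly while P rises). Viewpoint augmentation stays effective at reducing false alarms, but its effect is smaller than that of the path fix.
Validated by the research agent on 5/3: temporal smoothing of v6 inference probabilities with IoU ≥ 0.7 matching, a ±10-frame window, and N=7 median pooling: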
| Setting | test_AP | F1 | FP | FN |
|---|---|---|---|---|
| v6 single-frame | 0.8884 | 0.8267 | 283 | 336 |
| v6 + temporal median | 0.8965 | 0.8364 | 283 | 306 |
| Smoothing contribution | +0.81pp | +0.97pp | unchanged | -30 (-8.9%) |
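FP does not rise at all, while FN drops by 30 (8.9% fewer misses). Smoothing is now in the inference loop of the Mac model_viewer (applied automatically to the main safety_rope_* versions).
v7b + smoothing is estimated at test_AP ~0.93+ / FN ~250 (into the 0.93 range). It is deployed, but the test numbers have not been run yet; awaiting field measurements.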
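The research numbers come from offline smoothing of v6 outputs, where frames on both sides of the current one are available. The exact evaluation code is not in this report, so the sketch below makes several assumptions: detections are already linked into per-person tracks by IoU ≥ 0.7, and "window=±10 frame + N=7" is read as "take up to 7 matched frames nearest in time within ±10 frames, including the current one, and use their median". `offline_median_smooth` is a hypothetical name.

```python
import statistics


def offline_median_smooth(probs, window=10, top_n=7):
    """Centered temporal median over ONE track's per-frame probs (offline sketch).

    probs: list indexed by frame; None where the track had no IoU>=0.7 match.
    For each matched frame, take up to top_n matched frames nearest in time
    within +-window frames (including the frame itself) and return their median.
    """
    out = []
    for t, p in enumerate(probs):
        if p is None:
            out.append(None)  # no detection on this frame: nothing to smooth
            continue
        start, stop = max(0, t - window), min(len(probs), t + window + 1)
        # (distance, frame, prob) for matched frames in the window, nearest first
        cands = sorted((abs(k - t), k, probs[k]) for k in range(start, stop) if probs[k] is not None)
        vals = [v for _, _, v in cands[:top_n]]
        out.append(float(statistics.median(vals)))
    return out


print(offline_median_smooth([0.9, 0.9, 0.2, 0.9, 0.9]))  # → [0.9, 0.9, 0.9, 0.9, 0.9]
```

An isolated low frame is outvoted by its neighbors, and an isolated spike would be suppressed the same way. A streaming smoother can only look backward, so its window is causal rather than centered.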
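One way to reproduce the estimate is to assume the v6 smoothing gains carry over to v7b unchanged: the same relative FN reduction and the same absolute AP gain. This is a back-of-envelope check, not a measured result.

```latex
\begin{aligned}
\text{FN}_{v7b+\text{smooth}} &\approx 277 \times \left(1 - \tfrac{30}{336}\right) \approx 277 \times 0.911 \approx 252 \\
\text{AP}_{v7b+\text{smooth}} &\approx 0.9283 + 0.0081 = 0.9364
\end{aligned}
```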
| ep | train_loss | val_AP | val_F1 |
|---|---|---|---|
| 1 | 0.3676 | 0.8769 | 0.7925 |
| 2 | 0.1746 | 0.8981 | 0.8075 |
| 3 | 0.1711 | 0.9183 | 0.8443 |
| 4 | 0.1180 | 0.8905 | 0.8148 |
| 5 | 0.1257 | 0.9199 | 0.8414 |
| 6 | 0.1037 | 0.9217 | 0.8368 |
| 7 | 0.0895 | 0.9167 | 0.8262 |
| 8 | 0.0724 | 0.9232 | 0.8494 |
| 9 | 0.0683 | 0.9231 | 0.8295 |
| 10 | 0.0571 | 0.9267 | 0.8533 |
| 11 | 0.0512 | 0.9295 | 0.8598 |
| 12 | 0.0433 | 0.9322 | 0.8638 |
| 13 | 0.0416 | 0.9317 | 0.8703 |
| 14 | 0.0344 | 0.9268 | 0.8574 |
| 15 ⭐ | 0.0270 | 0.9351 | 0.8652 |
| 16 | 0.0240 | 0.9284 | 0.8517 |
| 17 | 0.0202 | 0.9239 | 0.8674 |
| 18 | 0.0173 | 0.9031 | 0.8579 |
| 19 | 0.0170 | 0.9022 | 0.8492 |
| 20 | 0.0114 | 0.9135 | 0.8574 |
| 21 | 0.0133 | 0.9201 | 0.8574 |
| 22 | 0.0082 | 0.9169 | 0.8647 |
| 23 | 0.0095 | 0.9158 | 0.8697 |
best_epoch=15, val_AP=0.9351 (patience=8; early stop triggered at ep23)
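The stopping point can be checked against the log above. This sketch assumes the common rule "stop once val_AP has failed to beat the best value for `patience` consecutive epochs". That rule is consistent with the numbers, but it is not taken from the training script, and `early_stop` is a hypothetical helper.

```python
# val_AP per epoch (ep1..ep23), copied from the training log above
VAL_AP = [0.8769, 0.8981, 0.9183, 0.8905, 0.9199, 0.9217, 0.9167, 0.9232,
          0.9231, 0.9267, 0.9295, 0.9322, 0.9317, 0.9268, 0.9351, 0.9284,
          0.9239, 0.9031, 0.9022, 0.9135, 0.9201, 0.9169, 0.9158]


def early_stop(val_aps, patience=8):
    """Return (best_epoch, best_ap, stop_epoch); stop_epoch is None if never triggered."""
    best_ap, best_ep, bad = float("-inf"), 0, 0
    for ep, ap in enumerate(val_aps, start=1):
        if ap > best_ap:
            best_ap, best_ep, bad = ap, ep, 0  # new best: reset the counter
        else:
            bad += 1  # one more epoch without improvement
            if bad >= patience:
                return best_ep, best_ap, ep
    return best_ep, best_ap, None


print(early_stop(VAL_AP))  # → (15, 0.9351, 23)
```

Epochs 16 to 23 are the 8 non-improving epochs after the ep15 best, which is why training stops at ep23.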
| File | Size | Purpose | Download |
|---|---|---|---|
| safety_rope_v20260503_v7b_clean/best.pt | 86 MB | full fp32 ckpt | R2 link |
| safety_rope_v20260503_v7b_clean/best_fp16.pt | 43 MB | fp16, for inference deployment | R2 link |
| safety_rope_v20260503_v7b_clean/summary.json | 5 KB | training metadata | R2 link |
| person_yolo11n_v20260501/best.pt | 5.5 MB | YOLO person detector | R2 link |
```shell
pip install torch torchvision timm ultralytics opencv-python pillow numpy
curl -L -o best_fp16.pt https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/safety_rope_v20260503_v7b_clean/best_fp16.pt
curl -L -o person_yolo11n_v20260501.pt https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/person_yolo11n_v20260501/best.pt
```
The model class and single-frame inference are the same as in the v6 report. Below is the newly added temporal smoothing wrapper (integrated at deployment):
```python
import statistics
import time


class TemporalSmoother:
    """IoU>=0.7 matching + window=±0.5s + N=7 median pool.
    Research check: v6 + smoothing gives test_AP +0.81pp / FN -8.9% / FP unchanged, at zero training cost.
    In a live stream the buffer only holds past frames, so the window is effectively [ts - win_s, ts]."""

    def __init__(self, win_s=0.5, iou_thr=0.7, top_n=7):
        self.buf = []  # [{"ts": float, "items": [(bbox, raw_prob), ...]}]
        self.win_s = win_s
        self.iou_thr = iou_thr
        self.top_n = top_n

    @staticmethod
    def _iou(a, b):
        # a, b: (x1, y1, x2, y2)
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
        inter = iw * ih
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / max(union, 1e-9)

    def smooth(self, persons, ts):
        """persons: list of {"bbox", "prob"}. Modified in place: "prob" becomes the
        smoothed value and "prob_raw" keeps the single-frame prob."""
        # Drop frames too old to fall inside the window again
        self.buf = [e for e in self.buf if ts - e["ts"] < self.win_s * 2]
        for p in persons:
            cur_bb, cur_prob = p["bbox"], p["prob"]
            p["prob_raw"] = cur_prob  # always record the raw prob
            matched = []
            for entry in self.buf:
                if abs(ts - entry["ts"]) > self.win_s:
                    continue
                # Best-IoU past detection in this frame, if it clears iou_thr
                best_iou, best_prob = 0.0, None
                for past_bb, past_prob in entry["items"]:
                    v = self._iou(cur_bb, past_bb)
                    if v >= self.iou_thr and v > best_iou:
                        best_iou, best_prob = v, past_prob
                if best_prob is not None:
                    matched.append((entry["ts"], best_prob))
            matched.append((ts, cur_prob))
            matched.sort(key=lambda x: -x[0])  # most recent first
            recent = [pr for _, pr in matched[: self.top_n]]
            if len(recent) >= 2:
                p["prob"] = float(statistics.median(recent))
        # Buffer the current frame with RAW probs so smoothed values are never re-smoothed
        self.buf.append({"ts": ts, "items": [(q["bbox"], q["prob_raw"]) for q in persons]})
        return persons


# Usage: stream, single_frame_infer and alert are your own code
smoother = TemporalSmoother()
for frame in stream:
    persons = single_frame_infer(frame)  # your v7b inference function
    # For recorded video, pass the frame timestamp instead of wall-clock time
    persons = smoother.smooth(persons, time.time())  # in-place smoothing
    for p in persons:
        if p["prob"] >= 0.461:  # v7b threshold
            alert(p)
```
The Mac model_viewer already applies smoothing automatically to every safety_rope_* handler. In the field, compare safety_rope_correct (smoothed) with safety_rope_correct_raw (single-frame prob) in attrs to see the difference.
v8 clip-aware fine-tune (T=4 clip + TokShift + consistency aux loss): target test_AP ≥ 0.91.
Raw artifacts: 5090-2:~/runs_new/safety_rope_v20260503_p10_dinov3_re_v7b_clean/
R2 bucket: rai-models
Compare with the previous v6 report: v6_camaug_report.html