YOLO Tracker-Aided Temporal Smoothing: Design

RTSP frame
    ↓
┌───────────────────────────────┐
│ Stage 1: YOLO det             │ ← current production person YOLO11n (unchanged)
│   → raw det bbox + conf       │
└───────────────────────────────┘
    ↓
┌───────────────────────────────┐
│ Stage 2: ByteTrack            │ ← built into Ultralytics; add persist=True
│   → bbox + tid                │
└───────────────────────────────┘
    ↓
┌───────────────────────────────┐
│ Stage 3: Per-track buffer     │ ← newly added (the core)
│   for tid in tracks:          │
│     deque.append(conf)        │
│     smoothed = mean(deque)    │
│     track_age += 1            │
│   → filter / fill             │
└───────────────────────────────┘
    ↓
┌───────────────────────────────┐
│ Stage 4: downstream consumers │
│   - PPE 21-attr cls           │
│   - safety_rope RoI           │
│   - iSeek alerts              │
└───────────────────────────────┘
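Stages 1 and 2 can be sketched with the Ultralytics tracking API. This is a minimal sketch under stated assumptions, not the production code: the weights file name `yolo11n.pt`, the helper names `extract_tracks` and `run_stream`, and the restriction to class 0 (person in the COCO label set) are illustrative choices. The tracker is named explicitly because Ultralytics defaults to BoT-SORT when no tracker config is given.

```python
from typing import Iterator, List, Tuple

# (track id, (x1, y1, x2, y2), confidence)
Track = Tuple[int, Tuple[float, ...], float]


def extract_tracks(result) -> List[Track]:
    """Flatten one Ultralytics result into (tid, bbox, conf) tuples."""
    boxes = result.boxes
    # boxes.id is None when the tracker assigned no IDs in this frame
    if boxes is None or boxes.id is None:
        return []
    tids = boxes.id.tolist()
    xyxys = boxes.xyxy.tolist()
    confs = boxes.conf.tolist()
    return [
        (int(tid), tuple(xyxy), float(conf))
        for tid, xyxy, conf in zip(tids, xyxys, confs)
    ]


def run_stream(source: str, weights: str = "yolo11n.pt") -> Iterator[List[Track]]:
    """Yield per-frame track lists from an RTSP URL or a video file."""
    # Imported lazily so extract_tracks has no heavy dependencies.
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights)  # assumed weights path, replace with the deployed model
    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # persist=True keeps the tracker state between successive calls
            results = model.track(
                frame,
                persist=True,
                tracker="bytetrack.yaml",
                classes=[0],
                verbose=False,
            )
            yield extract_tracks(results[0])
    finally:
        cap.release()
```

Each yielded list is the input that Stage 3 consumes, one list per frame.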

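Mechanism	Problem addressed	Logic
1. Per-track confidence smoothing	Single-frame low-confidence noise	Apply a sliding mean or EMA to the confidence over the last N frames of the same track_id to damp confidence jitter
2. Gap filling	Brief misses / occlusion	When a track disappears for ≤ N_GAP_MAX frames (3), interpolate its bbox with a motion model and treat it as still present
3. Transient filter	Single-frame false-positive boxes	A track shorter than N_MIN_TRACK frames (e.g. 3) is treated as transient and does not raise an alarm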
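Mechanisms 1 and 3 can be sketched together as a per-track buffer. This is an illustrative sketch, assuming detections arrive as `(track_id, conf)` pairs; the class name `ConfSmoother` and its method names are hypothetical, and retirement of dead tracks is left out for brevity.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

N = 8            # deque maxlen, in frames
N_MIN_TRACK = 3  # frames a track must persist before it may raise an alarm


class ConfSmoother:
    """Sliding-mean confidence per track plus a transient filter."""

    def __init__(self, window: int = N, min_track: int = N_MIN_TRACK) -> None:
        self.window = window
        self.min_track = min_track
        self.confs: Dict[int, Deque[float]] = {}
        self.age: Dict[int, int] = {}

    def update(
        self, dets: Iterable[Tuple[int, float]]
    ) -> List[Tuple[int, float, bool]]:
        """Return (tid, smoothed_conf, confirmed) for each detection."""
        out = []
        for tid, conf in dets:
            if tid not in self.confs:
                self.confs[tid] = deque(maxlen=self.window)
                self.age[tid] = 0
            buf = self.confs[tid]
            buf.append(conf)  # the oldest value drops out once the deque is full
            self.age[tid] += 1
            smoothed = sum(buf) / len(buf)
            # Tracks younger than min_track are transient: reported, not alarmed
            confirmed = self.age[tid] >= self.min_track
            out.append((tid, smoothed, confirmed))
        return out


if __name__ == "__main__":
    smoother = ConfSmoother(window=3, min_track=3)
    for frame_confs in ([(1, 0.9)], [(1, 0.3)], [(1, 0.9)]):
        print(smoother.update(frame_confs))
```

An EMA variant would replace the deque with a single running value, `ema = alpha * conf + (1 - alpha) * ema`, where `alpha` is a hypothetical tuning parameter not specified in this design.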

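Parameter	Suggested value	Notes
`N (deque maxlen)`	8-16 frames	About 0.8-1.6 s for RTSP at 10 fps
`N_MIN_TRACK` (transient filter)	3-5 frames	A track must persist this long before it is taken seriously
`N_GAP_MAX` (gap fill)	3 frames	Brief occlusions are filled; longer gaps count as a real disappearance
`N_RETIRE` (drop dead tracks)	30 frames	Retire a track after 3 s unseen (at 10 fps)
`ghost conf decay`	0.7	Discount the confidence of filled-in frames so they are not treated as real detections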
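The gap-fill and retire parameters can be illustrated with a small per-track state machine. This is a sketch under assumptions: constant-velocity extrapolation stands in for the unspecified motion model, the decay is assumed to compound once per missed frame, and the names `TrackState` and `GapFiller` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BBox = Tuple[float, ...]  # (x1, y1, x2, y2)

N_GAP_MAX = 3      # frames a missing track is still filled in
N_RETIRE = 30      # frames unseen before the track state is dropped
GHOST_DECAY = 0.7  # confidence discount per filled frame (assumed to compound)


@dataclass
class TrackState:
    bbox: BBox                               # last really detected bbox
    conf: float                              # last really detected confidence
    velocity: BBox = (0.0, 0.0, 0.0, 0.0)    # per-frame bbox delta
    missed: int = 0                          # consecutive frames without a detection


class GapFiller:
    def __init__(self) -> None:
        self.tracks: Dict[int, TrackState] = {}

    def update(
        self, dets: Dict[int, Tuple[BBox, float]]
    ) -> List[Tuple[int, BBox, float, bool]]:
        """Return (tid, bbox, conf, is_ghost) for real and gap-filled tracks."""
        out: List[Tuple[int, BBox, float, bool]] = []
        for tid, (bbox, conf) in dets.items():
            prev = self.tracks.get(tid)
            if prev is None:
                velocity: BBox = (0.0, 0.0, 0.0, 0.0)
            else:
                steps = prev.missed + 1  # frames since the last real detection
                velocity = tuple((b - p) / steps for b, p in zip(bbox, prev.bbox))
            self.tracks[tid] = TrackState(bbox, conf, velocity, 0)
            out.append((tid, bbox, conf, False))

        for tid in list(self.tracks):
            if tid in dets:
                continue
            st = self.tracks[tid]
            st.missed += 1
            if st.missed >= N_RETIRE:
                del self.tracks[tid]  # unseen for too long: retire
            elif st.missed <= N_GAP_MAX:
                # Extrapolate from the last real bbox and discount the confidence
                ghost = tuple(p + v * st.missed for p, v in zip(st.bbox, st.velocity))
                out.append((tid, ghost, st.conf * GHOST_DECAY ** st.missed, True))
        return out
```

Between `N_GAP_MAX` and `N_RETIRE` the track emits nothing but its state is kept, so a person who reappears after a longer occlusion keeps the same history as long as the tracker returns the same ID.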

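Metric	Baseline	+ smoothing (estimated)	Notes
FP rate	100% (reference)	50-70% of baseline	Combined effect of the transient filter and smoothing
Flickering (same person detected on and off)	Common	-60%	Recovered by gap fill
Recall (sustained true positives)	Baseline	+2-5%	Gap fill + confidence boost
iSeek alert latency	sustained × frame_dt	-50%	The sustained window can be shortened
Inference cost	1×	~1.1×	Tracker overhead is minor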
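The latency row follows from how a sustained-window rule behaves: an alert fires only after the condition has held for `sustained` consecutive frames, so the minimum latency is `sustained × frame_dt`. The sketch below is illustrative only. The class `SustainedRule`, the threshold, and the frame counts are hypothetical and do not describe the actual iSeek rule engine.

```python
class SustainedRule:
    """Fire once the smoothed confidence has held for `sustained` frames."""

    def __init__(self, sustained: int, threshold: float) -> None:
        self.sustained = sustained
        self.threshold = threshold
        self.streak = 0

    def step(self, smoothed_conf: float) -> bool:
        # Any frame below the threshold resets the streak
        if smoothed_conf >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0
        return self.streak >= self.sustained


def alert_latency_s(sustained: int, fps: float) -> float:
    """Minimum alert latency = sustained x frame_dt, with frame_dt = 1 / fps."""
    return sustained / fps


if __name__ == "__main__":
    # Hypothetical numbers: halving the window from 20 to 10 frames at 10 fps
    print(alert_latency_s(20, 10.0), alert_latency_s(10, 10.0))
```

Because smoothing removes single-frame dips that would otherwise reset the streak, a shorter window can be used without raising the false-alert rate, which is where the estimated latency reduction comes from.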

Service	Location	Action
model_viewer / ppe-demo	`scripts/model_viewer/app.py`	PPE21Handler: done; person handler: to do
iSeek	iSeek_iframe (Docker on 5090-2)	Rule-engine integration of smoothed detections: to do
iseek-river	iseek-river.intemotech.com	Could also be applied to river_debris: to do
Offline batch inference	infer_ppe_video.py / infer_ppe_to_json.py	Could be added: optional

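Step	Action	Duration
1	Add tracker smoothing to the model_viewer person handler (reusing the PPE21Handler template)	1 day
2	Apply the same code to the iSeek inference service (Docker on 5090-2)	1 day
3	Tune the iSeek rules to use the smoothed conf and shorten the sustained window	1 day
4	Run a 1-week A/B test in production (half the cameras on baseline, half on smoothing)	1 week
5	Analytics: compare FP, recall, and alert latency	1 day
6	If the benefit is significant, roll out to all cameras	1 day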
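Steps 4 and 5 could be supported by a deterministic camera split and a relative-change helper, sketched below. The function names, the arm labels, and the camera IDs are hypothetical, and alternating over the sorted IDs is only one of several reasonable ways to get two equal halves.

```python
from typing import Dict, Iterable


def split_cameras(cam_ids: Iterable[str]) -> Dict[str, str]:
    """Assign cameras alternately to the two arms, in sorted order.

    Sorting first makes the assignment reproducible across runs.
    """
    ordered = sorted(set(cam_ids))
    return {
        cam: ("smoothing" if i % 2 else "baseline")
        for i, cam in enumerate(ordered)
    }


def relative_change(baseline: float, treatment: float) -> float:
    """Relative change of a metric versus baseline, e.g. -0.4 means -40%."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treatment - baseline) / baseline


if __name__ == "__main__":
    print(split_cameras(["cam3", "cam1", "cam2", "cam4"]))
```

The same `relative_change` call applies to FP counts, recall, and alert latency, which keeps the three comparisons in step 5 on a common scale.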

Method	Category	Notes
ByteTrack / BoT-SORT	Tracker	Built into Ultralytics; used directly in this design
FGFA (Flow-Guided Feature Aggregation)	VOD-train	Warps neighboring-frame features with optical flow and takes a weighted average
SELSA (Sequence Level Semantic Aggregation)	VOD-train	Faster R-CNN based; aggregates proposal features across the sequence by semantic similarity
MEGA (Memory Enhanced Global-Local Aggregation)	VOD-train	Long-term + short-term memory
YOLOv8-MS / YOLO11-MS	VOD-train	YOLO with an added temporal neck (community implementation)
TSM (Temporal Shift Module)	VOD-train	Channel-wise shift to the previous/next frame
TransVOD / PTSEFormer	VOD-train	DETR + temporal attention
VideoSwin / SlowFast	Video backbone	Worth considering long term; high cost

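This document covers route A (tracker-aided smoothing), which has zero training cost and can be started immediately.
Once enough sequence annotations have accumulated, route B (retraining with FGFA / SELSA) can be evaluated; it is expected to add a further +5-10% recall / -20% FP, but at 3-10× the cost.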

🎯 YOLO Tracker-Aided Temporal Smoothing

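🎯 Why do this

🧠 Core idea (3 mechanisms)

🔄 Full pipeline

⚙️ Algorithm details

per-track state

Per-frame update logic

Suggested parameters

🚨 Integration with the iSeek alert path

📊 Expected benefits (estimates; need production validation)

🛠️ Deployment locations

⚠️ Notes / known limitations

🚀 Order of work

🔗 Related VOD research (possible future extensions)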