👷 factory_ppe v20260610: root-cause study of the v608 regression

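Output of the 16h accuracy research run, 2026-06-10 · cvat #12 · Update 2026-06-11: the three-model ensemble has been deployed for side-by-side observation (ppe-demo ppe22_ens610; component ckpts uploaded to R2 as factory_ppe_v20260610_p16/_p16nw, plus the existing v605; per-attr thresholds recalibrated to the test-set F1-opt point using the ensemble's averaged probabilities)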

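TL;DR: the −0.018 regression of v608 (0.9413) against v605 (0.9595) is not caused by "the +12 new-site tasks are noisy". The root cause is early stopping from patience 16→8, combined with run-to-run variance on rare attrs. Two runs of the same recipe can differ by 0.014 mAP, so from now on any single-run PPE comparison with a difference <0.015 is treated as noise.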

1. Root-cause identification (refuting the "new sites are noisy" hypothesis)

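The test set is identical in both versions (all 67 tasks the same) → rules out test distribution shift
The v608 manifest actually adds only +110 train rows (task 5405/5450, all safety_vest annotations); the 4 regressed attrs (sleeves/cotton_gloves/safety_shoes/heartbeat) received zero new annotations → 110 rows out of roughly 160k cannot cause sleeves −0.31
The regression is highly concentrated: sleeves −0.311 / cotton_gloves −0.087 / safety_shoes −0.044 / heartbeat −0.029; the other 18 attrs are flat or improved (hair_cover +0.063, safety_vest +0.013)
The only hyperparam diff between the two versions: patience 16→8. v608 early-stopped at ep17 (best ep9); v605 ran the full 40 (best ep35)
Mechanism: the val AP of rare attrs (test pos: sleeves 94 / hair_cover 78 / cotton_gloves 179) swings wildly from epoch to epoch (v605 history, sleeves: 0.46→0.96→0.64→0.93…). The checkpoint is selected by overall val mAP, so whether a rare attr lands on a good epoch is pure luck; early stopping means fewer draws and a higher chance of a bad one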
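
The early-stopping effect described above can be sketched with a small simulation. This is only an illustration, not the train_p9_attr implementation: the function name `early_stop`, the "stop once `patience` epochs pass without a new best" rule, and the val mAP curve are all assumptions made up for the example (the rule is at least consistent with v608's best ep9 / stop ep17 at patience 8).

```python
def early_stop(history, patience):
    """Return (best_epoch, stop_epoch), 1-indexed, for per-epoch val mAP.

    Assumed rule: stop as soon as `patience` epochs have passed
    without a new best value.
    """
    best_val = float("-inf")
    best_epoch = 0
    for epoch, val in enumerate(history, start=1):
        if val > best_val:
            best_val, best_epoch = val, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(history)


# Hypothetical 40-epoch curve: early peak at ep9, a 10-epoch dip,
# a better peak at ep20, then a flat tail.
history = [0.80 + 0.01 * i for i in range(9)] + [0.87] * 10 + [0.90] + [0.89] * 20

print(early_stop(history, patience=8))   # (9, 17)
print(early_stop(history, patience=16))  # (20, 36)
```

With the shorter patience the run never reaches the later, better epoch, and every skipped epoch is one fewer draw for the rare attrs.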

2. Experiments (all on v608 data, same test set, sklearn AP)

run	variable	mAP	sleeves	cotton_gl	hair_cover	heartbeat	safety_shoes
v605 (production baseline)	n/a	0.9595	0.910	0.777	0.819	0.906	0.924
v608	patience 8	0.9413	0.599	0.690	0.882	0.877	0.880
r16h_p16	patience 16 / 40ep	0.9516	0.811	0.858	0.623	0.885	0.940
r16h_p16nw	p16 + negweight (cotton2/sleeves2/heartbeat1.5/shoes1.5)	0.9548	0.757	0.823	0.815	0.879	0.934
r16h_p16b	p16 same-recipe rerun (variable = randomness)	0.9377	0.569	0.858	0.659	0.823	0.943
ens(v605+p16+p16nw)	3-model sigmoid average	0.9633	0.892	0.828	0.789	0.904	0.940
ens(v605+v608)	2-model average	0.9530	v608's sleeves 0.599 is a poison pill; the ensemble cannot recover from it
ens(all four models)	+p16b	0.9593	the weak run drags it down; ensemble members must be screened for quality
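
The "sigmoid average" used for the ens rows can be sketched as below. This is a minimal illustration assuming each member outputs one logit per attr and that probabilities (not logits) are averaged; the function names and the logit values are hypothetical and are not taken from dump_ppe_probs.py.

```python
import math


def sigmoid(logit):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def ensemble_probs(member_logits):
    """Average per-attr sigmoid probabilities across ensemble members.

    member_logits: one list of per-attr logits per model, all the same length.
    """
    n_models = len(member_logits)
    n_attrs = len(member_logits[0])
    return [
        sum(sigmoid(logits[a]) for logits in member_logits) / n_models
        for a in range(n_attrs)
    ]


# Hypothetical logits for one image: 3 models x 2 attrs.
logits = [[2.0, -1.0], [0.0, -1.0], [-2.0, -1.0]]
probs = ensemble_probs(logits)
print(probs)
```

Because every member gets equal weight, one badly off member shifts the average directly, which is why member quality has to be screened before averaging.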

3. Methodology: single-run differences <0.015 are noise

p16 and p16b use exactly the same recipe and differ only in randomness: mAP 0.9516 vs 0.9377 (a gap of 0.014), sleeves 0.811 vs 0.569 (a gap of 0.24).

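From now on, a single-run PPE mAP difference <0.015, or a rare-attr AP difference <0.2, is not enough to support a deploy/rollback decision: either run 2-3 seeds and take the median, or look at the ensemble
v608's −0.018 falls just outside the noise boundary = shortened patience (systematic) + an unlucky draw (random), two causes stacked
Past single-run readings of "attr X improved/regressed a lot" (e.g. the +0.0251 in the v605 report) should also be re-examined with this yardstick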
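
The noise yardstick above can be written as a small check. This is a sketch under assumptions: the function name `is_signal` is hypothetical, and applying the same single-run thresholds to seed medians is my simplification rather than something the study specifies. The mAP values are taken from the table in §2.

```python
from statistics import median

MAP_NOISE = 0.015      # mAP noise floor, from the p16 vs p16b rerun
RARE_AP_NOISE = 0.2    # rare-attr AP noise floor


def is_signal(candidate_runs, baseline_runs, noise):
    """Compare seed medians; return (delta, True if |delta| >= noise)."""
    delta = median(candidate_runs) - median(baseline_runs)
    return delta, abs(delta) >= noise


# Single run: v608 vs v605 mAP (just outside the noise boundary).
print(is_signal([0.9413], [0.9595], MAP_NOISE))
# Median of the two p16 seeds vs v605 mAP (inside the noise boundary).
print(is_signal([0.9516, 0.9377], [0.9595], MAP_NOISE))
# Rare attr: v608 vs v605 sleeves AP.
print(is_signal([0.599], [0.910], RARE_AP_NOISE))
```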

4. Recommendations

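#	Action	Status
1	Restore the train_p9_attr default patience to 16	✅ Done (this commit)
2	Deployment options: ens(v605+p16+p16nw), mAP 0.9633 (+0.004, all attrs ≥0.79, inference cost still low at 3× mobilenetv3 4.2M); or conservatively stay on single-model v605	✅ Deployed 2026-06-11 as `ppe22_ens610` for side-by-side observation (R2 + two machines)
3	The real fix is more labels: bring sleeves / cotton_gloves / hair_cover up to pos > 500 (current train pos: sleeves 875 / cotton_gloves 277 / hair_cover low; test pos 94/179/78). The training side has already hit the variance ceiling	Needs annotation scheduling
4	Every deploy/rollback decision uses the same test set + the noise yardstick (§3); important conclusions get multiple seeds	Methodology, effective immediately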

Source: 16h autonomous accuracy research, 2026-06-10 (accuracy_research_v20260610_report) · dump/analysis scripts on 5090-2, ~/factory_ppe/scripts/r16h/ (dump_ppe_probs.py / analyze_ppe_regression.py) · per-row probs /tmp/r16h_ppe_*.csv