🔒 Safety Rope Detection — v20260502

RoI Align MobileNetV3-L 2-class | Trained: 2026-04-28 | Source: cvat2 project 8 (safety_rope_detection)

📊 Key Metrics

| Metric | Value |
|---|---|
| Test AP | 0.870 |
| Test Accuracy | 0.878 |
| Test F1 | 0.836 |
| Precision | 0.816 |
| Recall | 0.857 |
| Best Threshold | 0.075 |

📐 Model Architecture

| Item | Details |
|---|---|
| Backbone | mobilenetv3_large_100.ra_in1k (ImageNet pretrained) |
| Architecture | full image → backbone features → RoI Align (7×7) → MLP head → softmax(2) |
| Input size | 640 × 640 |
| Feature channels | 960 (last stage) |
| Spatial scale | 1/32 (MobileNetV3 final stride) |
| Parameters | 5.18 M |
| BBox expand | x: ±100% × bbox_w (100% on each side)<br>y_top: +20% × bbox_h (20% above)<br>y_bot: +150% × bbox_h (150% below, to cover where the rope drops to its anchor point) |
| Classes | 0 = wrong (not hooked / hooked incorrectly), 1 = correct (properly clipped) |
Why RoI Align?
The conventional approach crops each person out of the frame and feeds the crop to a classifier, but the safety rope often extends beyond the person bbox (down to the anchor point or out past the waist), so cropping cuts the rope off. With RoI Align, all persons share a single backbone forward pass over the full frame and spatial features are pooled only over each person region, so the bbox can be expanded without constraint; the +150% lower extension keeps the rope's visual trajectory in view.
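For concreteness, here is the expansion rule as a standalone function; it mirrors the clamping done in the inference code below (the function name itself is illustrative):

```python
def expand_bbox(x1, y1, x2, y2, img_w, img_h,
                ex=1.0, ey_top=0.2, ey_bot=1.5):
    """Expand a person bbox so the RoI covers the rope's anchor point.
    ex / ey_top / ey_bot are the ±100% / +20% / +150% factors above."""
    w, h = x2 - x1, y2 - y1
    return (max(0.0, x1 - w * ex),        # left:   -100% of bbox width
            max(0.0, y1 - h * ey_top),    # top:    -20%  of bbox height
            min(img_w, x2 + w * ex),      # right:  +100% of bbox width
            min(img_h, y2 + h * ey_bot))  # bottom: +150% (rope drop zone)
```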

📈 Training Curves

🎯 Test Confusion Matrix (thr=0.075)

| | Pred wrong | Pred correct | Total |
|---|---|---|---|
| True wrong (0) | TN 1262 | FP 157 | 1419 |
| True correct (1) | FN 116 | TP 695 | 811 |
| Total | 1378 | 852 | 2230 |
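The headline metrics can be re-derived from this matrix as a sanity check:

```python
# Recompute the headline metrics from the confusion matrix above
TP, FP, FN, TN = 695, 157, 116, 1262
precision = TP / (TP + FP)                                  # 0.816
recall    = TP / (TP + FN)                                  # 0.857
f1        = 2 * precision * recall / (precision + recall)   # 0.836
accuracy  = (TP + TN) / (TP + FP + FN + TN)                 # 0.878
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} Acc={accuracy:.3f}")
```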

🚀 How to Use the Model

Pipeline Flow

  1. YOLO11n detects person bboxes on the full frame (conf ≥ 0.35)
  2. The full frame is resized to 640×640, converted to a tensor, and run through the backbone to get a [B, 960, 20, 20] feature map
  3. Each person bbox is expanded (x ±100%, y_top +20%, y_bot +150%) and scaled into 640×640 coordinates
  4. RoI Align (7×7) pools an [N, 960, 7, 7] feature tensor per person
  5. MLP head: Conv2d 256 → AvgPool → Linear(2) → softmax; take the probability of class=1 (correct)
  6. Probability ≥ 0.075 is judged "properly clipped" (the best-F1 point on the PR curve)

① PyTorch Inference Code

```python
import torch, torch.nn as nn
import torchvision.ops as tvops
import timm
import numpy as np
from PIL import Image
from ultralytics import YOLO

# ── Model ──────────────────────────────────────────────
class RoISafetyRopeModel(nn.Module):
    def __init__(self, backbone_name="mobilenetv3_large_100.ra_in1k"):
        super().__init__()
        self.backbone = timm.create_model(backbone_name, pretrained=False, features_only=True)
        ch = self.backbone.feature_info.channels()[-1]   # 960
        self.roi_align = tvops.RoIAlign((7,7), spatial_scale=1/32, sampling_ratio=2)
        self.head = nn.Sequential(
            nn.Conv2d(ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1,1)), nn.Flatten(),
            nn.Dropout(0.3), nn.Linear(256, 2),
        )
    def forward(self, image, rois):
        feats = self.backbone(image)[-1]   # last-stage feature map, stride 32
        return self.head(self.roi_align(feats, rois))

# ── Loading ────────────────────────────────────────────
device = torch.device("cuda")
ckpt = torch.load("safety_rope_v20260502_best.pt", map_location=device, weights_only=False)
model = RoISafetyRopeModel(ckpt["backbone_name"]).to(device).eval()
model.load_state_dict(ckpt["model_state"])
yolo = YOLO("person_seg_yolo11n.pt")  # person detector
THR = ckpt["thr"]              # 0.0749
IMG_SIZE = ckpt["img_size"]    # 640
EX = ckpt["expand_x"]          # 1.0
EYT = ckpt["expand_y_top"]     # 0.2
EYB = ckpt["expand_y_bot"]     # 1.5

# ── Prediction (single image) ──────────────────────────
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def predict(img_pil):
    img_pil = img_pil.convert("RGB")  # guard against RGBA / grayscale inputs
    W, H = img_pil.size
    persons = yolo(img_pil, verbose=False, conf=0.35)[0].boxes.xyxy.cpu().numpy()
    if len(persons) == 0:
        return []
    img_resized = img_pil.resize((IMG_SIZE, IMG_SIZE))
    arr = (np.array(img_resized, dtype=np.float32)/255.0 - mean) / std
    x = torch.from_numpy(arr.transpose(2,0,1)).unsqueeze(0).to(device)
    sx, sy = IMG_SIZE/W, IMG_SIZE/H   # original-image → 640×640 scale factors
    rois = []
    for x1, y1, x2, y2 in persons:
        w, h = x2-x1, y2-y1
        ex1 = max(0, x1 - w*EX); ey1 = max(0, y1 - h*EYT)
        ex2 = min(W, x2 + w*EX); ey2 = min(H, y2 + h*EYB)
        rois.append([0.0, ex1*sx, ey1*sy, ex2*sx, ey2*sy])  # batch index 0 first
    rois_t = torch.tensor(rois, dtype=torch.float32).to(device)
    with torch.no_grad():
        probs = torch.softmax(model(x, rois_t), dim=-1)[:,1].cpu().numpy()
    return [
        {"bbox": persons[i].tolist(),
         "prob_correct": float(probs[i]),
         "predicted": "correct" if probs[i] >= THR else "wrong"}
        for i in range(len(persons))
    ]
```
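A quick usage sketch, continuing from the block above (the image path is a placeholder):

```python
img = Image.open("frame_0001.jpg")   # hypothetical input frame
for det in predict(img):
    print(det["predicted"], f"{det['prob_correct']:.3f}", det["bbox"])
```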

② Live RTSP / Webcam Integration

Integrated into rai-model-viewer, deployed at http://192.168.53.21:7860/. After selecting model = safety_rope_v20260502 (safety rope), each detected person is overlaid with its bbox and a CORRECT/WRONG label, and the histogram on the right shows every person's rope confidence in real time.

③ Threshold Tuning Recommendations

| Scenario | Recommended Threshold | Strategy |
|---|---|---|
| Auditing (high-precision alerts) | ≥ 0.50 | fewer false alarms, may miss thin ropes |
| Routine inspection (balanced) | 0.075 (best F1) | optimum of the training PR curve |
| Recall-first | ≤ 0.03 | misses almost nothing, but many FPs |
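The 0.075 operating point comes from a best-F1 sweep over the PR curve; a minimal sketch of that selection, where y_true and y_score stand in for held-out labels and predicted class-1 probabilities:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_score):
    """Return the threshold that maximizes F1 along the PR curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one more entry than thresholds; drop the last point
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)
    i = int(np.argmax(f1))
    return float(thresholds[i]), float(f1[i])
```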

📦 Dataset

| Item | Details |
|---|---|
| Source | cvat2 project 8 (safety_rope_detection), converted from raicvat#12 lock/unlock polygons |
| Samples | 8,159 person bboxes (mask=1 of the safety_rope_use attribute) |
| Split | CVAT task.subset (Train / Validation / Test); never a hash split |
| Annotation | raicvat#12 lock = correct (includes the rope extension), unlock = wrong (person silhouette only); YOLO11n re-detects person bboxes on the raicvat frames and pairs them with the original polygon centers → 78% hit rate; the remaining 22% fall back to the polygon's own bbox (see the pairing sketch below) |
| Class distribution | Test: 811 correct / 1419 wrong |
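A minimal sketch of the center-point pairing described above (the smallest-box tie-break is an assumption, not something the training notes specify):

```python
import numpy as np

def pair_polygons_with_boxes(polygons, det_boxes):
    """polygons: list of (N, 2) point arrays; det_boxes: (M, 4) xyxy array.
    Pair each polygon with a detected person box containing its centroid;
    fall back to the polygon's own bounding box when nothing matches."""
    paired = []
    for poly in polygons:
        cx, cy = poly.mean(axis=0)
        hits = [b for b in det_boxes
                if b[0] <= cx <= b[2] and b[1] <= cy <= b[3]]
        if hits:   # ~78% of samples in this dataset
            # assumption: prefer the tightest box if several contain the centroid
            box = min(hits, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
        else:      # remaining ~22%: use the polygon's own bbox
            box = np.array([poly[:, 0].min(), poly[:, 1].min(),
                            poly[:, 0].max(), poly[:, 1].max()])
        paired.append(box)
    return paired
```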

📥 Model Download

R2 Bucket: rai-models / safety_rope_v20260502 / best.pt
Public URL (pending upload): https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/safety_rope_v20260502/best.pt

🛠️ Training Command

```bash
python3 train_safety_rope_roi.py v20260502 \
  --manifest /home/ubuntu/factory_ppe/safety_rope_dataset/manifest_v1.csv \
  --backbone mobilenetv3_large_100.ra_in1k \
  --batch 32 --epochs 30 --lr 3e-4 --wd 0.01 --patience 6
```

📋 Hyperparameters

| Item | Value |
|---|---|
| Optimizer | AdamW (lr=3e-4, wd=0.01) |
| Scheduler | OneCycleLR, pct_start=0.1, cosine annealing |
| Batch size | 32 |
| Epochs run | 20 / 30 (early stopping fired: no val improvement for 6 epochs after epoch 14) |
| Best epoch | 14 (val_AP=0.870) |
| Training time | 10 min (1× RTX 5090) |
| AMP | fp16 |
| Loss | CrossEntropyLoss (2-class) |
| Augmentation | Horizontal flip 50% (bboxes flipped in sync) |
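For reference, a minimal sketch of the optimizer/scheduler setup this table implies; the tiny placeholder model and steps_per_epoch are illustrative, not the real training graph:

```python
import torch, torch.nn as nn

model = nn.Linear(8, 2)          # placeholder; stands in for RoISafetyRopeModel
steps_per_epoch = 100            # placeholder; len(train_loader) in practice
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=30, steps_per_epoch=steps_per_epoch,
    pct_start=0.1, anneal_strategy="cos")   # 10% warmup, then cosine decay
criterion = nn.CrossEntropyLoss()           # 2-class CE, as in the table
```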

Generated 2026-04-28