🔒 Safety Rope Detection — v20260502

RoI Align MobileNetV3-L 2-class | Trained: 2026-04-28 | Source: cvat2 project 8 (safety_rope_detection)

📊 Key Metrics

| Metric | Value |
|---|---|
| Test AP | 0.870 |
| Test Accuracy | 0.878 |
| Test F1 | 0.836 |
| Precision | 0.816 |
| Recall | 0.857 |
| Best Threshold | 0.075 |

📐 Model Architecture

| Item | Details |
|---|---|
| Backbone | mobilenetv3_large_100.ra_in1k (ImageNet pretrained) |
| Architecture | full image → backbone features → RoI Align (7×7) → MLP head → softmax(2) |
| Input size | 640 × 640 |
| Feature channels | 960 (last stage) |
| Spatial scale | 1/32 (MobileNetV3 final stride) |
| Parameters | 5.18 M |
| BBox expand | x: ±100% × bbox_w (100% on each side)<br>y_top: +20% × bbox_h (20% above)<br>y_bot: +150% × bbox_h (150% below, to cover where the rope drops to its anchor point) |
| Classes | 0 = wrong (not hooked / hooked incorrectly), 1 = correct (properly clipped) |
Why RoI Align?
The conventional approach crops each person out of the frame and feeds the crop to a classifier, but the safety rope often extends beyond the person bbox (down to the anchor point or out past the waist), so cropping cuts the rope off. With RoI Align, all persons share a single backbone forward pass over the full frame and spatial features are pooled only over each person region, so the bbox can be expanded without constraint; the +150% lower extension keeps the rope's visual trajectory in view.
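For concreteness, here is the expansion rule as a standalone function; it mirrors the clamping done in the inference code below (the function name itself is illustrative):

```python
def expand_bbox(x1, y1, x2, y2, img_w, img_h,
                ex=1.0, ey_top=0.2, ey_bot=1.5):
    """Expand a person bbox so the RoI covers the rope's anchor point.
    ex / ey_top / ey_bot are the ±100% / +20% / +150% factors above."""
    w, h = x2 - x1, y2 - y1
    return (max(0.0, x1 - w * ex),        # left:   -100% of bbox width
            max(0.0, y1 - h * ey_top),    # top:    -20%  of bbox height
            min(img_w, x2 + w * ex),      # right:  +100% of bbox width
            min(img_h, y2 + h * ey_bot))  # bottom: +150% (rope drop zone)
```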

📈 Training Curves

🎯 Test Confusion Matrix (thr=0.075)

| | Pred wrong | Pred correct | Total |
|---|---|---|---|
| True wrong (0) | TN 1262 | FP 157 | 1419 |
| True correct (1) | FN 116 | TP 695 | 811 |
| Total | 1378 | 852 | 2230 |
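The headline metrics can be re-derived from this matrix as a sanity check:

```python
# Recompute the headline metrics from the confusion matrix above
TP, FP, FN, TN = 695, 157, 116, 1262
precision = TP / (TP + FP)                                  # 0.816
recall    = TP / (TP + FN)                                  # 0.857
f1        = 2 * precision * recall / (precision + recall)   # 0.836
accuracy  = (TP + TN) / (TP + FP + FN + TN)                 # 0.878
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} Acc={accuracy:.3f}")
```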

🚀 How to Use the Model

Pipeline Flow

  1. YOLO11n detects person bboxes on the full frame (conf ≥ 0.35)
  2. The full frame is resized to 640×640, converted to a tensor, and run through the backbone to get a [B, 960, 20, 20] feature map
  3. Each person bbox is expanded (x ±100%, y_top +20%, y_bot +150%) and scaled into 640×640 coordinates
  4. RoI Align (7×7) pools an [N, 960, 7, 7] feature tensor per person
  5. MLP head: Conv2d 256 → AvgPool → Linear(2) → softmax; take the probability of class=1 (correct)
  6. Probability ≥ 0.075 is judged "properly clipped" (the best-F1 point on the PR curve)

① PyTorch Inference Code

```python
import torch, torch.nn as nn
import torchvision.ops as tvops
import timm
import numpy as np
from PIL import Image
from ultralytics import YOLO

# ── Model ──────────────────────────────────────────────
class RoISafetyRopeModel(nn.Module):
    def __init__(self, backbone_name="mobilenetv3_large_100.ra_in1k"):
        super().__init__()
        self.backbone = timm.create_model(backbone_name, pretrained=False, features_only=True)
        ch = self.backbone.feature_info.channels()[-1]   # 960
        self.roi_align = tvops.RoIAlign((7,7), spatial_scale=1/32, sampling_ratio=2)
        self.head = nn.Sequential(
            nn.Conv2d(ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1,1)), nn.Flatten(),
            nn.Dropout(0.3), nn.Linear(256, 2),
        )
    def forward(self, image, rois):
        feats = self.backbone(image)[-1]   # last-stage feature map, stride 32
        return self.head(self.roi_align(feats, rois))

# ── Loading ────────────────────────────────────────────
device = torch.device("cuda")
ckpt = torch.load("safety_rope_v20260502_best.pt", map_location=device, weights_only=False)
model = RoISafetyRopeModel(ckpt["backbone_name"]).to(device).eval()
model.load_state_dict(ckpt["model_state"])
yolo = YOLO("person_seg_yolo11n.pt")  # person detector
THR = ckpt["thr"]              # 0.0749
IMG_SIZE = ckpt["img_size"]    # 640
EX = ckpt["expand_x"]          # 1.0
EYT = ckpt["expand_y_top"]     # 0.2
EYB = ckpt["expand_y_bot"]     # 1.5

# ── Prediction (single image) ──────────────────────────
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def predict(img_pil):
    img_pil = img_pil.convert("RGB")  # guard against RGBA / grayscale inputs
    W, H = img_pil.size
    persons = yolo(img_pil, verbose=False, conf=0.35)[0].boxes.xyxy.cpu().numpy()
    if len(persons) == 0:
        return []
    img_resized = img_pil.resize((IMG_SIZE, IMG_SIZE))
    arr = (np.array(img_resized, dtype=np.float32)/255.0 - mean) / std
    x = torch.from_numpy(arr.transpose(2,0,1)).unsqueeze(0).to(device)
    sx, sy = IMG_SIZE/W, IMG_SIZE/H   # original-image → 640×640 scale factors
    rois = []
    for x1, y1, x2, y2 in persons:
        w, h = x2-x1, y2-y1
        ex1 = max(0, x1 - w*EX); ey1 = max(0, y1 - h*EYT)
        ex2 = min(W, x2 + w*EX); ey2 = min(H, y2 + h*EYB)
        rois.append([0.0, ex1*sx, ey1*sy, ex2*sx, ey2*sy])  # batch index 0 first
    rois_t = torch.tensor(rois, dtype=torch.float32).to(device)
    with torch.no_grad():
        probs = torch.softmax(model(x, rois_t), dim=-1)[:,1].cpu().numpy()
    return [
        {"bbox": persons[i].tolist(),
         "prob_correct": float(probs[i]),
         "predicted": "correct" if probs[i] >= THR else "wrong"}
        for i in range(len(persons))
    ]
```
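A quick usage sketch, continuing from the block above (the image path is a placeholder):

```python
img = Image.open("frame_0001.jpg")   # hypothetical input frame
for det in predict(img):
    print(det["predicted"], f"{det['prob_correct']:.3f}", det["bbox"])
```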

② Live RTSP / Webcam Integration

Integrated into rai-model-viewer, deployed at http://192.168.53.21:7860/. After selecting model = safety_rope_v20260502 (safety rope), each detected person is overlaid with its bbox and a CORRECT/WRONG label, and the histogram on the right shows every person's rope confidence in real time.

③ Threshold Tuning Recommendations

| Scenario | Recommended Threshold | Strategy |
|---|---|---|
| Auditing (high-precision alerts) | ≥ 0.50 | fewer false alarms, may miss thin ropes |
| Routine inspection (balanced) | 0.075 (best F1) | optimum of the training PR curve |
| Recall-first | ≤ 0.03 | misses almost nothing, but many FPs |
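The 0.075 operating point comes from a best-F1 sweep over the PR curve; a minimal sketch of that selection, where y_true and y_score stand in for held-out labels and predicted class-1 probabilities:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_score):
    """Return the threshold that maximizes F1 along the PR curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one more entry than thresholds; drop the last point
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)
    i = int(np.argmax(f1))
    return float(thresholds[i]), float(f1[i])
```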

📦 Dataset

| Item | Details |
|---|---|
| Source | cvat2 project 8 (safety_rope_detection), converted from raicvat#12 lock/unlock polygons |
| Samples | 8,159 person bboxes (mask=1 of the safety_rope_use attribute) |
| Split | CVAT task.subset (Train / Validation / Test); never a hash split |
| Annotation | raicvat#12 lock = correct (includes the rope extension), unlock = wrong (person silhouette only); YOLO11n re-detects person bboxes on the raicvat frames and pairs them with the original polygon centers → 78% hit rate; the remaining 22% fall back to the polygon's own bbox (see the pairing sketch below) |
| Class distribution | Test: 811 correct / 1419 wrong |
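A minimal sketch of the center-point pairing described above (the smallest-box tie-break is an assumption, not something the training notes specify):

```python
import numpy as np

def pair_polygons_with_boxes(polygons, det_boxes):
    """polygons: list of (N, 2) point arrays; det_boxes: (M, 4) xyxy array.
    Pair each polygon with a detected person box containing its centroid;
    fall back to the polygon's own bounding box when nothing matches."""
    paired = []
    for poly in polygons:
        cx, cy = poly.mean(axis=0)
        hits = [b for b in det_boxes
                if b[0] <= cx <= b[2] and b[1] <= cy <= b[3]]
        if hits:   # ~78% of samples in this dataset
            # assumption: prefer the tightest box if several contain the centroid
            box = min(hits, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
        else:      # remaining ~22%: use the polygon's own bbox
            box = np.array([poly[:, 0].min(), poly[:, 1].min(),
                            poly[:, 0].max(), poly[:, 1].max()])
        paired.append(box)
    return paired
```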

📥 Model Download

R2 Bucket: rai-models / safety_rope_v20260502 / best.pt
Public URL (pending upload): https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/safety_rope_v20260502/best.pt

🛠️ Training Command

```bash
python3 train_safety_rope_roi.py v20260502 \
  --manifest /home/ubuntu/factory_ppe/safety_rope_dataset/manifest_v1.csv \
  --backbone mobilenetv3_large_100.ra_in1k \
  --batch 32 --epochs 30 --lr 3e-4 --wd 0.01 --patience 6
```

📋 Hyperparameters

| Item | Value |
|---|---|
| Optimizer | AdamW (lr=3e-4, wd=0.01) |
| Scheduler | OneCycleLR, pct_start=0.1, cosine annealing |
| Batch size | 32 |
| Epochs run | 20 / 30 (early stopping fired: no val improvement for 6 epochs after epoch 14) |
| Best epoch | 14 (val_AP=0.870) |
| Training time | 10 min (1× RTX 5090) |
| AMP | fp16 |
| Loss | CrossEntropyLoss (2-class) |
| Augmentation | Horizontal flip 50% (bboxes flipped in sync) |
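For reference, a minimal sketch of the optimizer/scheduler setup this table implies; the tiny placeholder model and steps_per_epoch are illustrative, not the real training graph:

```python
import torch, torch.nn as nn

model = nn.Linear(8, 2)          # placeholder; stands in for RoISafetyRopeModel
steps_per_epoch = 100            # placeholder; len(train_loader) in practice
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=30, steps_per_epoch=steps_per_epoch,
    pct_start=0.1, anneal_strategy="cos")   # 10% warmup, then cosine decay
criterion = nn.CrossEntropyLoss()           # 2-class CE, as in the table
```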

Generated 2026-04-28