RoI Align MobileNetV3-L 2-class | Training date: 2026-04-28 | Source: cvat2 project 8 (safety_rope_detection)
| Item | Details |
|---|---|
| Backbone | mobilenetv3_large_100.ra_in1k (ImageNet pre-trained) |
| Architecture | full image → backbone features → RoI Align (7×7) → MLP head → softmax(2) |
| Input size | 640 × 640 |
| Feature channels | 960 (last stage) |
| Spatial scale | 1/32 (MobileNetV3 final stride) |
| Parameters | 5.18 M |
| BBox expand | x: ±100% × bbox_w (100% on each side); y_top: +20% × bbox_h; y_bot: +150% × bbox_h (extended downward to cover where the safety rope lands) |
| Classes | 0 = wrong (not hooked / hooked incorrectly), 1 = correct (properly clipped) |
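The BBox expand row can be made concrete with a small worked example. This is a minimal sketch of the same arithmetic the deployment snippet applies per person box; the 1920×1080 frame size is an illustrative assumption:

```python
def expand_bbox(x1, y1, x2, y2, W, H, ex=1.0, ey_top=0.2, ey_bot=1.5):
    """Widen a person bbox so the crop covers where a safety rope would hang.

    ex:     fraction of bbox width added on each side (±100%)
    ey_top: fraction of bbox height added above (+20%)
    ey_bot: fraction of bbox height added below (+150%, rope landing zone)
    Coordinates are clamped to the image bounds [0, W] × [0, H].
    """
    w, h = x2 - x1, y2 - y1
    return (max(0, x1 - w * ex), max(0, y1 - h * ey_top),
            min(W, x2 + w * ex), min(H, y2 + h * ey_bot))

# A 50×100 person box in a 1920×1080 frame:
print(expand_bbox(100, 100, 150, 200, 1920, 1080))
# → (50, 80, 200, 350)
```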
| | Pred wrong | Pred correct | Total |
|---|---|---|---|
| True wrong (0) | TN 1262 | FP 157 | 1419 |
| True correct (1) | FN 116 | TP 695 | 811 |
| Total | 1378 | 852 | 2230 |
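The standard metrics follow directly from the test-set matrix above, treating class 1 (correct) as the positive class:

```python
TN, FP, FN, TP = 1262, 157, 116, 695  # test-set confusion matrix above

precision = TP / (TP + FP)            # of all "correct" predictions, how many were right
recall    = TP / (TP + FN)            # of all truly correct workers, how many were caught
f1        = 2 * precision * recall / (precision + recall)
accuracy  = (TP + TN) / (TP + TN + FP + FN)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
# → precision=0.816 recall=0.857 f1=0.836 accuracy=0.878
```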
```python
import torch
import torch.nn as nn
import torchvision.ops as tvops
import timm
import numpy as np
from PIL import Image
from ultralytics import YOLO

# ── Model ──────────────────────────────────────────────
class RoISafetyRopeModel(nn.Module):
    def __init__(self, backbone_name="mobilenetv3_large_100.ra_in1k"):
        super().__init__()
        self.backbone = timm.create_model(backbone_name, pretrained=False, features_only=True)
        ch = self.backbone.feature_info.channels()[-1]  # 960
        self.roi_align = tvops.RoIAlign((7, 7), spatial_scale=1/32, sampling_ratio=2)
        self.head = nn.Sequential(
            nn.Conv2d(ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
            nn.Dropout(0.3), nn.Linear(256, 2),
        )

    def forward(self, image, rois):
        feats = self.backbone(image)[-1]  # last-stage feature map (stride 32)
        return self.head(self.roi_align(feats, rois))

# ── Load ──────────────────────────────────────────────
device = torch.device("cuda")
ckpt = torch.load("safety_rope_v20260502_best.pt", map_location=device, weights_only=False)
model = RoISafetyRopeModel(ckpt["backbone_name"]).to(device).eval()
model.load_state_dict(ckpt["model_state"])
yolo = YOLO("person_seg_yolo11n.pt")  # person detector
THR = ckpt["thr"]            # 0.0749
IMG_SIZE = ckpt["img_size"]  # 640
EX = ckpt["expand_x"]        # 1.0
EYT = ckpt["expand_y_top"]   # 0.2
EYB = ckpt["expand_y_bot"]   # 1.5

# ── Predict (single image) ────────────────────────────
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def predict(img_pil):
    W, H = img_pil.size
    persons = yolo(img_pil, verbose=False, conf=0.35)[0].boxes.xyxy.cpu().numpy()
    if len(persons) == 0:
        return []
    img_resized = img_pil.resize((IMG_SIZE, IMG_SIZE))
    arr = (np.array(img_resized, dtype=np.float32) / 255.0 - mean) / std
    x = torch.from_numpy(arr.transpose(2, 0, 1)).unsqueeze(0).to(device)
    sx, sy = IMG_SIZE / W, IMG_SIZE / H
    rois = []
    for x1, y1, x2, y2 in persons:
        w, h = x2 - x1, y2 - y1
        ex1 = max(0, x1 - w * EX); ey1 = max(0, y1 - h * EYT)
        ex2 = min(W, x2 + w * EX); ey2 = min(H, y2 + h * EYB)
        rois.append([0.0, ex1 * sx, ey1 * sy, ex2 * sx, ey2 * sy])  # [batch_idx, x1, y1, x2, y2]
    rois_t = torch.tensor(rois, dtype=torch.float32).to(device)
    with torch.no_grad():
        probs = torch.softmax(model(x, rois_t), dim=-1)[:, 1].cpu().numpy()
    return [
        {"bbox": persons[i].tolist(),
         "prob_correct": float(probs[i]),
         "predicted": "correct" if probs[i] >= THR else "wrong"}
        for i in range(len(persons))
    ]
```
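One easy-to-miss detail in the predict code: expanded boxes are clamped in original-image coordinates first, then scaled into the 640×640 space the network actually sees. Isolated as a pure function for illustration (the frame size here is an arbitrary example):

```python
def box_to_roi(box, W, H, img_size=640, batch_idx=0):
    """Map an (x1, y1, x2, y2) box from a W×H frame into resized-image
    coordinates, in the [batch_idx, x1, y1, x2, y2] layout RoIAlign expects."""
    sx, sy = img_size / W, img_size / H
    x1, y1, x2, y2 = box
    return [float(batch_idx), x1 * sx, y1 * sy, x2 * sx, y2 * sy]

# A box in a 1280×640 frame: width halves, height is unchanged.
print(box_to_roi((200, 100, 600, 400), 1280, 640))
# → [0.0, 100.0, 100.0, 300.0, 400.0]
```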
Integrated into rai-model-viewer, deployed at http://192.168.53.21:7860/.
After selecting model = safety_rope_v20260502 (safety rope), each person is overlaid with a bbox and a CORRECT/WRONG label,
and the histogram on the right shows each person's safety-rope confidence in real time.
| Scenario | Recommended threshold | Strategy |
|---|---|---|
| Audits (high-precision alerts) | ≥ 0.50 | Fewer false alarms; may miss thin ropes |
| Routine inspection (balanced) | 0.075 (best F1) | Optimal point on the training PR curve |
| Recall-first | ≤ 0.03 | Misses almost nothing, but many FPs |
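The 0.075 best-F1 point comes from sweeping thresholds along the training PR curve. The sweep itself is just this; a standalone sketch with toy scores, not the project's actual tuning script:

```python
def best_f1_threshold(labels, scores):
    """Return (threshold, f1) maximising F1 over every candidate cut point."""
    best = (0.0, 0.0)
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < thr)
        if tp == 0:
            continue
        p, r = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * p * r / (p + r)
        if f1 > best[1]:
            best = (thr, f1)
    return best

thr, f1 = best_f1_threshold([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])
print(thr)  # → 0.35
```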
| Source | cvat2 project 8 (safety_rope_detection), converted from raicvat#12 lock/unlock polygons |
|---|---|
| Samples | 8,159 person bboxes in total (mask=1 of the safety_rope_use attribute) |
| Split | Uses CVAT task.subset (Train / Validation / Test); never a hash split |
| Annotation | raicvat#12 lock = correct (includes the rope extension), unlock = wrong (person silhouette only); YOLO11n re-detects person bboxes on the raicvat frames and pairs them with the original polygon centers → 78% hit rate; the remaining 22% fall back to the polygon bbox |
| Class distribution | Test: 811 correct / 1419 wrong |
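The detector-to-polygon pairing in the annotation row can be sketched as follows. The exact criterion isn't recorded beyond "center pairing", so this sketch assumes a polygon matches the detection whose box contains the polygon-bbox center (nearest center wins), with the polygon bbox itself as the fallback used for the remaining 22%:

```python
def pair_polygons_to_detections(poly_boxes, det_boxes):
    """For each polygon bbox, pick the detection whose box contains the
    polygon center (nearest center wins); fall back to the polygon bbox."""
    def center(b):
        return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

    paired = []
    for pb in poly_boxes:
        cx, cy = center(pb)
        hits = [db for db in det_boxes
                if db[0] <= cx <= db[2] and db[1] <= cy <= db[3]]
        if hits:
            paired.append(min(hits, key=lambda db: (center(db)[0] - cx) ** 2
                                                 + (center(db)[1] - cy) ** 2))
        else:
            paired.append(pb)  # no detection hit: keep the polygon bbox
    return paired

polys = [(100, 100, 140, 220), (500, 80, 560, 260)]
dets = [(95, 90, 150, 230)]  # only the first person was re-detected
print(pair_polygons_to_detections(polys, dets))
# → [(95, 90, 150, 230), (500, 80, 560, 260)]
```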
R2 bucket: rai-models / safety_rope_v20260502 / best.pt
Public URL (pending upload): https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/safety_rope_v20260502/best.pt
```shell
python3 train_safety_rope_roi.py v20260502 \
  --manifest /home/ubuntu/factory_ppe/safety_rope_dataset/manifest_v1.csv \
  --backbone mobilenetv3_large_100.ra_in1k \
  --batch 32 --epochs 30 --lr 3e-4 --wd 0.01 --patience 6
```
| Optimizer | AdamW (lr=3e-4, wd=0.01) |
|---|---|
| Scheduler | OneCycleLR, pct_start=0.1, cosine annealing |
| Batch | 32 |
| Epochs run | 20 / 30 (early stopped: no val improvement for 6 epochs after epoch 14) |
| Best epoch | 14 (val_AP=0.870) |
| Training time | 10 minutes (1× RTX 5090) |
| AMP | fp16 |
| Loss | CrossEntropyLoss (2-class) |
| Augmentation | Horizontal flip 50% (bbox flipped in sync) |
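The scheduler row can be visualised without torch. This pure-Python sketch reproduces the cosine OneCycle shape; div_factor=25 and final_div_factor=1e4 are PyTorch defaults assumed here, since the run's actual values aren't recorded:

```python
import math

def cos_interp(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

def one_cycle_lr(step, total, max_lr=3e-4, pct_start=0.1,
                 div_factor=25.0, final_div_factor=1e4):
    """Approximate OneCycleLR with cosine annealing in both phases:
    warm up from max_lr/div_factor to max_lr over the first pct_start of
    training, then anneal down to (max_lr/div_factor)/final_div_factor."""
    warm = pct_start * total
    init = max_lr / div_factor
    final = init / final_div_factor
    if step < warm:
        return cos_interp(init, max_lr, step / warm)
    return cos_interp(max_lr, final, (step - warm) / (total - warm))

# Shape over a 1000-step run: start low, peak at 10%, decay to ~0.
for s in (0, 100, 500, 999):
    print(s, f"{one_cycle_lr(s, 1000):.2e}")
```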
Generated 2026-04-28