👥 Age + Gender: v20260507 Final (8hr autonomous research)

Multi-task image classification (no bbox): binary gender + 4-class age | PA-100K + MSP60K mix (160K crops) | 13 ablations + SWA-4 | Trained: 2026-05-07 | 5090-2 dual-GPU agent | 5h wall-clock

🎯 SWA-4 champion (PA-100K test, 10K samples)

gender accuracy: 0.935 (vs E single seed 0.857, +8pp)
age 4-class accuracy: 0.976 (vs E 0.924, +5pp)
age macro F1: 0.707
adult recall: 0.988
elder recall ⭐: 0.760 (vs E 0.345, +42pp)
child recall ⚠: 0.348 (SWA trade-off)
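The metrics above mix a sample-weighted score (accuracy) with class-weighted ones (macro F1, per-class recall), which is presumably why accuracy can sit at 0.976 while macro F1 is 0.707: a dominant adult class lifts accuracy, whereas macro F1 gives the weak child class equal weight. The sketch below shows how these quantities are conventionally derived from a confusion matrix. The matrix is a made-up toy, not the report's data, and averaging over three classes (child, adult, elder) is an assumption; the report does not say whether the unsupervised young class is included in its macro F1.

```python
def accuracy(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_recall(cm):
    """Recall of class i = correct predictions of i / all true samples of i."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def macro_f1(cm):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    n = len(cm)
    f1_scores = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n


# Toy confusion matrix (invented numbers): rows/cols = child, adult, elder.
toy_cm = [
    [4, 6, 0],    # true child: mostly misread as adult
    [2, 980, 2],  # true adult: dominant class
    [0, 2, 4],    # true elder
]

print("accuracy:", round(accuracy(toy_cm), 3))
print("recall  :", [round(r, 3) for r in per_class_recall(toy_cm)])
print("macro F1:", round(macro_f1(toy_cm), 3))
```

In this toy case the adult class keeps accuracy high while the child and elder rows pull macro F1 well below it, the same qualitative pattern as in the reported numbers.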
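⭐ SWA-4 is recommended for production: the state_dicts of 4 convnext_tiny seeds (E/E2/E3/E4) are averaged into one model, so inference cost stays at 1× while elder recall gains +42pp. Strongly recommended for safety-critical scenarios (for example, identifying elderly people on a construction site who may need care). The trade-off is that child recall drops from 0.73 to 0.35; children rarely appear on construction sites, so this is acceptable.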
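The state_dict averaging described above can be sketched as a uniform mean over matching parameter names. This is a minimal illustration, not the report's actual merge script: it assumes all checkpoints share identical keys and shapes, and the checkpoint paths in the commented usage are placeholders. The function is duck-typed, so it works on torch tensors as well as on the plain floats used here to keep the example runnable without checkpoints.

```python
def average_state_dicts(state_dicts):
    """Return the element-wise mean of several state_dicts with identical keys.

    Values may be torch tensors (sum() and / work element-wise on them) or
    plain numbers. Integer buffers, if a model has any, would come out as
    floats and may need separate handling.
    """
    if not state_dicts:
        raise ValueError("need at least one state_dict")
    keys = list(state_dicts[0].keys())
    for sd in state_dicts[1:]:
        if set(sd.keys()) != set(keys):
            raise ValueError("all state_dicts must have the same parameter names")
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}


# Hypothetical usage with real checkpoints (paths are placeholders):
#   sds = [torch.load(p, map_location="cpu", weights_only=False)["model_state"]
#          for p in ["E.pt", "E2.pt", "E3.pt", "E4.pt"]]
#   model.load_state_dict(average_state_dicts(sds))

# Toy stand-in so the sketch runs on its own.
toy_seeds = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
averaged = average_state_dicts(toy_seeds)
print(averaged)
```

Because the result is a single set of weights, it loads into one model and is served exactly like a single-seed checkpoint, which is where the 1× inference cost comes from.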

📦 Model downloads

⭐ SWA-4 (production champion)
ConvNeXt-Tiny, 28M params, 384×192 input
g_acc 0.935 / a_acc 0.976 / elder recall 0.76
Weights averaged over 4 seeds, 1× inference cost
106 MB
⬇ best.pt eval

E_convnext_tiny (single-seed baseline)
Same architecture, single seed
g_acc 0.857 / a_acc 0.924 / elder recall 0.345
106 MB
⬇ best.pt
📋 Loading + inference example (Python)
import torch
import torch.nn as nn
import timm
from PIL import Image
import torchvision.transforms as T


class MultiHead(nn.Module):
    """Shared backbone with a binary gender head and a 4-class age head."""

    def __init__(self, backbone_name, drop_rate=0.3, num_age=4):
        super().__init__()
        # num_classes=0 drops the classifier, so the backbone returns pooled features.
        self.backbone = timm.create_model(backbone_name, pretrained=False,
                                          num_classes=0, global_pool="avg",
                                          drop_rate=drop_rate)
        # Probe the feature width with a small dummy input.
        with torch.no_grad():
            feat_dim = self.backbone(torch.zeros(1, 3, 64, 64)).shape[-1]
        self.gender_head = nn.Linear(feat_dim, 1)
        self.age_head = nn.Linear(feat_dim, num_age)

    def forward(self, x):
        f = self.backbone(x)
        # Gender: one logit per image. Age: num_age logits per image.
        return self.gender_head(f).squeeze(-1), self.age_head(f)


# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust.
# map_location="cpu" lets a GPU-trained checkpoint load on a CPU-only machine.
ckpt = torch.load("age_gender_v20260507_swa4.pt", map_location="cpu", weights_only=False)
model = MultiHead(ckpt["args"]["backbone"]).eval()
model.load_state_dict(ckpt["model_state"])

# ImageNet normalization; Resize takes (height, width).
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
tf = T.Compose([T.Resize((384, 192)), T.ToTensor(), T.Normalize(mean, std)])
x = tf(Image.open("person_crop.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    g_logit, a_logit = model(x)
gender = "female" if torch.sigmoid(g_logit).item() > 0.5 else "male"
age = ["child", "young", "adult", "elder"][a_logit.argmax(dim=-1).item()]
# Note: neither PA-100K nor MSP60K provides supervision for "young",
# so inference will not output "young".
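Since the young class has no supervision, one optional safeguard is to exclude it explicitly when decoding, so that an untrained logit can never win the argmax. The sketch below is an assumption layered on top of the example above, not part of the released code: `decode` is a hypothetical helper, it takes plain floats (for instance from `g_logit.item()` and `a_logit[0].tolist()`), and it renormalizes the softmax over the remaining three classes.

```python
import math

AGE_CLASSES = ["child", "young", "adult", "elder"]  # same order as the example above


def sigmoid(z):
    """Numerically stable logistic function for a single float."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode(g_logit, a_logits, skip=("young",)):
    """Turn raw logits into (gender, gender_conf, age, age_conf).

    Classes listed in `skip` are dropped before the softmax, so the age
    confidence is renormalized over the supervised classes only.
    """
    p_female = sigmoid(g_logit)
    gender = "female" if p_female > 0.5 else "male"
    gender_conf = max(p_female, 1.0 - p_female)

    kept = [(c, z) for c, z in zip(AGE_CLASSES, a_logits) if c not in skip]
    z_max = max(z for _, z in kept)              # subtract max for stability
    exps = [(c, math.exp(z - z_max)) for c, z in kept]
    total = sum(e for _, e in exps)
    age, e_best = max(exps, key=lambda ce: ce[1])
    return gender, gender_conf, age, e_best / total


# Toy logits (invented): the "young" logit is largest but gets masked out.
result = decode(2.0, [0.1, 5.0, 1.0, 0.5])
print(result)
```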

🚀 Deployment to ppe-demo

Live at https://ppe-demo.intemotech.com/. In the dropdown, select "👥 年齡+性別 | v20260507 SWA-4 ⭐ + BoT-SORT" (年齡+性別 = Age + Gender).

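The demo also includes BoT-SORT tracking with a cross-frame majority vote: each person bbox is labeled like "T#7 ♂ 0.92 成人 0.97 (95f)", where 成人 means adult and 95 is the number of frames accumulated for that track. The longer the track, the more stable the prediction. Tracker state resets automatically when a new video is loaded.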
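A cross-frame majority vote of this kind can be sketched with one counter per track ID. The demo's actual aggregation is not documented beyond "majority vote", so everything here is an assumption: `TrackVoter` is a hypothetical class, it counts hard per-frame labels, and the vote fraction it returns is not necessarily the 0.92 / 0.97 shown in the overlay.

```python
from collections import Counter, defaultdict


class TrackVoter:
    """Accumulates per-frame labels for each track and reports the majority."""

    def __init__(self):
        self._votes = defaultdict(Counter)  # track_id -> Counter of labels

    def update(self, track_id, label):
        """Record one frame's predicted label for a track."""
        self._votes[track_id][label] += 1

    def summary(self, track_id):
        """Return (majority_label, vote_fraction, num_frames), or None if unseen."""
        counts = self._votes.get(track_id)
        if not counts:
            return None
        total = sum(counts.values())
        label, n = counts.most_common(1)[0]
        return label, n / total, total

    def reset(self):
        """Clear all tracks, for example when a new video starts."""
        self._votes.clear()


voter = TrackVoter()
for frame_label in ["adult", "adult", "elder"]:  # toy per-frame predictions
    voter.update(7, frame_label)
print(voter.summary(7))
```

A single noisy frame changes the majority less and less as the frame count grows, which matches the observation that longer tracks give more stable predictions.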

📄 Full research log


Generated 2026-05-07 | rai-vision-training | 8hr autonomous research on 5090-2 | kaggle-reports.pages.dev