person age / gender training report v20260508

2026-05-08 · MobileNetV3-L multi-head (gender 3-class CE + age regression + age bucket aux)
Self-labeled by Qwen3-VL-8B-Instruct (cvat2 project #1 / 79K labels)

Key Metrics (Test Set)

Samples: 9,837 · Gender Accuracy: 84.3% · Age MAE: 5.65 years · Bucket Accuracy: 73.5%

Per-bucket Performance (Test)

Age bucket      n      Age MAE (years)   Gender Acc
child           151    19.57             54.3%
teen            87     11.91             67.8%
young           2,318  5.12              70.8%
adult           6,893  4.72              89.5%
senior          368    17.37             88.3%
elder           20     36.24             85.0%

adult / young are the most accurate buckets (together 89% of the training data); the edge buckets (elder/child) suffer most from VLM label noise and small sample counts.
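The per-bucket numbers above can be recomputed from raw predictions with a small helper. A minimal sketch (pure Python; the function name, argument layout, and dict keys are illustrative, not the actual eval script):

```python
from collections import defaultdict

def per_bucket_metrics(buckets, age_true, age_pred, gender_true, gender_pred):
    """Group samples by age bucket; report n, Age MAE, and Gender Accuracy per bucket."""
    groups = defaultdict(list)
    for i, b in enumerate(buckets):
        groups[b].append(i)
    out = {}
    for b, idx in groups.items():
        mae = sum(abs(age_true[i] - age_pred[i]) for i in idx) / len(idx)
        acc = sum(gender_true[i] == gender_pred[i] for i in idx) / len(idx)
        out[b] = {"n": len(idx), "age_mae": mae, "gender_acc": acc}
    return out
```

The overall Age MAE (5.65 years) is the n-weighted average of the per-bucket MAEs, which is why the large adult/young buckets dominate it.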

Training Curves (Val)

Training Configuration

Backbone            MobileNetV3-L (ImageNet pretrained, timm)
Input resolution    192 × 192
Heads               gender (3-class CE) + age regression (smooth L1) + age bucket (6-class aux CE, λ=0.5)
Optimizer           AdamW (lr=1e-3, wd=1e-4) + cosine LR schedule
Batch               128 × 2 GPUs (DDP, NCCL)
Epochs              30 (best: epoch 28, val MAE 6.91)
Augmentation        RandomResizedCrop, HFlip, ColorJitter, RandomErasing(p=0.1)
Sampling            class-balanced (by age_bucket) on single GPU; DistributedSampler under DDP
Machine             5090-2, dual RTX 5090 (BF16 AMP)
Wall time           ~7 min (30 epochs × ~9 s/epoch)
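The three-head objective in the Heads row can be sketched as a single combined loss. A minimal sketch, assuming age is regressed in normalized [0, 1] form (age / 100, matching the model's documented output) and λ=0.5 weights only the aux bucket term; the function name and exact reduction are assumptions:

```python
import torch
import torch.nn.functional as F

LAMBDA_BUCKET = 0.5  # aux bucket CE weight from the training config

def multihead_loss(gender_logits, age_pred, bucket_logits,
                   gender_t, age_t, bucket_t):
    """Combined loss: gender 3-class CE + age smooth L1 + λ · bucket 6-class aux CE.

    age_pred / age_t are normalized ages in [0, 1] (age / 100)."""
    l_gender = F.cross_entropy(gender_logits, gender_t)
    l_age = F.smooth_l1_loss(age_pred, age_t)
    l_bucket = F.cross_entropy(bucket_logits, bucket_t)
    return l_gender + l_age + LAMBDA_BUCKET * l_bucket
```

With uniform logits the two CE terms reduce to ln(3) and ln(6), which is a quick sanity check that the weighting behaves as configured.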

Data Source

cvat2 project       #1 person_v20260423 (584 tasks)
Labeling            Qwen3-VL-8B-Instruct automatic labels, written back as cvat attributes
Valid labels        79,295 person shapes (age 1-100, gender male/female/unknown)
Split               by cvat2 task subset → train 35,725 / val 7,306 / test 9,837
Filters             bbox short side ≥ 96 px, age != 0, gender != unset
Crop                bbox + 20% pad (consistent with VLM inference)
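The "bbox + 20% pad" crop above is simple box arithmetic. A minimal sketch (the function name is illustrative, and clamping to the image bounds is an assumption about the pipeline):

```python
def padded_crop_box(x1, y1, x2, y2, img_w, img_h, pad=0.20):
    """Expand a bbox by `pad` of its width/height on each side,
    clamped to the image bounds (clamping is an assumption)."""
    w, h = x2 - x1, y2 - y1
    dx, dy = w * pad, h * pad
    nx1 = max(0, x1 - dx)
    ny1 = max(0, y1 - dy)
    nx2 = min(img_w, x2 + dx)
    ny2 = min(img_h, y2 + dy)
    return nx1, ny1, nx2, ny2
```

Keeping this identical at train and inference time (as the table notes) avoids a train/serve distribution mismatch around the person crop.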

Model Download

R2 public link      https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/age_gender_v20260508/best.pt

Usage (Python, timm + torch):
import torch, timm, torch.nn as nn
ckpt = torch.load('best.pt', map_location='cpu')

class AgeGenderModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = timm.create_model('mobilenetv3_large_100',
                                          pretrained=False, num_classes=0,
                                          global_pool='avg', drop_rate=0.3)
        # probe the backbone feature dimension with a dummy forward pass
        with torch.no_grad():
            d = self.backbone(torch.zeros(1, 3, 192, 192)).shape[-1]
        self.gender_head = nn.Linear(d, 3)
        self.age_head = nn.Linear(d, 1)
        self.bucket_head = nn.Linear(d, 6)

    def forward(self, x):
        f = self.backbone(x)
        return self.gender_head(f), self.age_head(f).squeeze(-1), self.bucket_head(f)

m = AgeGenderModel()
m.load_state_dict(ckpt['state_dict'])
m.eval()
# input:  192×192 RGB, normalized with ImageNet stats
# output: (gender_logits[3], age_norm in [0,1] — multiply by 100 for age in years, bucket_logits[6])
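Following the input/output comments above, a minimal inference sketch. The normalization constants are the standard ImageNet stats; the gender class order and the argmax-to-label mapping are assumptions, not confirmed by the checkpoint:

```python
import torch

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
GENDERS = ["male", "female", "unknown"]  # assumed class order

@torch.no_grad()
def predict(model, img):
    """img: float tensor (3, 192, 192), RGB, values in [0, 1]."""
    x = ((img - IMAGENET_MEAN) / IMAGENET_STD).unsqueeze(0)  # add batch dim
    gender_logits, age_norm, bucket_logits = model(x)
    return {
        "gender": GENDERS[gender_logits.argmax(1).item()],
        "age": float(age_norm.item() * 100),  # model regresses age / 100
        "bucket": int(bucket_logits.argmax(1).item()),
    }
```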

Known Limitations

Generated 2026-05-08 · trained on 5090-2 (dual RTX 5090) · loop autonomous mode