person age / gender training report v20260508

2026-05-08 · MobileNetV3-L multi-head (gender 3-class CE + age regression + age bucket aux)
Self-labeled by Qwen3-VL-8B-Instruct (cvat2 project #1 / 79K labels)

Key Metrics (Test Set)

Samples: 9,837 · Gender Accuracy: 84.3% · Age MAE: 5.65 years · Bucket Accuracy: 73.5%

Per-bucket Performance (Test)

Age bucket      n      Age MAE (years)   Gender Acc
child           151    19.57             54.3%
teen            87     11.91             67.8%
young           2,318  5.12              70.8%
adult           6,893  4.72              89.5%
senior          368    17.37             88.3%
elder           20     36.24             85.0%

adult / young are the most accurate buckets (together 89% of the training data); the edge buckets (elder/child) suffer most from VLM label noise and small sample counts.
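The per-bucket numbers above can be recomputed from raw predictions with a small helper. A minimal sketch (pure Python; the function name, argument layout, and dict keys are illustrative, not the actual eval script):

```python
from collections import defaultdict

def per_bucket_metrics(buckets, age_true, age_pred, gender_true, gender_pred):
    """Group samples by age bucket; report n, Age MAE, and Gender Accuracy per bucket."""
    groups = defaultdict(list)
    for i, b in enumerate(buckets):
        groups[b].append(i)
    out = {}
    for b, idx in groups.items():
        mae = sum(abs(age_true[i] - age_pred[i]) for i in idx) / len(idx)
        acc = sum(gender_true[i] == gender_pred[i] for i in idx) / len(idx)
        out[b] = {"n": len(idx), "age_mae": mae, "gender_acc": acc}
    return out
```

The overall Age MAE (5.65 years) is the n-weighted average of the per-bucket MAEs, which is why the large adult/young buckets dominate it.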

Training Curves (Val)

Training Configuration

Backbone            MobileNetV3-L (ImageNet pretrained, timm)
Input resolution    192 × 192
Heads               gender (3-class CE) + age regression (smooth L1) + age bucket (6-class aux CE, λ=0.5)
Optimizer           AdamW (lr=1e-3, wd=1e-4) + cosine LR schedule
Batch               128 × 2 GPUs (DDP, NCCL)
Epochs              30 (best: epoch 28, val MAE 6.91)
Augmentation        RandomResizedCrop, HFlip, ColorJitter, RandomErasing(p=0.1)
Sampling            class-balanced (by age_bucket) on single GPU; DistributedSampler under DDP
Machine             5090-2, dual RTX 5090 (BF16 AMP)
Wall time           ~7 min (30 epochs × ~9 s/epoch)
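The three-head objective in the Heads row can be sketched as a single combined loss. A minimal sketch, assuming age is regressed in normalized [0, 1] form (age / 100, matching the model's documented output) and λ=0.5 weights only the aux bucket term; the function name and exact reduction are assumptions:

```python
import torch
import torch.nn.functional as F

LAMBDA_BUCKET = 0.5  # aux bucket CE weight from the training config

def multihead_loss(gender_logits, age_pred, bucket_logits,
                   gender_t, age_t, bucket_t):
    """Combined loss: gender 3-class CE + age smooth L1 + λ · bucket 6-class aux CE.

    age_pred / age_t are normalized ages in [0, 1] (age / 100)."""
    l_gender = F.cross_entropy(gender_logits, gender_t)
    l_age = F.smooth_l1_loss(age_pred, age_t)
    l_bucket = F.cross_entropy(bucket_logits, bucket_t)
    return l_gender + l_age + LAMBDA_BUCKET * l_bucket
```

With uniform logits the two CE terms reduce to ln(3) and ln(6), which is a quick sanity check that the weighting behaves as configured.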

Data Source

cvat2 project       #1 person_v20260423 (584 tasks)
Labeling            Qwen3-VL-8B-Instruct automatic labels, written back as cvat attributes
Valid labels        79,295 person shapes (age 1-100, gender male/female/unknown)
Split               by cvat2 task subset → train 35,725 / val 7,306 / test 9,837
Filters             bbox short side ≥ 96 px, age != 0, gender != unset
Crop                bbox + 20% pad (consistent with VLM inference)
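The "bbox + 20% pad" crop above is simple box arithmetic. A minimal sketch (the function name is illustrative, and clamping to the image bounds is an assumption about the pipeline):

```python
def padded_crop_box(x1, y1, x2, y2, img_w, img_h, pad=0.20):
    """Expand a bbox by `pad` of its width/height on each side,
    clamped to the image bounds (clamping is an assumption)."""
    w, h = x2 - x1, y2 - y1
    dx, dy = w * pad, h * pad
    nx1 = max(0, x1 - dx)
    ny1 = max(0, y1 - dy)
    nx2 = min(img_w, x2 + dx)
    ny2 = min(img_h, y2 + dy)
    return nx1, ny1, nx2, ny2
```

Keeping this identical at train and inference time (as the table notes) avoids a train/serve distribution mismatch around the person crop.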

Model Download

R2 public link      https://pub-478929a98a5c440cb22c2241c0bde314.r2.dev/age_gender_v20260508/best.pt

Usage (Python, timm + torch):
import torch, timm, torch.nn as nn
ckpt = torch.load('best.pt', map_location='cpu')

class AgeGenderModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = timm.create_model('mobilenetv3_large_100',
                                          pretrained=False, num_classes=0,
                                          global_pool='avg', drop_rate=0.3)
        # probe the backbone feature dimension with a dummy forward pass
        with torch.no_grad():
            d = self.backbone(torch.zeros(1, 3, 192, 192)).shape[-1]
        self.gender_head = nn.Linear(d, 3)
        self.age_head = nn.Linear(d, 1)
        self.bucket_head = nn.Linear(d, 6)

    def forward(self, x):
        f = self.backbone(x)
        return self.gender_head(f), self.age_head(f).squeeze(-1), self.bucket_head(f)

m = AgeGenderModel()
m.load_state_dict(ckpt['state_dict'])
m.eval()
# input:  192×192 RGB, normalized with ImageNet stats
# output: (gender_logits[3], age_norm in [0,1] — multiply by 100 for age in years, bucket_logits[6])
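Following the input/output comments above, a minimal inference sketch. The normalization constants are the standard ImageNet stats; the gender class order and the argmax-to-label mapping are assumptions, not confirmed by the checkpoint:

```python
import torch

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
GENDERS = ["male", "female", "unknown"]  # assumed class order

@torch.no_grad()
def predict(model, img):
    """img: float tensor (3, 192, 192), RGB, values in [0, 1]."""
    x = ((img - IMAGENET_MEAN) / IMAGENET_STD).unsqueeze(0)  # add batch dim
    gender_logits, age_norm, bucket_logits = model(x)
    return {
        "gender": GENDERS[gender_logits.argmax(1).item()],
        "age": float(age_norm.item() * 100),  # model regresses age / 100
        "bucket": int(bucket_logits.argmax(1).item()),
    }
```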

Known Limitations

Generated 2026-05-08 · trained on 5090-2 (dual RTX 5090) · loop autonomous mode