28 July 2020
Origin: YOLO9000: Better, Faster, Stronger
a. Train on ImageNet (224 x 224)
b. Resize & fine-tune on ImageNet (448 x 448)
c. Fine-tune on the detection dataset
d. The final feature map is a 13 x 13 grid (7 x 7 in YOLO)
a. Lower-level features are concatenated directly onto higher-level features
b. A new layer, reorg, is added for that purpose
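The reorg step can be pictured as a space-to-depth rearrangement followed by a channel-wise concatenation. The sketch below is only an illustration in NumPy: the function name `reorg`, the stride of 2, and the 26 x 26 x 512 and 13 x 13 x 1024 shapes are assumptions taken from the paper's description, and the channel ordering may differ from the actual Darknet layer.

```python
import numpy as np

def reorg(x: np.ndarray, stride: int = 2) -> np.ndarray:
    """Space-to-depth: (C, H, W) -> (C * stride**2, H // stride, W // stride)."""
    c, h, w = x.shape
    assert h % stride == 0 and w % stride == 0, "H and W must be divisible by stride"
    # Split each spatial axis into (block index, offset inside the block).
    x = x.reshape(c, h // stride, stride, w // stride, stride)
    # Bring the two in-block offsets to the front so they fold into channels.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(c * stride * stride, h // stride, w // stride)

# Assumed shapes: a 26 x 26 lower-level map and a 13 x 13 higher-level map.
fine = np.random.rand(512, 26, 26).astype(np.float32)
coarse = np.random.rand(1024, 13, 13).astype(np.float32)

# After reorg both maps are 13 x 13, so they can be stacked along channels.
merged = np.concatenate([reorg(fine), coarse], axis=0)
print(merged.shape)
```

No values are dropped: every 2 x 2 spatial block of the lower-level map becomes four channels at one location of the coarser grid.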
Multi-scale training: the network can accept inputs of any size, which improves model robustness.
[side length % 32 = 0, because the network downsamples by a factor of 32]
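A minimal sketch of how the multiple-of-32 constraint could drive input-size selection during training. The 320 to 608 range and the "new size every 10 batches" schedule follow the paper rather than these notes, and the helper names `valid_sizes` and `pick_input_size` are hypothetical.

```python
import random

DOWNSAMPLE = 32  # total downsampling factor of the network

def valid_sizes(lo=320, hi=608, step=DOWNSAMPLE):
    """All square input side lengths in [lo, hi] that are multiples of step."""
    return list(range(lo, hi + 1, step))

def pick_input_size(rng):
    """Draw one valid side length at random."""
    size = rng.choice(valid_sizes())
    assert size % DOWNSAMPLE == 0
    return size

rng = random.Random(0)
for batch_idx in range(30):
    if batch_idx % 10 == 0:  # assumed schedule: resample every 10 batches
        size = pick_input_size(rng)
        grid = size // DOWNSAMPLE
        print(f"batch {batch_idx}: input {size}x{size}, grid {grid}x{grid}")
```

With a 416 x 416 input the grid is 416 / 32 = 13, which matches the 13 x 13 output mentioned earlier.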
Box Generation | # | Avg IOU |
---|---|---|
Cluster SSE | 5 | 58.7 |
Cluster IOU | 5 | 61.0 |
Anchor Boxes | 9 | 60.9 |
Cluster IOU | 9 | 67.2 |
a. Faster R-CNN: 9 anchors, picked by hand
b. YOLOv2: 5 anchors, chosen by k-means with distance $d(\text{bbox}, \text{centroid}) = 1 - IOU(\text{bbox}, \text{centroid})$
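[10 numbers: $(a_{w_i}, a_{h_i})$ for each of the 5 anchors]
anchors[0] = $a_{w_i} = \frac{w_i}{W} \times 13$, where $w_i$ is the clustered box width in pixels and $W$ is the image width, so anchors are stored in grid-cell units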
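The anchor selection described above can be sketched as k-means over box widths and heights with 1 − IOU as the distance. This is an illustrative version, not the original implementation: the boxes are synthetic, the function names `iou_wh` and `kmeans_anchors` are hypothetical, and boxes are assumed to be already normalized by image width and height, so multiplying by 13 gives grid-cell units.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (N, 2) boxes and (K, 2) centroids, all sharing one corner."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """k-means on (w, h) pairs with distance 1 - IOU(bbox, centroid)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = 1.0 - iou_wh(boxes, centroids)
        assign = dist.argmin(axis=1)
        # Recompute each centroid as the mean (w, h) of its members;
        # keep the old centroid if a cluster happens to be empty.
        new = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

# Synthetic ground-truth boxes as (w / W, h / H); real data would come from labels.
data_rng = np.random.default_rng(1)
boxes = data_rng.uniform(0.05, 0.9, size=(500, 2))

centroids = kmeans_anchors(boxes, k=5)
anchors = centroids * 13  # convert normalized sizes to 13 x 13 grid units
avg_iou = iou_wh(boxes, centroids).max(axis=1).mean()
print(anchors.round(2))
print(f"average best IOU: {avg_iou:.3f}")
```

Flattening `anchors` row by row gives the 10 numbers mentioned above, in the order $(a_{w_i}, a_{h_i})$.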
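$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w e^{t_w} \\
b_h &= p_h e^{t_h} \\
Pr(\text{object}) * IOU(b, \text{object}) &= \sigma(t_o)
\end{aligned}
$$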
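The box equations above translate almost line for line into code. The sketch below assumes that $(c_x, c_y)$ is the top-left corner of the grid cell and $(p_w, p_h)$ the anchor size, both in grid-cell units; the function name `decode_box` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell, prior):
    """Map raw outputs (tx, ty, tw, th, to) to a box and a confidence.

    cell  = (cx, cy): top-left corner of the grid cell, in grid units
    prior = (pw, ph): anchor width and height, in grid units
    """
    tx, ty, tw, th, to = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx   # center stays inside the cell
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)    # anchor scaled by a positive factor
    bh = ph * np.exp(th)
    conf = sigmoid(to)      # Pr(object) * IOU(b, object)
    return bx, by, bw, bh, conf

# All-zero outputs give the cell center, the unscaled anchor, and confidence 0.5.
print(decode_box((0.0, 0.0, 0.0, 0.0, 0.0), cell=(6, 6), prior=(3.0, 4.0)))
```

Because the sigmoid bounds the offsets to (0, 1), each predicted center is confined to its own grid cell.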
Each grid cell predicts $B$ boxes of $[x, y, w, h, C_0, \dots, C_N]$, so the full output is $S^2 \times B \times [x, y, w, h, C_0, \dots, C_N]$
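As a quick size check, the layout above can be turned into numbers. The values below are assumptions for illustration: $S = 13$, $B = 5$, 20 classes (as in VOC), and $C_0$ read as the objectness confidence with the remaining $C$ entries as class scores.

```python
S = 13            # grid size (assumed: 416 / 32)
B = 5             # boxes per cell (assumed: one per anchor)
num_classes = 20  # assumed: VOC

per_box = 4 + 1 + num_classes  # x, y, w, h + objectness + class scores
per_cell = B * per_box         # values predicted by one grid cell
total = S * S * per_cell       # values in the full output tensor

print(per_box, per_cell, total)  # 25 125 21125
```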
| | YOLO | | | | | | | | YOLOv2 |
---|---|---|---|---|---|---|---|---|---|
batch norm? | | √ | √ | √ | √ | √ | √ | √ | √ |
hi-res classifier? | | | √ | √ | √ | √ | √ | √ | √ |
convolutional? | | | | √ | √ | √ | √ | √ | √ |
anchor boxes? | | | | √ | √ | | | | |
new network? | | | | | √ | √ | √ | √ | √ |
dimension priors? | | | | | | √ | √ | √ | √ |
location prediction? | | | | | | √ | √ | √ | √ |
passthrough? | | | | | | | √ | √ | √ |
multi-scale? | | | | | | | | √ | √ |
hi-res detector? | | | | | | | | | √ |
VOC2007 mAP | 63.4 | 65.8 | 69.5 | 69.2 | 69.6 | 74.4 | 75.4 | 76.8 | 78.6 |