YOLOv2

28 July 2020

Origin: YOLO9000: Better, Faster, Stronger

a. Train the classifier on ImageNet (224 x 224)
b. Resize and fine-tune the classifier on ImageNet (448 x 448)
c. Fine-tune on the detection dataset
d. The final output is a 13 x 13 grid (7 x 7 grid in YOLOv1)

a. Lower-level (higher-resolution) features are concatenated directly onto higher-level features
b. A new layer, reorg, is added for that purpose
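
To make the reorg step concrete, here is a minimal sketch of a space-to-depth rearrangement in plain Python. It assumes a stride of 2 and a (channels, height, width) layout stored as nested lists; the function name and the output channel ordering are illustrative choices and may not match Darknet's own reorg layer. In the paper's example this kind of rearrangement turns a 26 x 26 x 512 map into 13 x 13 x 2048 so it can be concatenated with the 13 x 13 features.

```python
def reorg(feature, stride=2):
    """Space-to-depth on a (C, H, W) nested list.

    Returns a (C * stride * stride, H // stride, W // stride) nested list.
    Channel ordering here is an assumption made for illustration.
    """
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    assert height % stride == 0 and width % stride == 0

    out = []
    # One output channel per (row offset, column offset, input channel).
    for dy in range(stride):
        for dx in range(stride):
            for c in range(channels):
                plane = [
                    [feature[c][y * stride + dy][x * stride + dx]
                     for x in range(width // stride)]
                    for y in range(height // stride)
                ]
                out.append(plane)
    return out


# A 1-channel 4 x 4 map becomes a 4-channel 2 x 2 map.
fine = [[[r * 4 + c for c in range(4)] for r in range(4)]]
coarse = reorg(fine)
print(len(coarse), len(coarse[0]), len(coarse[0][0]))
```

Every input value is kept; only its position moves from the spatial axes into the channel axis, which is why the result can be stacked with a coarser feature map of the same spatial size.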

Remove FC layers: the network can accept inputs of any size, which improves model robustness.
Input size is drawn from 320, 352, …, 608 and changes every 10 batches
[size % 32 = 0, because the network downsamples by a factor of 32]
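
The size schedule can be sketched as follows. This is only an illustration: the helper names, the fixed seed, and the loop that redraws a size every 10 steps are assumptions, not the training code itself.

```python
import random

DOWNSAMPLE = 32  # total stride of the network, as stated in the notes

# All candidate input sizes: 320, 352, ..., 608.
sizes = list(range(320, 608 + 1, DOWNSAMPLE))


def pick_size(rng):
    """Draw one input size at random from the candidates."""
    return rng.choice(sizes)


def grid_size(input_size):
    """Side length of the output grid for a square input."""
    assert input_size % DOWNSAMPLE == 0
    return input_size // DOWNSAMPLE


rng = random.Random(0)  # seeded only to make the sketch repeatable
for step in range(0, 30, 10):  # hypothetical: redraw every 10 steps
    size = pick_size(rng)
    print(step, size, grid_size(size))
```

With a stride of 32, an input of 416 gives the 13 x 13 grid, and the candidate sizes span grids from 10 x 10 to 19 x 19.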

a. Faster R-CNN: 9 anchors, picked by hand
b. YOLOv2: 5 anchors, found by K-Means [dist: 1 − IOU(bbox, cluster centroid)]
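
A minimal sketch of K-Means with the 1 − IOU distance, in plain Python. Several details are assumptions made for illustration: boxes are reduced to (w, h) pairs aligned at a common corner, the first k boxes are used as initial centroids (random initialization is more common), and centroids are updated with the mean of their members. The toy boxes are made up.

```python
def iou_wh(box, centroid):
    """IOU of two (w, h) boxes that share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (w, h) pairs using dist = 1 - IOU(box, centroid)."""
    centroids = list(boxes[:k])  # simplistic deterministic init (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            distances = [1.0 - iou_wh(box, c) for c in centroids]
            clusters[distances.index(min(distances))].append(box)

        new_centroids = []
        for old, members in zip(centroids, clusters):
            if not members:  # keep an empty cluster's centroid unchanged
                new_centroids.append(old)
                continue
            mean_w = sum(w for w, _ in members) / len(members)
            mean_h = sum(h for _, h in members) / len(members)
            new_centroids.append((mean_w, mean_h))

        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return sorted(centroids)


# Toy data: three small boxes and three large ones, normalized to [0, 1].
boxes = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.18),
         (0.60, 0.50), (0.62, 0.55), (0.58, 0.52)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

Using 1 − IOU instead of Euclidean distance keeps large boxes from dominating the clustering, since the distance depends on overlap rather than absolute size.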

[10 numbers: (w, h) * 5]
anchors[0] = width of the first anchor in grid units = normalized width * 13

original bbox: pixel coordinates in the input image
normalized original bbox: [xr,yr,wr,hr]∈[0,1]
Transfer to the 13 x 13 grid: [xs,ys,ws,hs]∈[0,13]
- save this for later calculations
- convert to 0~1 relative to the grid cell that contains the box center
final box: [xf,yf,wf,hf]∈[0,1]
- xf = xs − i, yf = ys − j // i, j = indices of the cell in the 13 x 13 grid
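
The coordinate conversion above can be sketched as follows. The function name is hypothetical, i is assumed to index columns (x) and j rows (y), and because the notes do not say how the width and height are normalized, the sketch simply returns them in grid units.

```python
import math

GRID = 13  # output grid side, from the notes


def encode_box(x_r, y_r, w_r, h_r, grid=GRID):
    """Map a normalized box [0, 1] onto the grid.

    Returns the owning cell (i, j), the center offset inside that cell,
    and the box size in grid units.
    """
    # Scale from [0, 1] to [0, grid].
    x_s, y_s = x_r * grid, y_r * grid
    w_s, h_s = w_r * grid, h_r * grid

    # The cell containing the center; clamp so x_r == 1.0 stays in range.
    i = min(int(math.floor(x_s)), grid - 1)
    j = min(int(math.floor(y_s)), grid - 1)

    # Offset of the center relative to the cell's top-left corner.
    x_f, y_f = x_s - i, y_s - j
    return (i, j), (x_f, y_f), (w_s, h_s)


cell, offset, size = encode_box(0.5, 0.25, 0.2, 0.4)
print(cell, offset, size)
```

For a box centered at (0.5, 0.25), the center lands at (6.5, 3.25) on the grid, so cell (6, 3) owns it and the in-cell offset is (0.5, 0.25).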

one grid cell: B * [x, y, w, h, confidence, class probabilities]
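
As a quick size check, assume each of the B boxes carries 4 coordinates, 1 confidence score, and C class scores, as in the paper's VOC configuration (B = 5, C = 20). The helper names below are illustrative.

```python
def predictions_per_cell(num_boxes, num_classes):
    """Values predicted by one grid cell: B * (4 coords + 1 conf + C classes)."""
    return num_boxes * (4 + 1 + num_classes)


def output_shape(grid, num_boxes, num_classes):
    """Shape of the full prediction tensor as (rows, cols, channels)."""
    return (grid, grid, predictions_per_cell(num_boxes, num_classes))


print(output_shape(13, 5, 20))  # (13, 13, 125)
```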