Eryck Zhou

A super simple BLOG for Artificial Intelligence.

YOLOv2

28 July 2020

Photo by Grace Brauteseth on Unsplash

Origin: YOLO9000: Better, Faster, Stronger

Improvements compared to YOLOv1

1. Add batch normalization (BN)

2. High-resolution classifier

a.  Train on ImageNet (224 x 224)
b.  Resize & fine-tune on ImageNet (448 x 448)
c.  Fine-tune on the detection dataset
d.  The final grid is 13 x 13 (7 x 7 in YOLOv1)

3. Use Anchors

4. Fine-Grained Features

a.  Lower-level features are concatenated directly to higher-level features
b.  A new layer is added for that purpose: reorg

[abcdefghijklmnop]===>[acik][bdjl][fhnp][egmo]
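
The rearrangement above can be sketched in plain Python as a stride-2 space-to-depth operation on a single 4 x 4 channel, read row by row. This is only an illustration, not Darknet's actual reorg code; the function name `reorg` and the order in which the four sub-maps come out are assumptions of this sketch.

```python
def reorg(grid, stride=2):
    """Split an H x W grid into stride*stride sub-grids by taking every
    `stride`-th element, starting from each (row, column) offset."""
    h, w = len(grid), len(grid[0])
    blocks = []
    for dy in range(stride):          # row offset inside each stride x stride block
        for dx in range(stride):      # column offset inside each block
            blocks.append([[grid[y][x] for x in range(dx, w, stride)]
                           for y in range(dy, h, stride)])
    return blocks


# 4 x 4 map holding the letters a..p, row by row
grid = [list("abcd"), list("efgh"), list("ijkl"), list("mnop")]
flat = ["".join("".join(row) for row in block) for block in reorg(grid)]
print(flat)  # ['acik', 'bdjl', 'egmo', 'fhnp']
```

In YOLOv2 the same idea turns the 26 x 26 x 512 feature map into 13 x 13 x 2048, so it can be concatenated with the 13 x 13 features.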

5. Multi-Scale Training

  • Remove FC layers: the network can accept inputs of any size, which improves model robustness.
  • Input sizes range over 320, 352, …, 608; a new size is chosen every 10 batches
         [side length % 32 = 0, because the network downsamples by a factor of 32]
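
A minimal sketch of the size schedule, assuming the size is re-drawn every 10 batches as described in the YOLO9000 paper. The names `SIZES` and `pick_input_size` and the starting size of 416 are made up for this illustration.

```python
import random

DOWNSAMPLE = 32                                  # total stride of the network
SIZES = list(range(320, 608 + 1, DOWNSAMPLE))    # 320, 352, ..., 608


def pick_input_size(batch_idx, current_size, rng=random, every=10):
    """Draw a new random input size every `every` batches, otherwise keep the current one."""
    if batch_idx % every == 0:
        return rng.choice(SIZES)
    return current_size


size = 416                                       # assumed starting size
for batch_idx in range(30):
    size = pick_input_size(batch_idx, size)
    grid = size // DOWNSAMPLE                    # output grid side: 10 (for 320) to 19 (for 608)
```

Because every size is a multiple of 32, the output grid always has an integer side length.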

Anchor in YOLOv2

Box Generation    #    Avg IOU
Cluster SSE       5    58.7
Cluster IOU       5    61.0
Anchor Boxes      9    60.9
Cluster IOU       9    67.2

1. Anchor size and number

a.  Faster R-CNN: 9 anchors, picked by hand
b.  YOLOv2: 5 anchors, found by k-means [dist: 1 − IOU(bbox, cluster)]
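
The clustering can be sketched as follows. This is a rough illustration, not the authors' code: boxes are reduced to (w, h) pairs sharing a corner, the centroid update takes the plain mean of each group, and the names `iou_wh` and `kmeans_anchors` are invented here.

```python
import random


def iou_wh(box, centroid):
    """IOU of two boxes given as (w, h), both placed at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with the distance d = 1 - IOU(box, centroid)."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # assignment step: nearest centroid under 1 - IOU
        groups = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda c: 1 - iou_wh(b, centroids[c]))
            groups[nearest].append(b)
        # update step: mean width and height of each group (empty groups keep their centroid)
        new = [
            (sum(b[0] for b in g) / len(g), sum(b[1] for b in g) / len(g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)


# toy data: three small boxes and three tall boxes
boxes = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 8.0), (5.5, 7.5), (4.5, 8.5)]
anchors = kmeans_anchors(boxes, k=2)
```

Using 1 − IOU instead of Euclidean distance keeps large boxes from dominating the error, which is the motivation given in the paper.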

2. Anchors, Truth BBoxes & Predicted bboxes

Anchors: 0.57273, 0.677385, …, 9.77052, 9.16828

               [10 numbers: $(a_{w_i}, a_{h_i})$ * 5]
               anchors[0] = $\frac{a_{w_i}}{W} \times 13$ (anchor width in 13 x 13 grid units)

Truth Anchor:
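  1. original bbox: $[x_o, y_o, w_o, h_o] \in [0, W|H]$
  2. normalized original bbox: $[x_r, y_r, w_r, h_r] \in [0, 1]$
    • $[x_r, y_r, w_r, h_r] = [x_o/W, y_o/H, w_o/W, h_o/H]$
  3. Transfer to the 13 x 13 grid: $[x_s, y_s, w_s, h_s] \in [0, 13]$
    • $[x_s, y_s, w_s, h_s] = [x_r, y_r, w_r, h_r] \times 13$
    • save this box for later calculations
    • then convert it to 0~1 offsets relative to its grid cell
  4. final box: $[x_f, y_f, w_f, h_f]$, with $x_f, y_f \in [0, 1]$
    • $x_f = x_s - i$     // (i, j) = index of the cell in the 13 x 13 grid
    • $y_f = y_s - j$
    • $w_f = \log(w_s / \text{anchors}[0])$
    • $h_f = \log(h_s / \text{anchors}[1])$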
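
The four steps above can be sketched in Python. This assumes that (xo, yo) is the box center in pixels and that the anchor is already expressed in grid units; the function name `encode_truth` and the sample numbers are made up for illustration.

```python
import math


def encode_truth(box, img_w, img_h, anchor, grid=13):
    """box = (xo, yo, wo, ho) in pixels; anchor = (aw, ah) in grid units."""
    xo, yo, wo, ho = box
    # step 2: normalize to [0, 1]
    xr, yr, wr, hr = xo / img_w, yo / img_h, wo / img_w, ho / img_h
    # step 3: scale to grid units, [0, grid]
    xs, ys, ws, hs = xr * grid, yr * grid, wr * grid, hr * grid
    # cell containing the center (clamped so a center on the far edge stays in range)
    i, j = min(int(xs), grid - 1), min(int(ys), grid - 1)
    # step 4: offsets inside the cell, log-ratios against the anchor
    xf, yf = xs - i, ys - j
    wf, hf = math.log(ws / anchor[0]), math.log(hs / anchor[1])
    return (i, j), (xf, yf, wf, hf)


cell, target = encode_truth((208, 104, 104, 208), 416, 416, anchor=(3.25, 6.5))
print(cell, target)  # (6, 3) (0.5, 0.25, 0.0, 0.0)
```

A box that has exactly the anchor's size gets a width and height target of 0, since log(1) = 0.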
Predicted Anchor:

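$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w e^{t_w} \\
b_h &= p_h e^{t_h} \\
Pr(\text{object}) \cdot IOU(b, \text{object}) &= \sigma(t_o)
\end{aligned}
$$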
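
As a sketch, the decoding equations translate directly into Python. The function names `sigmoid` and `decode_box` and the example values are assumptions for illustration; the cell offset and the prior are taken to be in grid units.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t, cell, prior):
    """t = (tx, ty, tw, th, to), cell = (cx, cy), prior = (pw, ph)."""
    tx, ty, tw, th, to = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx          # center stays inside cell (cx, cy)
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)         # prior size scaled by an exponential factor
    bh = ph * math.exp(th)
    conf = sigmoid(to)             # Pr(object) * IOU(b, object)
    return bx, by, bw, bh, conf


print(decode_box((0, 0, 0, 0, 0), cell=(6, 3), prior=(3.25, 6.5)))
# (6.5, 3.5, 3.25, 6.5, 0.5)
```

With all raw outputs at zero, the box sits at the center of its cell and has exactly the size of the prior.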

The Model Darknet-19

Output of YOLOv2: [0: 25]

one grid cell: $S^2 * B * [x, y, w, h, C_0, \dots, C_N]$

  • detection layer
    • 3 x (3 x 3 x 1024) Conv
    • add a passthrough layer
    • 1 x 1 Conv that produces the final outputs
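
The output size can be checked with a few lines of arithmetic. The sketch assumes the VOC setting from the paper (13 x 13 grid, 5 anchors, 20 classes); `detection_output_shape` is a hypothetical helper name.

```python
def detection_output_shape(grid=13, num_anchors=5, num_classes=20):
    """Shape of the final feature map and the number of values predicted per box."""
    per_box = 4 + 1 + num_classes          # x, y, w, h, objectness, class scores
    channels = num_anchors * per_box       # all boxes of one grid cell
    return (grid, grid, channels), per_box


shape, per_box = detection_output_shape()
print(shape, per_box)  # (13, 13, 125) 25
```

The 25 values per box correspond to the [0: 25] index range mentioned above.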

Loss Function

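$$
\begin{aligned}
\sum_{i=0}^{W} \sum_{j=0}^{H} \sum_{k=0}^{A} \quad
& 1_{Max\ IOU < Thresh} \, \lambda_{noobj} \, (-b^{o}_{ijk})^2 \\
& + 1_{t < 12800} \, \lambda_{prior} \sum_{r \in (x,y,w,h)} (prior^{r}_{k} - b^{r}_{ijk})^2 \\
& + 1^{truth}_{k} \Big( \lambda_{coord} \sum_{r \in (x,y,w,h)} (truth^{r} - b^{r}_{ijk})^2 \\
& \qquad + \lambda_{obj} \, (IOU^{k}_{truth} - b^{o}_{ijk})^2 \\
& \qquad + \lambda_{class} \sum_{c=1}^{C} (truth^{c} - b^{c}_{ijk})^2 \Big)
\end{aligned}
$$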
  • The width and height terms no longer use the square root (YOLOv1 did)
  • Confidence: the target changes from 1 to the IoU with the ground truth
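
To make the three groups of terms concrete, here is a rough sketch of the loss contribution of a single anchor k in a single cell (i, j). It is not Darknet's implementation: the function names, the argument layout, the threshold of 0.6 and the λ weights of 1.0 are all placeholder assumptions.

```python
def sq_diff(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def anchor_loss(pred_box, pred_obj, pred_cls, prior_box,
                truth_box=None, truth_cls=None,
                max_iou=0.0, iou_truth=0.0, t=0, thresh=0.6,
                l_noobj=1.0, l_prior=1.0, l_coord=1.0, l_obj=1.0, l_class=1.0):
    loss = 0.0
    # background term: push objectness to 0 when no truth box overlaps enough
    if max_iou < thresh:
        loss += l_noobj * (0.0 - pred_obj) ** 2
    # warm-up term: pull the box toward its prior while t < 12800
    if t < 12800:
        loss += l_prior * sq_diff(prior_box, pred_box)
    # terms for the anchor that is matched to a truth box
    if truth_box is not None:
        loss += l_coord * sq_diff(truth_box, pred_box)       # no square root on w, h
        loss += l_obj * (iou_truth - pred_obj) ** 2          # confidence target is the IOU
        loss += l_class * sq_diff(truth_cls, pred_cls)
    return loss


# unmatched anchor after warm-up: only the background term is active
print(anchor_loss((0.5, 0.5, 1.0, 1.0), 0.5, [0.0, 0.0], (0.5, 0.5, 1.0, 1.0), t=20000))  # 0.25
```

The full loss is the sum of this quantity over all cells and anchors.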

The path from YOLO to YOLOv2

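                      YOLO                                            YOLOv2
batch norm?           -     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓
hi-res classifier?    -     -     ✓     ✓     ✓     ✓     ✓     ✓     ✓
convolutional?        -     -     -     ✓     ✓     ✓     ✓     ✓     ✓
anchor boxes?         -     -     -     ✓     ✓     -     -     -     -
new network?          -     -     -     -     ✓     ✓     ✓     ✓     ✓
dimension priors?     -     -     -     -     -     ✓     ✓     ✓     ✓
location prediction?  -     -     -     -     -     ✓     ✓     ✓     ✓
passthrough?          -     -     -     -     -     -     ✓     ✓     ✓
multi-scale?          -     -     -     -     -     -     -     ✓     ✓
hi-res detector?      -     -     -     -     -     -     -     -     ✓
VOC2007 mAP           63.4  65.8  69.5  69.2  69.6  74.4  75.4  76.8  78.6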