I am trying to understand the YOLO v2 loss function:
Could someone explain the function in detail?
Answers:
Explanation of the different terms:
Note that I used two indices $i$ and $j$ for each bbox prediction. This is not the case in the article, because there is always a factor $\mathbb{1}^{obj}_{ij}$ or $\mathbb{1}^{noobj}_{ij}$, so the interpretation is not ambiguous: the chosen $j$ is the one that corresponds to the highest confidence score in that cell.
More general explanation of each term of the sum:
B*(5+C)? At least this is the case for YOLO v3.
Doesn't the YOLOv2 loss function look scary? It actually isn't! It is one of the boldest, smartest loss functions around.
Let's first look at what the network actually predicts.
To recap, YOLOv2 predicts detections on a 13x13 feature map, so in total we have 169 cells.
We have 5 anchor boxes. For each anchor box we need an objectness confidence score (was any object found?), 4 coordinates (x, y, w and h) for the anchor box, and 20 class probabilities. This can crudely be seen as 20 coordinates, 5 confidence scores, and 100 class probabilities for all 5 anchor box predictions put together.
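As a quick check of the counts above, here is a minimal sketch that tallies the values predicted per cell and over the whole 13x13 map. The constant names are made up for illustration; the numbers (13x13 grid, 5 anchors, 4 coordinates, 20 classes) are the ones stated in this answer.

```python
# Per-cell layout of the YOLOv2 output as described above.
# All names here are illustrative, not taken from any implementation.
GRID = 13          # 13x13 feature map
NUM_ANCHORS = 5    # anchor boxes per cell
NUM_COORDS = 4     # box coordinates per anchor
NUM_CLASSES = 20   # class probabilities per anchor

coords_per_cell = NUM_ANCHORS * NUM_COORDS          # coordinates for all anchors
conf_per_cell = NUM_ANCHORS * 1                     # one objectness score per anchor
class_probs_per_cell = NUM_ANCHORS * NUM_CLASSES    # class probabilities for all anchors

values_per_cell = coords_per_cell + conf_per_cell + class_probs_per_cell
total_values = GRID * GRID * values_per_cell

print(coords_per_cell, conf_per_cell, class_probs_per_cell)
print(values_per_cell, total_values)
```

Each cell therefore carries 5 * (4 + 1 + 20) values, which is the B*(5+C) layout mentioned elsewhere in this thread.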
We have a few things to worry about:
All losses are mean-squared errors, except the classification loss, which uses the cross-entropy function.
Now, let's break down the code in the image.
We need to compute losses for each Anchor Box (5 in total)
We need to do this for each of the 13x13 cells, where S = 12 (since indexing starts from 0).
$\mathbb{1}^{obj}_{ij}$ is 1 when there is an object in cell $i$, else 0.
var1 | var2 | (var1 - var2)^2 | (sqrt(var1) - sqrt(var2))^2
0.0300 | 0.020 | 9.99e-05 | 0.001
0.0330 | 0.022 | 0.00012 | 0.0011
0.0693 | 0.046 | 0.000533 | 0.00233
0.2148 | 0.143 | 0.00512 | 0.00723
0.3030 | 0.202 | 0.01 | 0.01
0.8808 | 0.587 | 0.0862 | 0.0296
4.4920 | 2.994 | 2.2421 | 0.1512
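The table above appears to compare the plain squared difference with the squared difference of square roots, which is the form the original YOLO paper uses for box width and height so that errors on large boxes are penalized less harshly. A minimal sketch that recomputes a few rows (the function names are my own, not from any YOLO code):

```python
import math

def squared_error(a, b):
    """Plain squared difference, as in the third column."""
    return (a - b) ** 2

def sqrt_squared_error(a, b):
    """Squared difference of square roots, as in the fourth column."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2

# A few (var1, var2) pairs copied from the table above.
pairs = [(0.0300, 0.020), (0.3030, 0.202), (4.4920, 2.994)]

for var1, var2 in pairs:
    plain = squared_error(var1, var2)
    rooted = sqrt_squared_error(var1, var2)
    print(f"{var1:.4f} | {var2:.3f} | {plain:.4g} | {rooted:.4g}")
```

For small values the square-root form gives the larger penalty, while for large values the plain squared error grows much faster; the two are roughly equal around the 0.3030 row.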
Not that scary, right!
Read HERE for further details.
Your loss function is for YOLO v1 and not YOLO v2. I was also confused by the difference between the two loss functions, and it seems many people are: https://groups.google.com/forum/#!topic/darknet/TJ4dN9R4iJk
The YOLOv2 paper explains the difference in architecture from YOLOv1 as follows:
We remove the fully connected layers from YOLO(v1) and use anchor boxes to predict bounding boxes... When we move to anchor boxes we also decouple the class prediction mechanism from the spatial location and instead predict class and objectness for every anchor box.
This means that the confidence probability above should depend not only on the grid cell and the class but also on an anchor box index. Therefore, the loss needs to be different from the one above. Unfortunately, the YOLOv2 paper does not explicitly state its loss function.
I make a guess at the loss function of YOLOv2 and discuss it here: https://fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html
Here are my study notes.
Loss function: squared error
a. Reason: easy to optimize.
b. Problem: (1) It does not align perfectly with our goal of maximizing average precision. (2) In every image, many grid cells contain no object. This pushes the confidence scores of those cells towards 0, often overpowering the gradient from cells that do contain an object.
c. Solution: increase the loss from bounding box coordinate predictions and decrease the loss from confidence predictions for boxes that contain no objects. We use two parameters for this.
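To illustrate the imbalance described in this note, here is a minimal sketch of a confidence loss in which the no-object cells are down-weighted. The default weight of 0.5 is the value the original YOLO paper gives for the no-object term; the function name, the sample predictions, and the simplification of using 1.0 as the target confidence for object cells are assumptions made for this example.

```python
LAMBDA_NOOBJ = 0.5  # weight for no-object cells, as given in the original YOLO paper

def confidence_loss(pred_conf, target_conf, has_object, lambda_noobj=LAMBDA_NOOBJ):
    """Return (object term, no-object term) of a squared-error confidence loss."""
    obj_term = 0.0
    noobj_term = 0.0
    for pred, target, obj in zip(pred_conf, target_conf, has_object):
        err = (pred - target) ** 2
        if obj:
            obj_term += err
        else:
            noobj_term += lambda_noobj * err  # down-weight empty cells
    return obj_term, noobj_term

# Hypothetical image: 169 cells, only 2 of which contain an object.
has_object = [True, True] + [False] * 167
target_conf = [1.0, 1.0] + [0.0] * 167   # simplification: target 1.0 for object cells
pred_conf = [0.6, 0.6] + [0.1] * 167

obj_plain, noobj_plain = confidence_loss(pred_conf, target_conf, has_object, lambda_noobj=1.0)
obj_weighted, noobj_weighted = confidence_loss(pred_conf, target_conf, has_object)

print(obj_plain, noobj_plain)
print(obj_weighted, noobj_weighted)
```

Even with small per-cell errors, the 167 empty cells contribute several times more than the 2 object cells when unweighted; the weight halves that contribution without touching the object term.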
Only one bounding box should be responsible for each object. We assign one predictor to be responsible for predicting an object based on which prediction has the highest current IOU with the ground truth.
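The assignment rule can be sketched as follows. This is an illustrative version only: boxes are assumed to be in (x_min, y_min, x_max, y_max) corner format, and the helper names `iou` and `responsible_predictor` are hypothetical, not taken from any YOLO implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def responsible_predictor(predicted_boxes, ground_truth):
    """Index of the predicted box with the highest IOU against the ground truth."""
    ious = [iou(box, ground_truth) for box in predicted_boxes]
    return max(range(len(ious)), key=ious.__getitem__)

# Hypothetical cell with two predicted boxes and one ground-truth box.
ground_truth = (2.0, 2.0, 6.0, 6.0)
predicted_boxes = [(0.0, 0.0, 4.0, 4.0), (2.0, 3.0, 6.0, 7.0)]
best = responsible_predictor(predicted_boxes, ground_truth)
print(best, [round(iou(b, ground_truth), 3) for b in predicted_boxes])
```

Only the box at index `best` would then contribute to the coordinate and object-confidence terms for this ground-truth object.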
a. Loss from the bounding box coordinates (x, y). Note that the loss comes from one bounding box from one grid cell, even if the object is not in the grid cell as ground truth.
b. Loss from width w and height h. Note that the loss comes from one bounding box from one grid cell, even if the object is not in the grid cell as ground truth.
c. Loss from the confidence in each bounding box. Note that the loss comes from one bounding box from one grid cell, even if the object is not in the grid cell as ground truth.
The loss function only penalizes classification if an object is present in the grid cell. It also only penalizes the bounding box coordinates if that box is responsible for the ground-truth box (highest IOU).
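For reference, these notes line up with the loss as written in the original YOLO (v1) paper, reproduced below as I understand it. Lines one and two are items a and b, lines three and four are the confidence loss of item c (split into object and no-object parts), and the last line is the classification loss. This is the v1 loss, not a statement of the YOLOv2 loss.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij}
  \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

Here $\mathbb{1}^{obj}_{ij}$ is 1 when predictor $j$ in cell $i$ is responsible for an object, $\mathbb{1}^{noobj}_{ij}$ is its complement, and $\mathbb{1}^{obj}_{i}$ is 1 when an object appears in cell $i$.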
The loss formula you wrote is the loss from the original YOLO paper, not the v2 or v3 loss.
There are some major differences between versions. I suggest reading the papers, or checking the code implementations. Papers: v2, v3.
Some major differences I noticed:
Class probability is calculated per bounding box (hence the output is now S*S*B*(5+C) instead of S*S*(B*5+C))
Bounding box coordinates now have a different representation
In v3 they use 3 boxes across 3 different "scales"
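To make the first difference concrete, here is a small sketch comparing the two output layouts for the same S, B and C. The function names are made up for this example; the formulas are the two given in the list above, and S=7, B=2, C=20 are the settings of the original YOLO paper.

```python
def yolo_v1_output_size(S, B, C):
    """v1 layout: each cell has B boxes (5 values each) plus ONE shared set of C class scores."""
    return S * S * (B * 5 + C)

def yolo_v2_output_size(S, B, C):
    """v2 layout: each of the B boxes carries its own 5 values AND its own C class scores."""
    return S * S * B * (5 + C)

# Same grid, box count and class count fed to both formulas.
S, B, C = 7, 2, 20
v1_size = yolo_v1_output_size(S, B, C)
v2_size = yolo_v2_output_size(S, B, C)
print(v1_size, v2_size)
```

The difference between the two sizes is exactly S*S*(B-1)*C, the extra class scores that come from predicting classes per box rather than per cell.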
You can try getting into the nitty-gritty details of the loss, either by looking at the Python/Keras implementation v2, v3 (look for the function yolo_loss) or directly at the C implementation v3 (look for delta_yolo_box and delta_yolo_class).