Answers:
The back-propagation algorithm is a gradient descent algorithm for fitting a neural network model (as mentioned by @Dikran). Let me explain how.

Formally: using the calculation of the gradient given at the end of this post within equation [1] below (which is the definition of gradient descent) yields the back-propagation algorithm as a particular case of the use of gradient descent.
A neural network model

Formally, let us fix ideas with a simple single-layer model:

$$ f(x) = g\big(A_2\, s(A_1 x)\big) $$

where $g : \mathbb{R} \to \mathbb{R}$ and $s : \mathbb{R}^M \to \mathbb{R}^M$ are known, with $s(x)[m] = \sigma(x[m])$ for all $m = 1, \dots, M$, and $A_1 : \mathbb{R}^p \to \mathbb{R}^M$ and $A_2 : \mathbb{R}^M \to \mathbb{R}$ are affine maps to be fitted.
A quadratic loss function is taken to fix ideas. The input vectors $(x_1, \dots, x_n)$ of $\mathbb{R}^p$ can then be fitted to the real outputs $(y_1, \dots, y_n)$ of $\mathbb{R}$ (they could be vectors) by minimizing the empirical loss

$$ R_n(A_1, A_2) = \sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2 $$

with respect to the choice of $A_1$ and $A_2$.
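To make this concrete, here is a minimal NumPy sketch of the single-layer model and its empirical quadratic loss. The specific choices below, writing the affine maps as $A_1(x) = W_1 x + b_1$ and $A_2(h) = w_2^\top h + b_2$, taking $\sigma$ to be the logistic function and $g$ the identity, are illustrative assumptions, not something fixed by the answer above.

```python
import numpy as np

def sigma(z):
    """Logistic activation (an assumed choice for the sigma above)."""
    return 1.0 / (1.0 + np.exp(-z))

def f(x, W1, b1, w2, b2, g=lambda u: u):
    """Single-layer model f(x) = g(A2(s(A1(x)))).

    A1(x) = W1 @ x + b1 maps R^p -> R^M,
    s applies sigma component-wise,
    A2(h) = w2 @ h + b2 maps R^M -> R,
    g is the (known) output function, here the identity.
    """
    z = W1 @ x + b1          # A1(x), shape (M,)
    h = sigma(z)             # s(A1(x)), shape (M,)
    u = w2 @ h + b2          # A2(s(A1(x))), a scalar
    return g(u)

def empirical_loss(X, y, W1, b1, w2, b2):
    """R_n(A1, A2) = sum_i (y_i - f(x_i))^2."""
    preds = np.array([f(x, W1, b1, w2, b2) for x in X])
    return np.sum((y - preds) ** 2)

# Tiny usage example with random data and parameters.
rng = np.random.default_rng(0)
p, M, n = 3, 4, 10
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
W1, b1 = rng.normal(size=(M, p)), np.zeros(M)
w2, b2 = rng.normal(size=M), 0.0
print(empirical_loss(X, y, W1, b1, w2, b2))
```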
Gradient of $R_n$ (for the simple neural net model considered): let us denote by $\nabla_1 R_n$ the gradient of $R_n$ as a function of $A_1$, and by $\nabla_2 R_n$ the gradient of $R_n$ as a function of $A_2$. A standard calculation (using the rule for the derivative of a composition of functions) gives these two gradients.
Here I used the R notation: $x[a{:}b]$ is the vector composed of the coordinates of $x$ from index $a$ to index $b$.
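For concreteness, here is a sketch of what that chain-rule calculation gives, under the assumed parametrization $A_1(x) = W_1 x + b_1$ and $A_2(h) = w_2^\top h + b_2$ with a single real output; the last line is the generic gradient descent step that equation [1] refers to.

```latex
% Sketch of the chain-rule gradients for f(x) = g(A_2 s(A_1 x)),
% assuming A_1(x) = W_1 x + b_1 and A_2(h) = w_2^T h + b_2.
% theta collects all parameters (W_1, b_1, w_2, b_2).
\begin{align*}
  z_i &= W_1 x_i + b_1, \quad h_i = s(z_i), \quad u_i = w_2^\top h_i + b_2, \\
  \delta_i &= -2\,\bigl(y_i - f(x_i)\bigr)\, g'(u_i)
      && \text{(error signal at the output)} \\
  \frac{\partial R_n}{\partial w_2} &= \sum_{i=1}^n \delta_i\, h_i, \qquad
  \frac{\partial R_n}{\partial b_2} = \sum_{i=1}^n \delta_i, \\
  \varepsilon_i[m] &= \delta_i\, w_2[m]\, \sigma'\!\bigl(z_i[m]\bigr)
      && \text{(error signal propagated back to hidden unit } m\text{)} \\
  \frac{\partial R_n}{\partial W_1} &= \sum_{i=1}^n \varepsilon_i\, x_i^\top, \qquad
  \frac{\partial R_n}{\partial b_1} = \sum_{i=1}^n \varepsilon_i, \\
  \theta_{t+1} &= \theta_t - \gamma_t\, \nabla R_n(\theta_t)
      && \text{(gradient descent step, cf. equation [1])}
\end{align*}
```

Computing $\delta_i$ first and then $\varepsilon_i$ is exactly the backward pass that gives the algorithm its name.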
Back-propagation is a way of working out the derivative of the error function with respect to the weights, so that the model can be trained by gradient descent optimisation methods; it is basically just the application of the "chain rule". There isn't really much more to it than that, so if you are comfortable with calculus that is basically the best way to look at it.
If you are not comfortable with calculus, a better way would be to say that we know how badly the output units are doing because we have a desired output with which to compare the actual output. However, we don't have a desired output for the hidden units, so what do we do? The back-propagation rule is basically a way of spreading out the blame for the error of the output units onto the hidden units. The more influence a hidden unit has on a particular output unit, the more blame it gets for that unit's error. The total blame associated with a hidden unit then gives an indication of how much the input-to-hidden layer weights need changing. The two things that govern how much blame is passed back are the weight connecting the hidden unit to the output unit (obviously) and the output of the hidden unit (if it is shouting rather than whispering it is likely to have a larger influence). The rest is just the mathematical niceties that turn that intuition into the derivative of the training criterion.
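To see that intuition in numbers, here is a small sketch of the blame-spreading step for a single training example, assuming a tiny network with logistic hidden units and one linear output unit; the network shape and all the names in the code are made up purely for illustration.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny assumed network: 2 inputs -> 3 logistic hidden units -> 1 linear output.
rng = np.random.default_rng(1)
W_in = rng.normal(size=(3, 2))   # input-to-hidden weights
w_out = rng.normal(size=3)       # hidden-to-output weights

x, target = np.array([0.5, -1.0]), 1.0

# Forward pass.
h = sigma(W_in @ x)              # hidden unit outputs ("shouting or whispering")
output = w_out @ h               # actual output of the network

# Blame on the output unit: how far it is from the desired output.
output_blame = output - target

# Spread the blame backwards: each hidden unit gets a share proportional to
# the weight connecting it to the output unit, scaled by how sensitive its
# own logistic output is at the current operating point.
hidden_blame = output_blame * w_out * h * (1 - h)

# Resulting weight updates: a hidden-to-output weight changes in proportion
# to the hidden unit's output; an input-to-hidden weight changes in
# proportion to the hidden unit's blame and the input it sees.
grad_w_out = output_blame * h
grad_W_in = np.outer(hidden_blame, x)
print(grad_w_out, grad_W_in, sep="\n")
```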
I'd also recommend Bishop's book for a proper answer! ;o)
It's an algorithm for training feedforward multilayer neural networks (multilayer perceptrons). There are several nice Java applets around the web that illustrate what's happening, like this one: http://neuron.eng.wayne.edu/bpFunctionApprox/bpFunctionApprox.html. Also, Bishop's book on neural networks is the standard desk reference for anything to do with them.