What is the principle behind the convergence of Krylov subspace methods for solving linear systems of equations?

As I understand it, there are two main categories of iterative methods for solving linear systems of equations:

Stationary methods (Jacobi, Gauss-Seidel, SOR, Multigrid)
Krylov subspace methods (Conjugate Gradient, GMRES, etc.)

I understand that most stationary methods work by iteratively relaxing (smoothing) the Fourier modes of the error. As I understand it, the conjugate gradient method (a Krylov subspace method) works by "stepping" through an optimal set of search directions built from powers of the matrix applied to the $n$th residual. Is this principle common to all Krylov subspace methods? If not, how do we characterize the principle behind the convergence of Krylov subspace methods in general?

— Paul

Your analysis of stationary methods is biased by simple model problems, because these can be analyzed in terms of Fourier modes. It also ignores ADI (alternating direction implicit) and many other methods. The point of most "stationary methods" is to combine many simple "approximate partial" solvers into one iterative solver. The point of Krylov methods is to accelerate (or even enforce) the convergence of a given stationary linear iteration.

— Thomas Klimpel

A paper that I think was written to answer your questions is Ipsen and Meyer, The idea behind Krylov methods, Amer. Math. Monthly 105 (1998), pp. 889-899. It is a wonderfully well-written and clarifying paper.

— Andrew T. Barker

@AndrewT.Barker: Awesome! Thanks a lot, Andrew! :)

— Paul

Answers:

In general, all Krylov methods essentially seek a polynomial that is small when evaluated on the spectrum of the matrix. In particular, the $n$th residual of a Krylov method (with zero initial guess) can be written in the form

$r_n = P_n (A) b$

where $P_n$ is a polynomial of degree $n$ normalized so that $P_n(0) = 1$.

If $A$ is diagonalizable, with $A=V\Lambda V^{-1}$, we have

$\begin{aligned} \|r_n\| &\leq \|V\|\cdot \|P_n(\Lambda)\|\cdot \|V^{-1}\|\cdot \|b\| \\ &= \kappa(V) \cdot \|P_n(\Lambda)\| \cdot \|b\|. \end{aligned}$

In the event that $A$ is normal (e.g., symmetric or unitary), we know that $\kappa(V) = 1$. GMRES constructs such a polynomial through Arnoldi iteration, while CG constructs the polynomial using a different inner product (see this answer for details). Similarly, BiCG constructs its polynomial through the nonsymmetric Lanczos process, while Chebyshev iteration uses prior information on the spectrum (usually estimates of the largest and smallest eigenvalues for symmetric definite matrices).
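A short derivation may make the polynomial form of the residual and the bound above more concrete. It assumes a zero initial guess and the sign convention $r_n = b - A x_n$; the symbol $q_{n-1}$ (a polynomial of degree at most $n-1$) and the Krylov space notation $\mathcal{K}_n$ are introduced here for illustration only.

```latex
\begin{aligned}
x_n &\in \mathcal{K}_n(A,b) = \operatorname{span}\{b, Ab, \dots, A^{n-1}b\}
  \quad\Longrightarrow\quad x_n = q_{n-1}(A)\,b, \\
r_n &= b - A x_n = \big(I - A\,q_{n-1}(A)\big)\,b = P_n(A)\,b,
  \qquad P_n(z) = 1 - z\,q_{n-1}(z), \\
P_n(A) &= V\,P_n(\Lambda)\,V^{-1},
  \qquad \|P_n(\Lambda)\|_2 = \max_{\lambda \in \sigma(A)} |P_n(\lambda)| .
\end{aligned}
```

Read this way, the bound is small exactly when $P_n$ is small at every eigenvalue of $A$, up to the factor $\kappa(V)$, and $P_n(0) = 1$ holds by construction.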

As a cool example (motivated by Trefethen + Bau), consider a matrix whose spectrum looks like this:

[Figure: spectrum of the matrix]

In MATLAB I constructed this with:

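A = rand(200,200);           % random 200x200 matrix
[Q, R] = qr(A);              % Q is orthogonal: its eigenvalues lie on the unit circle
A = (1/2)*Q + eye(200,200);  % eigenvalues now lie on the circle of radius 1/2 centered at 1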

If we consider GMRES, which constructs polynomials that actually minimize the residual over all polynomials of degree $n$ with $P_n(0) = 1$, we can easily predict the residual history by looking at the candidate polynomial

$P_n (z) = (1-z)^n$

which in our case gives

$|P_n(z)| = \frac{1}{2^n}$

for $z$ in the spectrum of $A$.
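As a quick sanity check of this claim, the sketch below samples points on the circle of radius 1/2 centered at 1 (where the eigenvalues of the example matrix lie, because Q is orthogonal) and evaluates the candidate polynomial there. The function name `candidate_poly_modulus` and the number of sample points are illustrative choices, not part of the original answer.

```python
import cmath
import math

def candidate_poly_modulus(z, n):
    """Return |P_n(z)| for the candidate polynomial P_n(z) = (1 - z)^n."""
    return abs((1 - z) ** n)

# Sample points z = 1 + (1/2) e^{i theta}: the circle that contains the
# eigenvalues of A = (1/2) Q + I when Q is orthogonal.
samples = [1 + 0.5 * cmath.exp(1j * 2 * math.pi * k / 16) for k in range(16)]

for n in (1, 5, 10):
    worst = max(candidate_poly_modulus(z, n) for z in samples)
    print(n, worst, 0.5 ** n)  # the two values agree up to rounding
```

The modulus depends only on the distance of $z$ from 1, so it is the same at every point of the circle.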

Now, if we run GMRES with a random RHS and compare the residual history with this polynomial, they ought to be quite similar (the candidate polynomial values are smaller than the GMRES residuals because $\|b\|_2 > 1$):

[Figure: residual history]
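The candidate polynomial is also the residual polynomial of plain Richardson iteration $x_{k+1} = x_k + (b - A x_k)$, for which $r_n = (I - A)^n b$. The sketch below, in pure Python, uses a small 4x4 stand-in for the 200x200 example: a block-diagonal rotation matrix `Q` with made-up angles, not the random `Q` from the MATLAB snippet. It checks that the residual norm is halved at every step. Since GMRES minimizes the residual over all such polynomials, its residual can be no larger than this.

```python
import math

def matvec(M, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def rotation_blocks(angles):
    """Orthogonal block-diagonal matrix of 2x2 rotations (stand-in for Q)."""
    size = 2 * len(angles)
    Q = [[0.0] * size for _ in range(size)]
    for k, t in enumerate(angles):
        i = 2 * k
        Q[i][i], Q[i][i + 1] = math.cos(t), -math.sin(t)
        Q[i + 1][i], Q[i + 1][i + 1] = math.sin(t), math.cos(t)
    return Q

Q = rotation_blocks([0.7, 2.1])  # hypothetical angles
size = len(Q)
# A = (1/2) Q + I, the same construction as in the MATLAB snippet
A = [[0.5 * Q[i][j] + (1.0 if i == j else 0.0) for j in range(size)]
     for i in range(size)]
b = [1.0, -2.0, 0.5, 3.0]        # made-up right-hand side

x = [0.0] * size
history = []                     # residual norms ||r_0||, ||r_1||, ...
for _ in range(10):
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    history.append(norm2(r))
    x = [xi + ri for xi, ri in zip(x, r)]  # Richardson update

for k, rk in enumerate(history):
    print(k, rk, norm2(b) / 2 ** k)  # both columns agree up to rounding
```

Here $I - A = -\tfrac{1}{2} Q$, and multiplying by an orthogonal matrix preserves the 2-norm, which is why the factor per step is exactly 1/2.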

— Reid. Atcheson

Can you clarify what you mean by "small on the spectrum of the matrix"?

— Paul

Taken as a complex polynomial, the polynomial $P_n$ has small modulus in a region of the complex plane that includes the spectrum of $A$. Imagine a contour plot superimposed on a scatter plot of the eigenvalues. How small is small? It depends on the problem, on whether $A$ is normal, and on the right-hand side $b$. The basic idea, though, is that the sequence of polynomials $(P_n)$ seeks to get progressively smaller on the spectrum, so that the residual estimate in my answer tends to $0$.

— Reid.Atcheson

@Reid.Atcheson: Very well put. Could I recommend writing $\|V\|\|V^{-1}\|$ as $\kappa(V)$ and mentioning that it is one for normal matrices?

— Jack Poulson

The Laplacian preconditioned by optimal SOR has a spectrum very similar to this example matrix. Details here: scicomp.stackexchange.com/a/852/119

— Jed Brown

Strictly speaking, CGNE is independent of the spectrum, since it depends only on the singular values.

— Jed Brown

On norms

At the $n^{\mathrm{th}}$ iteration, GMRES finds the polynomial $P_n$ that minimizes the $2$-norm of the residual

$r_n = A x_n - b = \big(P_n(A) + I \big)b - b = P_n(A) b .$

Suppose $A$ is SPD, so $A$ induces a norm and so does $A^{-1}$. Then

$\begin{aligned} \lVert r_n \rVert_{A^{-1}}^2 &= r_n^T A^{-1} r_n \\ &= (A e_n)^T A^{-1} A e_n \\ &= e_n^T A e_n \\ &= \lVert e_n \rVert_{A}^2 \end{aligned}$

where we have used the error

$e_n = x_n - x_* = x_n - A^{-1} b = A^{-1} r_n$

Thus the $A$-norm of the error is equal to the $A^{-1}$-norm of the residual. Conjugate gradients minimizes the $A$-norm of the error, which makes it relatively more accurate at resolving low energy modes. The $2$-norm of the residual, which GMRES minimizes, is the $A^T A$-norm of the error, and thus is weaker in the sense that low energy modes are less well resolved. Note that the $A$-norm of the residual is essentially worthless because it is even weaker on low energy modes.
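To make the comparison of these norms concrete, one can expand the error in an orthonormal eigenbasis of the SPD matrix $A$. The eigenpairs $(\lambda_i, v_i)$ and coefficients $c_i$ below are notation introduced for this sketch only.

```latex
\begin{aligned}
e_n &= \sum_i c_i v_i, \qquad A v_i = \lambda_i v_i, \qquad
r_n = A e_n = \sum_i \lambda_i c_i v_i, \\
\lVert e_n \rVert_A^2 &= \lVert r_n \rVert_{A^{-1}}^2 = \sum_i \lambda_i\, c_i^2, \\
\lVert r_n \rVert_2^2 &= \lVert e_n \rVert_{A^T A}^2 = \sum_i \lambda_i^2\, c_i^2, \\
\lVert r_n \rVert_A^2 &= \sum_i \lambda_i^3\, c_i^2 .
\end{aligned}
```

Each additional power of $\lambda_i$ further down-weights the components with small eigenvalues (the low energy modes), which is the sense in which the later norms are weaker.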

Sharpness of convergence bounds

Finally, there is interesting literature regarding different Krylov methods and subtleties of GMRES convergence, especially for non-normal operators.

Nachtigal, Reddy, and Trefethen (1992) How fast are nonsymmetric matrix iterations? gives examples of matrices for which one method beats all others by a large factor (at least the square root of the matrix size).
Embree (1999) How descriptive are GMRES convergence bounds? gives an insightful discussion in terms of pseudospectra, which give sharper bounds and also apply to non-diagonalizable matrices.
Embree (2003) The tortoise and the hare restart GMRES
Greenbaum, Pták, and Strakoš (1996) Any nonincreasing convergence curve is possible for GMRES

— Jed Brown

You left off the excellent book by Olavi Nevanlinna: books.google.com/…

— Matt Knepley

Iterative methods in a nutshell:

Stationary methods are in essence fixed point iterations: To solve $Ax=b$, you pick an invertible matrix $C$ and find a fixed point of $x = x + Cb - CAx$. This converges by Banach's fixed point theorem if $\|I-CA\|<1$. The various methods then correspond to a specific choice of $C$ (e.g., for Jacobi iteration, $C=D^{-1}$, where $D$ is a diagonal matrix containing the diagonal elements of $A$).
Krylov subspace methods are in essence projection methods: You pick subspaces $U,V\subset \mathbb{C}^n$ and look for a $\tilde x \in U$ such that the residual $b-A\tilde x$ is orthogonal to $V$. For Krylov methods, $U$ of course is the space spanned by powers of $A$ applied to an initial residual. The various methods then correspond to specific choices of $V$ (e.g., $V=U$ for CG and $V=AU$ for GMRES).
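A minimal sketch of the stationary iteration from the first item, with the Jacobi choice $C = D^{-1}$. The 3x3 diagonally dominant matrix, the right-hand side, and the iteration count are made-up test data; for this matrix $\|I - CA\|_\infty = 1/2 < 1$, so the fixed point argument applies.

```python
def jacobi(A, b, iterations=50):
    """Fixed point iteration x <- x + C (b - A x) with C = D^{-1} (Jacobi)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # residual b - A x at the current iterate
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n))
                    for i in range(n)]
        # applying C = D^{-1} means dividing by the diagonal entries
        x = [x[i] + residual[i] / A[i][i] for i in range(n)]
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]  # chosen so that the exact solution is [1, 1, 1]

x = jacobi(A, b)
print(x)
```

Other stationary methods fit the same loop with a different way of applying $C$ to the residual.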

The convergence properties of these methods (and projection methods in general) follow from the fact that, due to the respective choice of $V$, the $\tilde x$ are optimal over $U$ (e.g., they minimize the error in the energy norm for CG or the residual for GMRES). If you increase the dimension of $U$ in every iteration, you are guaranteed (in exact arithmetic) to find the solution after finitely many steps.
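The finite termination property can be observed on a small example. The sketch below is a textbook conjugate gradient loop in pure Python, run on a made-up 3x3 SPD system; in floating point the residual after three steps is zero only up to rounding.

```python
def dot(u, v):
    return sum(s * t for s, t in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def conjugate_gradient(A, b, steps):
    """Run `steps` CG iterations from x = 0; return x and the residual norms."""
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x for the zero initial guess
    p = list(r)   # first search direction
    norms = [dot(r, r) ** 0.5]
    for _ in range(steps):
        Ap = matvec(A, p)
        alpha = dot(r, r) / dot(p, Ap)              # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]  # next A-conjugate direction
        r = r_new
        norms.append(dot(r, r) ** 0.5)
    return x, norms

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]   # symmetric positive definite
b = [1.0, 2.0, 3.0]

x, norms = conjugate_gradient(A, b, steps=3)
print(norms)  # the last entry is at rounding level
```

The search space grows by one dimension per step, so after three steps it is all of $\mathbb{R}^3$ for this system and the optimal iterate is the exact solution.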

As pointed out by Reid Atcheson, using Krylov spaces for $U$ allows you to prove rates of convergence in terms of the eigenvalues (and thus the condition number) of $A$. In addition, they are crucial for deriving efficient algorithms for computing the projection $\tilde x$.

This is nicely explained in Yousef Saad's book on iterative methods.

— Christian Clason