![[Pasted image 20250401152351.png]]

## Gaussian Elimination

### Reduction

### Back-Substitution

### Thomas Algorithm

- solves $Ax=b$ for tridiagonal $A$ in $O(n)$ time
- great for finite differences
- requires no pivoting if $A$ is diagonally dominant

Elimination costs $O(n)$ instead of $O(n^{3})$ because only a single subdiagonal has to be eliminated.

## LU Decomposition

### Cholesky

Start by decomposing $A$ into a lower triangular matrix and its transpose (works when $A$ is symmetric positive definite).

Cost: $\frac{1}{3} n^{3}+O(n^{2})$ (half of LU)

$ A = L L^{\mathrm{T}} $

$ L L^{\mathrm{T}}x = b $

Define $y = L^{\mathrm{T}}x$ so that

$ Ly = b $

Forward substitution gives $y$; then solve the following for $x$ by back-substitution (cost: $n^{2} + O(n)$):

$ L^{\mathrm{T}}x = y $

Total cost: $\frac{n^{3}}{3} + O(n^{2})$

### Banded Matrices

A matrix with $p$ super-diagonals and $q$ sub-diagonals has bandwidth

$ w = p + q + 1 $

If $A$ is sparse and banded, the cost drops accordingly: e.g. 3 bands (tridiagonal) → $O(8n)$ operations.

## Matrix Norms

| Name | Formula | Useful for |
| --- | --- | --- |
| Maximum column sum | $\lVert A \rVert_1 = \max\limits_{1 \leq j \leq n} \sum\limits_{i=1}^{m}\lvert a_{ij} \rvert$ | Stability analysis |
| Maximum row sum | $\lVert A \rVert_{\infty} = \max\limits_{1 \leq i \leq m} \sum\limits_{j=1}^{n} \lvert a_{ij} \rvert$ | Matrix conditioning |
| Frobenius norm | $\lVert A \rVert_F = \sqrt{\sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n} \lvert a_{ij} \rvert^{2}}$ | Error measurement |
| Spectral norm | $\lVert A \rVert_2 = \sqrt{\lambda_{\max} (A^{\mathrm{T}}A)}$ | Principal component analysis,<br>iterative schemes blow up when $\lVert A \rVert_{2} >1$ |

## Condition Number

$ K(A) = \lVert A^{-1} \rVert ~ \lVert A \rVert $

## Positive Semi-Definite

$M$ is positive semi-definite (PSD) if and only if

$ x^{\mathrm{T}}Mx \geq 0 $

The condition number of a function evaluation is the ratio of the relative error in $f(x)$ to the relative error in $x$:

$ K_{p} = \left\lvert \frac{\bar{x} f'(\bar{x})}{f(\bar{x})} \right\rvert $

## Iterative Methods

### Stationary Methods

$ x^{k+1} = Bx^{k} + C $

where $B$ and $C$ are fixed. This converges when the spectral radius satisfies

$ \rho(B) = \max_i \lvert \lambda_{i} \rvert < 1 $

For the following methods we split

$ A = D + L + U $

where $D$ is the diagonal, $L$ is the strictly lower triangular part, and $U$ is the strictly upper triangular part.
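As a concrete illustration of this splitting and of the generic iteration $x^{k+1} = Bx^{k} + C$, here is a minimal NumPy sketch using the Jacobi choice of $B$ and $C$ from the table below. The function name, test matrix, right-hand side, and tolerance are illustrative assumptions, not from these notes.

```python
import numpy as np

def stationary_solve(B, C, A, b, x0=None, tol=1e-10, max_iter=500):
    """Generic stationary iteration x_{k+1} = B x_k + C, stopped on the residual norm."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float)
    for k in range(1, max_iter + 1):
        x = B @ x + C
        if np.linalg.norm(b - A @ x) < tol:
            return x, k
    return x, max_iter

# Small diagonally dominant test system (illustrative values)
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])

# Splitting A = D + L + U: diagonal, strictly lower, strictly upper parts
D = np.diag(np.diag(A))
L = np.tril(A, k=-1)
U = np.triu(A, k=1)

# Jacobi choice: B = -D^{-1}(L + U), C = D^{-1} b
D_inv = np.diag(1.0 / np.diag(A))
B_jac = -D_inv @ (L + U)
C_jac = D_inv @ b

x, iters = stationary_solve(B_jac, C_jac, A, b)
print(iters, np.allclose(A @ x, b, atol=1e-8))
```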
Motivation: inverting a diagonal matrix is trivial, and inverting a lower triangular matrix is *much* easier than going through a full LU decomposition.

| Method | Requires | $B$ | $C$ |
| --- | --- | --- | --- |
| Jacobi | Diagonally dominant | $-D^{-1}(L+U)$ | $D^{-1}b$ |
| Gauss-Seidel | Diagonally dominant or PSD | $-(D+L)^{-1}U$ | $(D+L)^{-1}b$ |
| SOR | PSD, $0<\omega<2$ | $(D + \omega L)^{-1}\left((1-\omega)D - \omega U\right)$ | $\omega(D+\omega L)^{-1} b$ |

#### Jacobi's Method

$ x^{k+1} = -D^{-1} (L+U)x^{k} + D^{-1}b $

Converges if $A$ is strictly diagonally dominant.

#### Gauss-Seidel Iteration

$ x^{k+1} = -(D+L)^{-1}Ux^{k} + (D+L)^{-1}b $

Converges if $A$ is strictly diagonally dominant, or if $A$ is symmetric and positive definite.

#### Successive Over-Relaxation (SOR)

$ x^{k+1} = (D + \omega L)^{-1}\left((1-\omega)D - \omega U\right)x^{k} + \omega(D+\omega L)^{-1} b $

### Gradient Methods

Construct the equivalent optimization problem:

$ Q(x) = \frac{1}{2} x^{T} Ax - x^{T}b $

$ \frac{d}{dx}Q(x) = Ax - b $

$ \left.\frac{dQ(x)}{dx}\right|_{x_{opt}} = 0 $

$ x_{opt} = x_{e} \quad \text{where } Ax_{e} = b $

Iterate using the following step rule:

$ x_{i+1} = x_{i} + \alpha_{i}v_{i} $

- $v_{i}$: search direction used to compute $x_{i+1}$
- $\alpha_{i}$: step size used to compute $x_{i+1}$
- $r_{i} = b-Ax_{i}$: residual at iteration $i$

| Method | Requires | $\alpha_i$ | $v_{i+1}$ | $r_{i+1}$ | Convergence condition |
| --- | --- | --- | --- | --- | --- |
| Steepest descent | PSD | $\frac{r_{i}^{T}r_{i}}{r_{i}^{T}Ar_{i}}$ | $r_{i+1}$ | $b - Ax_{i+1}$ | $\nabla f(x) \neq 0$ and a good step size $\alpha_i$ |
| Conjugate gradient | PSD | $\frac{r_{i}^{\mathrm{T}}r_{i}}{v_{i}^{\mathrm{T}}Av_{i}}$ | $r_{i+1}+\beta_i v_{i}$<br>where<br>$\beta_i = \frac{r_{i+1}^{\mathrm{T}}r_{i+1}}{r_{i}^{\mathrm{T}}r_{i}}$ | $r_{i}-\alpha_{i}Av_{i}$ | Symmetric, positive-definite matrix in the quadratic case |

#### Steepest Descent

$ \begin{align} d_{k} &= -\nabla Q(x^{k}) \\ &=b-Ax^{k} \\ &=r^{k} \end{align} $

$ r_{i} = b - Ax_{i} $

Take the search direction $v_{i}=r_{i}$ and solve for the step length so that the new residual is orthogonal to the last search direction, $v_{i}^{\mathrm{T}}r_{i+1}=0$:

$ x_{i+1} = x_{i} + \frac{r_{i}^{T}r_{i}}{r_{i}^{T}Ar_{i}} r_{i} $

- Successive directions are orthogonal to each other, so the iteration zigzags (it effectively searches in only two directions)
- No guarantee that it converges in a fixed number of steps

#### Conjugate Gradient

Initialization: $r_{0} = b - Ax_{0}$, $v_{0} = r_{0}$

1. Calculate the steepest-descent step length
   $ \alpha_{i} = \frac{v_{i}^{\mathrm{T}}r_{i}}{v_{i}^{\mathrm{T}}Av_{i}} $
2. Approximate solution
   $ x_{i+1} = x_{i} + \alpha_{i}v_{i} $
3. New residual
   $ r_{i+1} =b-A(x_{i}+\alpha_{i}v_{i})= r_{i} - \alpha_{i}Av_{i} $
4. Secondary step length such that $v_{i+1}^{\mathrm{T}}Av_{i}=0$ (orthogonal in the $A$-inner product)
   $ \beta_{i} = -\frac{v_{i}^{T} Ar_{i+1}}{v_{i}^{T}Av_{i}} $
5. New step direction is the steepest-descent direction plus a $\beta$ term to make it $A$-orthogonal
   $ v_{i+1} = r_{i+1} + \beta_{i}v_{i} $
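The five steps above map directly to code. Below is a minimal NumPy sketch under the usual assumption that $A$ is symmetric positive definite; the function name, small test system, and tolerance are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Conjugate gradient following steps 1-5; A is assumed symmetric positive definite."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x          # initialization: r0 = b - A x0
    v = r.copy()           # first search direction v0 = r0
    for _ in range(max_iter):
        Av = A @ v
        alpha = (v @ r) / (v @ Av)        # 1. step length
        x = x + alpha * v                 # 2. approximate solution
        r = r - alpha * Av                # 3. new residual
        if np.linalg.norm(r) < tol:
            break
        beta = -(v @ (A @ r)) / (v @ Av)  # 4. beta so that v_{i+1}^T A v_i = 0
        v = r + beta * v                  # 5. new A-orthogonal search direction
    return x

# Example on a small SPD system (illustrative values)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(x, np.allclose(A @ x, b))
```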
##### Krylov Subspace

$ K_{n_{s}} = \mathrm{span}(b, Ab, \dots, A^{n_{s}-1}b) $

- The sequence $b, Ab, A^{2}b, \dots$ converges towards the eigenvector with the largest eigenvalue
- The vectors therefore become increasingly linearly dependent
- An orthogonal basis of the subspace is close to the $n_{s}$ eigenvectors with the largest eigenvalues

#### BiCGStab

BiCG extends CG to nonsymmetric problems by iterating simultaneously on $A$ and $A^{\mathrm{T}}$ to maintain two sets of conjugacy conditions (one with respect to $A$, one with respect to $A^{\mathrm{T}}$). It uses two sequences of search directions and two residuals. BiCG can also converge in at most $n$ steps, but it can suffer from irregular convergence or break down if the $A^{\mathrm{T}}$-residual becomes zero unexpectedly.

- **BiCGSTAB (BiCG Stabilized):** builds on BiCG but tries to *smooth out* the convergence. It keeps the BiCG two-subspace approach but removes the zigzag behavior in the residual with a secondary minimal-residual smoothing step in each iteration. The result is usually a more stable and smoothly converging method than BiCG.
- BiCGSTAB is one of the popular choices for general nonsymmetric systems. It doesn't guarantee monotonic residual reduction, but it often converges faster than restarted GMRES and uses only a fixed amount of memory (a few vectors, unlike GMRES, which stores an expanding basis).

#### GMRES

GMRES builds an orthonormal basis of the Krylov subspace using the **Arnoldi iteration** (Gram-Schmidt applied to $\{r, Ar, A^{2}r, \dots\}$). It then finds the approximate solution $x^{(m)}$ that **minimizes the residual norm** $\lVert b - A x \rVert$ over that subspace.

- At each iteration $m$, GMRES solves a small $(m+1) \times m$ least-squares problem to get the new $x^{(m)}$. The residual decreases monotonically (or at least never increases). GMRES is great for tough matrices because of this optimal-residual property.
- *Cons:* the Krylov basis keeps growing, so storage and computation increase with the iteration count. In practice GMRES is often restarted: e.g. "GMRES(20)" runs 20 iterations, then restarts with the current solution as the new initial guess, forgetting the old basis. Restarting can hurt convergence but controls memory.
- GMRES works for any matrix (no symmetry requirement), but without preconditioning it may still converge slowly if the matrix is ill-conditioned.

### Preconditioning of $Ax = b$

Pre-multiply by the inverse of a non-singular matrix $M$ and solve instead:

$ M^{-1}Ax = M^{-1}b \quad \text{or} \quad AM^{-1}(Mx) = b $

- improves the condition number
- convergence is governed by $M^{-1}A$ or $AM^{-1}$ instead of $A$

To maintain symmetry:

- one can use the split form $M^{-1}AM^{-\mathrm{T}}x' = M^{-1}b$ with $x' = M^{\mathrm{T}}x$

Jacobi preconditioning:

- apply a few Jacobi steps; usually not very effective

Other iterative methods as preconditioners (Gauss-Seidel, SOR, SSOR, etc.):

- usually better
- incomplete factorization preconditioning typically gives 2-5x fewer iterations than plain CG

#### Incomplete LU (ILU)

- Assume that $L$ and $U$ have the same sparsity pattern as $A$
- LU (or Cholesky) but avoiding fill-in of elements that are already zero in $A$
- very cheap if the matrix is sparse
- $A \approx \tilde{L}\tilde{U}$: not exactly equal, but close enough to serve as a preconditioner

$ M = \tilde{L} \tilde{U} $

ILU(0): zero fill-in, i.e. $\tilde{L}$ and $\tilde{U}$ keep exactly the sparsity pattern of $A$.
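As a sketch of how an incomplete factorization is used in practice, the following assumes SciPy is available and wraps $M = \tilde{L}\tilde{U}$ as a preconditioner for GMRES; the test matrix, `drop_tol`, and `fill_factor` values are illustrative assumptions, and `spilu`, `LinearOperator`, and `gmres` are standard SciPy routines rather than anything specific to these notes.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Sparse nonsymmetric test matrix: a 1-D convection-diffusion-like stencil (illustrative)
n = 200
A = sp.diags(
    [-1.2 * np.ones(n - 1), 2.0 * np.ones(n), -0.8 * np.ones(n - 1)],
    offsets=[-1, 0, 1],
    format="csc",
)
b = np.ones(n)

# Incomplete LU factorization M = L~ U~ with limited fill-in
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)

# Expose the action of M^{-1} (a solve with L~ U~) as a LinearOperator for GMRES
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M, restart=20)
print("converged:", info == 0)
print("residual norm:", np.linalg.norm(b - A @ x))
```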