This repository holds materials for the second 6-unit "spoke" half of a new MIT course (Spring 2026) introducing numerical methods and numerical analysis to a broad audience. 18.C21B/16.C21B covers large-scale linear algebra: what do you do when the matrices get so huge that you probably can't even store them as dense arrays, much less run Θ(m³) dense algorithms on them?
- Prerequisites: 18.03, 18.06, or equivalents, and some programming experience. You should have taken the first half-semester numerical "hub" 18.C21/16.C21, or alternatively any other introductory numerical-methods course (e.g. 18.330, 18.C25, 18.C20, 16.90, 2.086, 12.010, 6.7330J/16.920J/2.097J, or 6.S955).
Taking both the hub and any spoke will count as an 18.3xx class for math majors, similar to 18.330, and as 16.90 for course-16 majors.
Instructor: Prof. Steven G. Johnson.
Lectures: MWF10 in 2-142 (Mar 30 – May 11); slides and notes posted below.
Grading (all assignments submitted electronically via Gradescope on Canvas):
- 30% four weekly psets, due Wednesdays at midnight: April 8, 15, 22, and 29.
- 10% pset check-ins: oral check-ins on selected psets, where you explain your work; pass/fail per problem.
- 30% final project: due May 11 (last day of class). The final project will be an 8–15 page paper reviewing, implementing, and testing some interesting numerical linear-algebra algorithm not covered in the course. A 1-page proposal will be due April 17 (but you are encouraged to submit earlier). See final-project/proposal information.
- 20% final-project presentation in class (last week).
- 10% attendance
Collaboration policy: Talk to anyone you want to and read anything you want to, with two caveats: First, make a solid effort to solve a problem on your own before discussing it with classmates or googling. Second, no matter whom you talk to or what you read, write up the solution on your own, without having their answer in front of you (this includes ChatGPT and similar). (You can use psetpartners.mit.edu to find problem-set partners.)
Teaching Assistant: Rodrigo Arietta Candia
Office Hours: Wednesday 3pm in 2-345 (Prof. Johnson)
Resources: Piazza discussion forum, math learning center, pset partners.
Textbook: No required textbook, but suggestions for further reading will be posted after each lecture. The book Fundamentals of Numerical Computation (FNC) by Driscoll and Braun is freely available online, has examples in Julia, Python, and Matlab, and is a valuable resource. Another important textbook for the course is Numerical Linear Algebra by Trefethen and Bau. (Readable online with MIT certificates, and there is also a PDF posted online at uchicago, though this is a graduate-level textbook and hence is somewhat more sophisticated than the coverage in this course.)
This document is a brief summary of what was covered in each lecture, along with links and suggestions for further reading. It is not a good substitute for attending lecture, but may provide a useful study guide.
- Overview and syllabus (this web page).
- Julia notebook with scaling examples
Reviewed the fact that traditional "dense" linear-algebra algorithms (factorizations LU, QR, diagonalization, SVD, etc.), which assume little or no special structure of the matrix, typically require Θ(m³) operations (and Θ(m²) storage) for m×m matrices.
However, huge matrices (m in the millions or more) arise all the time in practice, and for these, Θ(m³) computation, or even Θ(m²) storage, is impractical.
The trick is that huge matrices typically have some special structure that you can exploit, and the most common such structure is sparsity: the matrix entries are mostly zero. Ideally, an m×m matrix with O(m) nonzero entries should require only O(m) storage, and close to O(m) work to solve.
Showed how a sparse matrix, in fact a symmetric tridiagonal matrix, arises from discretizing a simple PDE on a grid with finite differences: Poisson's equation −u″(x) = f(x), whose second-difference matrix has 2 on the diagonal and −1 on the adjacent off-diagonals.
How can we generalize this to other sparsity patterns and other types of large-scale matrix structures?
Further reading: FNC book section 8.1: sparsity and structure. The example of discretizing a 1d Poisson equation with finite differences, resulting in a tridiagonal matrix, is discussed in many sources, e.g. these UTexas course notes.
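As an illustrative sketch (not code from the course), the 1d finite-difference Poisson discretization above can be assembled and solved with scipy.sparse; the grid size, right-hand side f, and Dirichlet boundary conditions here are made-up example choices:

```python
# Build the tridiagonal second-difference matrix for -u'' = f on (0,1)
# with u(0) = u(1) = 0, and solve it with a sparse-direct solver.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

m = 100                      # number of interior grid points
h = 1.0 / (m + 1)            # grid spacing
# second differences: 2 on the diagonal, -1 on the adjacent off-diagonals
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m), format="csc") / h**2

x = np.linspace(h, 1 - h, m)
f = np.pi**2 * np.sin(np.pi * x)        # chosen so the exact solution is sin(pi x)
u = spla.spsolve(A, f)

print(np.max(np.abs(u - np.sin(np.pi * x))))   # O(h^2) discretization error
```

Note that A has only 3m − 2 nonzero entries, so storage is O(m) rather than the O(m²) of a dense array.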
- Sparse matrices and data structures
- sparse-matrix slides from 18.335 (Fall 2006)
- Julia notebook on dense and sparse matrices from 18.06 (Fall 2022)
- pset 1: due Wed, Apr 8 at midnight
Further reading: Sparse matrices in CSC format, along with sparse-direct algorithms, are provided in Julia by the SparseArrays standard library, and in Python by scipy.sparse, along with additional packages that provide other algorithms and data structures; for example, a famous C library for this is PETSc, which also has Python and Julia interfaces, and there are many sparse-direct libraries such as MUMPS and Pardiso. Sparse-direct solvers are described in detail by the book Direct Methods for Sparse Linear Systems by Davis; the corresponding software library is Davis's SuiteSparse, which is used by SparseArrays in Julia and is available in Python via scikit-sparse. We will soon start talking about iterative methods; more advanced treatments include the book Numerical Linear Algebra by Trefethen and Bau, and surveys of algorithms can be found in the Templates books for Ax=b and Ax=λx. Some crude rules of thumb for solving linear systems (from 18.335 spring 2020) may be useful.
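To see what CSC storage actually looks like, here is an illustrative peek at the three arrays scipy.sparse uses for a small made-up matrix:

```python
# CSC (compressed sparse column) stores three arrays: the nonzero values
# (column by column), the row index of each value, and pointers marking
# where each column's data begins.
import numpy as np
import scipy.sparse as sp

A = sp.csc_matrix(np.array([[1.0, 0.0, 2.0],
                            [0.0, 0.0, 3.0],
                            [4.0, 5.0, 0.0]]))

print(A.data)      # [1. 4. 5. 2. 3.]  (nonzeros, column-major)
print(A.indices)   # [0 2 2 0 1]       (row index of each nonzero)
print(A.indptr)    # [0 2 3 5]         (column j is data[indptr[j]:indptr[j+1]])
```

So a matrix with nnz nonzeros costs nnz values, nnz row indices, and m+1 column pointers, instead of m² dense entries.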
Iterative methods: the big picture is to solve Ax=b (or Ax=λx) by a sequence of approximate steps, each of which uses only matrix–vector products $Av$, converging towards the solution.
- Pro: can be fast whenever $Av$ is fast (e.g. if $A$ is sparse, low-rank, Toeplitz, etc.). Can scale to huge problems.
- Con: hard to design an iteration that converges quickly; how well the methods work is often problem-dependent, and often requires problem-dependent tuning and experimentation (e.g. preconditioners).
The simplest iterative method is the power method for eigenvalues: repeatedly multiplying a random initial vector by $A$ (normalizing on each step) converges towards an eigenvector of the largest-magnitude eigenvalue.
Analyzed the convergence of the power method: if the eigenvalues are ordered |λ₁| > |λ₂| ≥ ⋯, the error in the eigenvector estimate shrinks proportional to |λ₂/λ₁|ⁿ after n iterations, so convergence is slow when the two largest-magnitude eigenvalues are close together.
Given an estimate x for an eigenvector, a good estimate for the corresponding eigenvalue is the Rayleigh quotient R(x) = xᵀAx/xᵀx.
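A minimal power-method sketch (illustrative, not the course's code; the diagonal test matrix is made up so that the eigenvalues are known):

```python
# Power method: repeatedly multiply by A and normalize, then estimate the
# eigenvalue with the Rayleigh quotient.
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([10.0, 3.0, 1.0])      # test matrix with known eigenvalues
x = rng.standard_normal(3)         # random starting vector

for _ in range(50):
    x = A @ x
    x /= np.linalg.norm(x)         # normalize to avoid overflow/underflow

lam = x @ A @ x / (x @ x)          # Rayleigh quotient
print(lam)                         # converges to the largest |lambda|, here 10
```

After 50 iterations the eigenvector error is roughly (3/10)⁵⁰, so `lam` matches λ₁ = 10 to machine precision.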
To find other eigenvectors and eigenvalues, one possibility is an algorithm called deflation. It exploits the fact that for real-symmetric A = Aᵀ the eigenvectors are orthogonal: once you know the dominant eigenpair (λ₁, q₁), you can apply the power method to the "deflated" matrix A − λ₁q₁q₁ᵀ, which has the same eigenvectors as A but with λ₁ replaced by 0, so the iteration converges to the next-largest |λ|.
Deflation is a terrible scheme if you want the smallest-magnitude eigenvalue, however, since you'd have to compute all the other eigenvalues/vectors first. Instead, to find the smallest |λ| one can simply apply the power method to A⁻¹ (whose largest-magnitude eigenvalue is 1/λ for the smallest-|λ| eigenvalue of A): at each step, solve a linear system with A, e.g. using a factorization of A computed once up front. This is called inverse iteration.
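An illustrative inverse-iteration sketch (not from the course materials; the diagonal test matrix is made up). The key practical point is that A is factorized once, so each iteration is just a cheap triangular solve:

```python
# Inverse iteration: the power method applied to A^{-1}, implemented by
# factorizing A once (sparse LU) and solving a system at each step.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

A = sp.csc_matrix(np.diag([4.0, 1.0, 0.25]))   # known eigenvalues for checking
lu = spla.splu(A)                              # factorization computed ONCE

rng = np.random.default_rng(1)
x = rng.standard_normal(3)
for _ in range(60):
    x = lu.solve(x)            # one step of the power method on A^{-1}
    x /= np.linalg.norm(x)

lam = x @ (A @ x)              # Rayleigh quotient (x is already normalized)
print(lam)                     # converges to the smallest |lambda|, here 0.25
```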
Further reading: FNC book section 8.2: power iteration and section 8.3: inverse iteration. Trefethen & Bau, lecture 27. See this notebook on the power method from 18.06.
- Handwritten notes from spring 2025, page 29+
If you want to find the smallest |λ| instead of the biggest, you can simply apply the power method to A⁻¹: instead of multiplying by A at each step, solve a linear system with A (typically by factorizing A once up front). This is inverse iteration, from last lecture.
Proved that, for a real-symmetric (Hermitian) matrix A=Aᵀ, the Rayleigh quotient R(x)=xᵀAx/xᵀx is bounded above and below by the largest and smallest eigenvalues of A (the "min–max theorem"). This theorem is useful for lots of things in linear algebra. Here, it helps us understand why the Rayleigh quotient is so accurate: the power method converges to a maximum-|λ| eigenvalue, which is either the smallest (most negative) or the largest (most positive) λ of a real-symmetric A, and hence that λ is an extremum (minimum or maximum) of the Rayleigh quotient where its gradient is zero. In fact, you can show that ∇R=0 for any eigenvector (not necessarily min/max λ). This means, if we Taylor expand R(x+δx) around an eigenvector x where R(x)=λ, you get R(x+δx)=λ+O(‖δx‖^2): the error in the eigenvalue λ goes as the square of the error in the eigenvector (for real-symmetric A).
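A quick numerical check of this quadratic accuracy (illustrative; the diagonal test matrix and perturbation direction are made up):

```python
# For real-symmetric A, the Rayleigh-quotient error should scale as the
# SQUARE of the eigenvector error: R(x + eps*d) = lambda + O(eps^2).
import numpy as np

A = np.diag([5.0, 2.0, 1.0])     # symmetric; eigenvector e1 has lambda = 5
v = np.array([1.0, 0.0, 0.0])    # exact eigenvector
d = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)   # unit perturbation direction

def R(x):
    return x @ A @ x / (x @ x)   # Rayleigh quotient

for eps in [1e-2, 1e-3, 1e-4]:
    print(eps, abs(R(v + eps * d) - 5.0))   # error shrinks ~100x per 10x in eps
```

Shrinking the eigenvector error by 10× shrinks the eigenvalue error by about 100×, as the Taylor expansion predicts.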
Above, we considered inverse iteration. A more general idea is shifted inverse iteration: at each step, compute x ← (A − μI)⁻¹x (normalized), for some shift μ. This converges to the eigenvalue closest to μ, and converges quickly if μ is much closer to that eigenvalue than to any other, so a good guess μ ≈ λ gives extremely rapid convergence.
But where would you get a good guess for λ? A simple answer is to use the Rayleigh quotient R(x), where x comes from previous steps of the power iteration. Even if the power iteration is converging slowly, once you have even a rough approximation for λ you can use it as a shift. This leads to the algorithm of Rayleigh-quotient iteration: at each step, compute x ← (A − R(x)I)⁻¹x (normalized), updating the shift from the Rayleigh quotient of the current iterate. Each step requires a new factorization (the shift changes), but the convergence is extremely fast: asymptotically cubic for Hermitian problems, roughly tripling the number of correct digits per step.
The big problem with Rayleigh-quotient iteration, like Newton's method, is the need for a good initial guess — if you have a bad initial guess, it can be quite unpredictable what eigenvalue it converges to! But any time you can obtain a rough idea of where the desired eigenvalue is, it means you can zoom into the exact value extremely quickly.
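A Rayleigh-quotient-iteration sketch (illustrative; the small symmetric test matrix and initial guess are made up, and dense solves stand in for the sparse factorizations you would use at scale):

```python
# Rayleigh-quotient iteration: shifted inverse iteration where the shift is
# updated to the Rayleigh quotient of the current iterate at every step.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
x = np.array([0.0, 0.0, 1.0])      # rough initial eigenvector guess
mu = x @ A @ x / (x @ x)           # initial Rayleigh-quotient shift

for _ in range(10):
    try:
        # each step needs a NEW solve/factorization, since the shift changed
        y = np.linalg.solve(A - mu * np.eye(3), x)
    except np.linalg.LinAlgError:
        break                      # shift hit an eigenvalue exactly: done
    x = y / np.linalg.norm(y)
    mu_new = x @ A @ x             # x is normalized, so no denominator needed
    if abs(mu_new - mu) < 1e-12:
        mu = mu_new
        break
    mu = mu_new

print(mu)    # converges very rapidly to an eigenvalue near the initial shift
```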
Further reading: FNC book section 8.3: inverse iteration; however, beware that the book currently shows a less accurate (for real-symmetric/Hermitian A) method to estimate eigenvalues (issue fnc#16). Trefethen & Bau, lecture 27 covers these algorithms in much more depth. These slides by Per Persson (2006) are a useful summary.
- Handwritten notes from spring 2025, page 35+
- pset 1 solutions: coming soon
- pset 2: due April 15
To find other eigenvectors and eigenvalues of a Hermitian problem, one possibility is an algorithm called deflation. It exploits the fact that for real-symmetric A = Aᵀ the eigenvectors are orthogonal, so once you have found the dominant eigenpair (λ₁, q₁) you can apply the power method to the deflated matrix A − λ₁q₁q₁ᵀ, which replaces λ₁ by 0 and hence converges to the next-largest |λ|.
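A deflation sketch (illustrative; the diagonal test matrix is made up so the answer is known, and the dominant eigenpair is assumed to have been found already):

```python
# Deflation: given the dominant eigenpair (lambda1, q1) of a symmetric A,
# run the power method on A - lambda1 * q1 q1^T, whose eigenvalues are those
# of A but with lambda1 replaced by 0.
import numpy as np

A = np.diag([5.0, 3.0, 1.0])
q1 = np.array([1.0, 0.0, 0.0])      # dominant eigenvector (lambda1 = 5)
B = A - 5.0 * np.outer(q1, q1)      # deflated matrix: eigenvalues 0, 3, 1

rng = np.random.default_rng(2)
x = rng.standard_normal(3)
for _ in range(60):
    x = B @ x
    x /= np.linalg.norm(x)

print(x @ A @ x)    # Rayleigh quotient with A: the second eigenvalue, 3
```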
Introduced Krylov subspaces, and the idea of Krylov subspace methods: ideally, we want to find the "best" solution in the whole subspace 𝒦ₙ spanned by {x₀,Ax₀,...,Aⁿ⁻¹x₀}, which is the only subspace you can get starting from x₀ if you are only allowed linear operations and matrix–vector products.
The Arnoldi algorithm is a Krylov algorithm for eigenproblems. It basically has two components:
- Find an orthonormal basis Qₙ for 𝒦ₙ. Essentially, we will do this by a form of Gram–Schmidt, to be determined.
- Given the basis, give the "best" estimate in 𝒦ₙ for one or more eigenvectors and eigenvalues.
How do we construct the orthonormal basis Qₙ? The key point is that 𝒦ₙ₊₁ adds only one new direction (Aⁿx₀) to 𝒦ₙ, so we can build the basis one column at a time: given Qₙ, multiply the last basis vector qₙ by A and orthogonalize Aqₙ against the previous columns (Gram–Schmidt) to obtain qₙ₊₁.
Further reading: FNC book, section 8.4 on Krylov subspaces and Arnoldi. Trefethen lecture 33 on Arnoldi. This 2009 course on numerical linear algebra by Zhaojun Bai has useful notes on Krylov methods, including a discussion of the Rayleigh–Ritz procedure.
Discussed what it means to find the "best" solution in the Krylov subspace 𝒦ₙ. Discussed the general principle of Rayleigh–Ritz methods for approximately solving the eigenproblem in a subspace: finding the Ritz vectors/values (= eigenvector/value approximations) with a residual perpendicular to the subspace (a special case of a Galerkin method).
For Hermitian matrices A, I showed that the max/min Ritz values are the maximum/minimum of the Rayleigh quotient in the subspace, via the min–max theorem. In this sense, at least for Hermitian matrices, the Ritz vectors are optimal in the sense of maximizing (or minimizing) the Rayleigh quotient in the Krylov space. Another sense in which they are optimal for Hermitian A is that the Ritz vectors/values minimize ‖AV-VD‖₂ over all possible orthonormal bases V of the Krylov space and all possible diagonal matrices D (see the Bai notes below for a proof). (Later, we will discuss an "optimal polynomial" interpretation of Arnoldi+Rayleigh–Ritz from Trefethen lecture 34.)
Moreover, showed that the dot products taken during the Gram–Schmidt process are exactly the entries of our Rayleigh–Ritz matrix Hₙ = Qₙ*AQₙ, so we get the Ritz matrix "for free" from the Arnoldi process. Furthermore, Hₙ is upper-Hessenberg: zero below the first subdiagonal.
Showed that in the case where A is Hermitian, Hₙ is Hermitian as well; since it is also upper-Hessenberg, it must be tridiagonal, and most of the dot products in the Arnoldi process are exactly zero. Hence Arnoldi reduces to a three-term recurrence, and the Ritz matrix is tridiagonal. This is called the Lanczos algorithm.
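An illustrative Lanczos sketch (random symmetric test matrix made up for the example; in exact arithmetic this is Arnoldi, but note the loss-of-orthogonality caveat discussed below):

```python
# Lanczos: for real-symmetric A, Arnoldi's Gram-Schmidt collapses to a
# three-term recurrence, and the Ritz matrix is tridiagonal with diagonal
# alpha and off-diagonal beta.
import numpy as np

def lanczos(A, b, n):
    m = len(b)
    Q = np.zeros((m, n + 1))
    alpha = np.zeros(n)
    beta = np.zeros(n)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(n):
        v = A @ Q[:, j]
        alpha[j] = Q[:, j] @ v
        v -= alpha[j] * Q[:, j]
        if j > 0:
            v -= beta[j - 1] * Q[:, j - 1]   # only a three-term recurrence
        beta[j] = np.linalg.norm(v)
        Q[:, j + 1] = v / beta[j]            # assumes no breakdown (beta != 0)
    return Q, alpha, beta

rng = np.random.default_rng(4)
A = rng.standard_normal((40, 40))
A = A + A.T                                  # symmetric test matrix
n = 15
Q, alpha, beta = lanczos(A, rng.standard_normal(40), n)
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
print(np.linalg.eigvalsh(T).max())           # Ritz estimate of the largest eigenvalue
```

Each step touches only three basis vectors, so the work per iteration is one matrix–vector product plus O(m) vector operations, independent of n.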
Further reading: for Gram–Schmidt, see e.g. Strang Intro to Linear Algebra, chapter 4, and Strang 18.06 lecture 17. Modified Gram–Schmidt is analyzed in Trefethen lecture 8, and a detailed analysis with proofs can be found in e.g. this 2006 paper by Paige et al. [SIAM J. Matrix Anal. Appl. 28, pp. 264-284]. See also Per Persson's 18.335 slides on Gram–Schmidt. See also the links on Arnoldi from last lecture.
Reviewed some experimental results with a very simple implementation of the Arnoldi algorithm (see notebook above). Arnoldi indeed converges much faster than power iterations, and can give multiple eigenvalues at once. Like the power method, convergence is slower if the desired eigenvalues are clustered closely with undesired ones. Unlike the power method, it can converge not just to the largest |λ| but to any desired "edge" of the set of eigenvalues (the "spectrum"), e.g. the λ with the most positive or most negative real parts. Also unlike the power method, the convergence of the Arnoldi algorithm is shift-invariant: it is the same for A and A + μI for any shift μ, since shifting changes the eigenvalues but not the Krylov subspace.
Discussed how rounding problems cause a loss of orthogonality in Lanczos, leading to "ghost" eigenvalues where extremal eigenvalues re-appear in the Ritz values. In Arnoldi, we explicitly store and orthogonalize against all of the previous Krylov vectors, so this problem does not arise, but at the cost of Θ(mn) storage and Θ(mn²) work for n iterations with an m×m matrix.
A solution to the loss of orthogonality in Lanczos and the growing computational effort in Arnoldi, along with the growing storage, is restarting schemes, where we go for n steps and then restart with the k "best" eigenvectors. If we restart with k=1 every step, then we essentially have the power method, so while restarting makes the convergence worse, the algorithm still converges (at least for the largest |λ| eigenvalues), and converges significantly faster than the power method for n>1.
Further reading: Trefethen, lecture 36. See the section on implicitly restarted Lanczos in Templates for the Solution of Algebraic Eigenvalue Problems. Restarting schemes for Arnoldi (and Lanczos) turn out to be rather subtle — you first need to understand why the most obvious idea (taking the best k Ritz vectors and simply starting a new iteration from them) runs into trouble, which motivates the "implicit" restarting schemes that preserve a Krylov structure.