Top-k eigenvalues¶
This library provides two algorithms for computing the top \(k\) eigenpairs of a symmetric operator: power iteration with deflation and Lanczos. They share the same EigenResult return type but differ in how they search the spectrum.
Power iteration with deflation¶
The basic recurrence is

\[v_{t+1} = \frac{H v_t}{\|H v_t\|}.\]

Starting from a random vector, the iterates converge to the eigenvector of largest absolute eigenvalue. The error decays as \(|\lambda_2 / \lambda_1|^t\): fast if there's a clear gap between the top two eigenvalues, slow otherwise.
For top \(k\), we deflate: after finding \((\lambda_1, v_1)\), we replace \(H\) with \(H - \lambda_1 v_1 v_1^\top\) and run power iteration again to find the next eigenpair, and so on.
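The deflation loop can be sketched in plain NumPy (illustrative only: the function name is made up, and a dense matrix stands in for the library's implicit matrix-vector products):

```python
import numpy as np

def deflated_power_iteration_sketch(H, k, max_iter=500, tol=1e-10, seed=0):
    """Illustrative deflated power iteration on a dense symmetric matrix."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    H = H.astype(float).copy()  # deflation mutates our working copy
    eigenvalues = np.zeros(k)
    eigenvectors = np.zeros((k, n))
    for i in range(k):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        lam = v @ H @ v
        for _ in range(max_iter):
            w = H @ v
            v = w / np.linalg.norm(w)        # power step
            lam_new = v @ H @ v              # Rayleigh estimate
            if abs(lam_new - lam) <= tol * max(abs(lam_new), 1e-12):
                lam = lam_new
                break
            lam = lam_new
        eigenvalues[i] = lam
        eigenvectors[i] = v
        H -= lam * np.outer(v, v)  # deflate: remove the found eigenpair
    return eigenvalues, eigenvectors
```

On a matrix with a clear spectral gap this recovers the top eigenpairs in order; with near-degenerate eigenvalues, convergence stalls, which is the weakness discussed below.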
```python
from hessian_eigenthings.algorithms import deflated_power_iteration

result = deflated_power_iteration(
    operator,
    k=5,
    max_iter=100,
    tol=1e-4,
    momentum=0.0,  # optional Polyak momentum (De Sa et al. 2017)
    seed=0,
)
print(result.eigenvalues)   # (5,)
print(result.eigenvectors)  # (5, n)
print(result.residuals)     # (5,) ||H v - λ v||
```
Stopping uses a combined criterion: both the relative change in the Rayleigh estimate, \(|\lambda_t - \lambda_{t-1}| / |\lambda_t|\), and the relative residual, \(\|Hv - \lambda v\| / |\lambda|\), must fall below the tolerance.
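The combined test can be written as a single predicate (a sketch; the function name and the small floor guarding against division by zero are illustrative, not the library's internals):

```python
import numpy as np

def converged(H, v, lam, lam_prev, tol):
    """Combined stopping test: both the relative change in the Rayleigh
    estimate AND the relative residual must fall below tol."""
    rel_change = abs(lam - lam_prev) / max(abs(lam), 1e-12)
    rel_residual = np.linalg.norm(H @ v - lam * v) / max(abs(lam), 1e-12)
    return rel_change <= tol and rel_residual <= tol
```

Requiring both conditions avoids stopping early when the Rayleigh estimate plateaus while the vector is still far from an eigenvector.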
Lanczos¶
Lanczos builds a Krylov subspace \(\mathrm{span}(v, Hv, H^2 v, \ldots, H^{m-1} v)\) in tridiagonal form via a three-term recurrence:

\[w_j = H v_j - \alpha_j v_j - \beta_{j-1} v_{j-1}, \qquad v_{j+1} = w_j / \beta_j,\]

with \(\alpha_j = v_j^\top H v_j\) and \(\beta_j = \|w_j\|\). After \(m\) steps, the tridiagonal matrix \(T_m\) has the same top eigenvalues as \(H\) (to within Krylov-projection error). We solve \(T_m\)'s eigenproblem directly with torch.linalg.eigh; this is cheap because \(T_m\) is \(m \times m\) with \(m \ll n\).
```python
from hessian_eigenthings.algorithms import lanczos

result = lanczos(
    operator,
    k=5,
    max_iter=20,           # Krylov dimension; defaults to min(2k, n-1)
    tol=1e-4,
    reorthogonalize=None,  # auto: True for max_iter <= 50
    which="LM",            # 'LM' largest-magnitude, 'LA'/'SA' largest/smallest algebraic
    seed=0,
)
```
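The recurrence plus the small eigenproblem fit in a short NumPy sketch (illustrative; the function name is made up, a dense matrix stands in for matrix-vector products, and full reorthogonalization is always on):

```python
import numpy as np

def lanczos_sketch(H, m, seed=0):
    """Illustrative Lanczos on a dense symmetric H with full reorthogonalization."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    v = rng.standard_normal(n)
    V = [v / np.linalg.norm(v)]
    alphas, betas = [], []
    for j in range(m):
        w = H @ V[-1]
        alphas.append(V[-1] @ w)  # alpha_j = v_j^T H v_j
        # One sweep subtracts all previous components: it covers both the
        # three-term recurrence and full reorthogonalization.
        for u in V:
            w -= (u @ w) * u
        beta = np.linalg.norm(w)  # beta_j = ||w_j||
        if j < m - 1:
            betas.append(beta)
            V.append(w / beta)
    # Solve the small m x m tridiagonal eigenproblem; its eigenvalues
    # (Ritz values) approximate H's extreme eigenvalues.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    theta, S = np.linalg.eigh(T)
    return theta, np.column_stack(V) @ S  # Ritz values, Ritz vectors
```

When \(m = n\) and the starting vector has components along every eigenvector, the Ritz values match \(H\)'s spectrum to roundoff; for \(m \ll n\) the extreme eigenvalues converge first.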
Reorthogonalization¶
In exact arithmetic the Lanczos basis is orthogonal. In floating point it isn't, and the loss of orthogonality produces ghost eigenvalues (Paige 1976) — duplicated copies of converged eigenvalues that pollute the result.
By default we run full Gram-Schmidt reorthogonalization for max_iter <= 50 and disable it for larger Krylov dimensions where the cost (\(O(m^2 n)\)) becomes significant. If you're analyzing a Hessian with many near-degenerate eigenvalues — common for transformers — keep reorthogonalization on regardless.
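One full reorthogonalization pass is a single Gram-Schmidt sweep of the new vector against the stored basis; this per-step \(O(mn)\) work is what accumulates to the \(O(m^2 n)\) total cost. A sketch (the function name is illustrative):

```python
import numpy as np

def reorthogonalize(w, basis):
    """Subtract w's projections onto the stored Lanczos vectors.
    basis: (m, n) array of unit-norm, mutually orthogonal rows."""
    return w - basis.T @ (basis @ w)  # one BLAS-friendly Gram-Schmidt sweep
```

Expressing the sweep as two matrix-vector products (rather than a Python loop over basis vectors) is what keeps it affordable on GPU for moderate \(m\).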
Ritz residuals¶
Lanczos reports residuals as \(|\beta_m \cdot s_{m,i}|\) where \(s_{m,i}\) is the last component of \(T_m\)'s \(i\)-th eigenvector. This is the standard cheap estimate of \(\|H v_i - \lambda_i v_i\|\) without an extra matvec per Ritz pair.
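The estimate follows from \(H y_i = \theta_i y_i + \beta_m s_{m,i} v_{m+1}\) and can be checked numerically. A self-contained sketch (dense NumPy, full reorthogonalization; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 30, 10
A = rng.standard_normal((n, n))
H = (A + A.T) / 2  # random symmetric test matrix

# Lanczos with full reorthogonalization, keeping m+1 basis vectors.
V = np.zeros((n, m + 1))
alphas, betas = np.zeros(m), np.zeros(m)
v = rng.standard_normal(n)
V[:, 0] = v / np.linalg.norm(v)
for j in range(m):
    w = H @ V[:, j]
    alphas[j] = V[:, j] @ w
    w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)  # full reorthogonalization
    betas[j] = np.linalg.norm(w)
    V[:, j + 1] = w / betas[j]

T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
theta, S = np.linalg.eigh(T)
Y = V[:, :m] @ S  # Ritz vectors

cheap = np.abs(betas[-1] * S[-1, :])               # |beta_m * s_{m,i}|
exact = np.linalg.norm(H @ Y - Y * theta, axis=0)  # ||H y_i - theta_i y_i||
print(np.max(np.abs(cheap - exact)))  # agreement is at roundoff level
```

The two residual estimates agree to machine precision here because full reorthogonalization keeps the Lanczos relation \(H V_m = V_m T_m + \beta_m v_{m+1} e_m^\top\) accurate.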
Choosing between the two¶
- Lanczos is almost always preferable for top \(k\) on a symmetric operator. It converges much faster than power iteration on the outer eigenvalues.
- Power iteration with deflation is mostly useful as a baseline, or when you can't store a Krylov basis (very large \(n\), no offload). v1.3 will provide block Lanczos and CPU-offload-of-basis to extend Lanczos's reach.
- For smallest-magnitude eigenvalues, raw Lanczos converges slowly. Future versions will add shift-invert. For now, which="SA" works but needs more Krylov steps to converge.
Limitations¶
- Lanczos with reorthogonalization holds the entire Krylov basis in memory. For a model with \(n\) parameters and \(m\) Lanczos steps, that's \(O(mn)\) — the basis-storage problem at LLM scale.
- Both algorithms are sensitive to the ordering of near-degenerate eigenvalues and may return them in slightly different orders across runs.
- The seed argument controls the initial random vector; results are reproducible given a fixed seed and the same random number generator on the same device.
References¶
- Lanczos, C. (1950). An iteration method for the solution of the eigenvalue problem of linear differential and integral operators.
- Paige, C. C. (1976). Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix.
- De Sa, He, Mitliagkas, Ré, Xu (2017). Accelerated Stochastic Power Iteration. arXiv:1707.02670.