SVD and Low-Rank Approximation
The singular value decomposition (SVD) factors any $m \times n$ matrix as $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with non-negative entries (the singular values). Truncating to the $k$ largest singular values gives the best rank-$k$ approximation to $A$ in both the Frobenius and spectral norms (Eckart-Young theorem).
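The Eckart-Young claim can be sanity-checked in plain NumPy (a minimal sketch outside this notebook's np_* library): the truncated SVD's Frobenius error equals the norm of the discarded singular values, and any other rank-$k$ matrix, here a random one, does at least as badly.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Optimal rank-k truncation: keep the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
opt_err = np.linalg.norm(A - A_k, "fro")

# Eckart-Young: the error equals the norm of the discarded tail
print(np.isclose(opt_err, np.sqrt(np.sum(s[k:] ** 2))))  # -> True

# Any other rank-k matrix (here a random one) does at least as badly
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 5))
print(np.linalg.norm(A - B, "fro") >= opt_err)  # -> True
```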
This notebook demonstrates:

- Computing the SVD with np_svd
- Inspecting the singular value spectrum
- Building rank-$k$ approximations and measuring reconstruction error
- Visualising approximations with heatmaps
- Interpreting low-rank approximation as data compression

Key functions: np_svd, np_matmul, np_reshape, np_diag, np_scale, np_sub, np_norm, np_col, np_row, ax_bar, ax_heatmap.
load("numerics")$
load("ax-plots")$
Building a Data Matrix
We construct a $6 \times 4$ matrix that is approximately rank-1: a clean outer product $u v^T$ plus a small amount of Gaussian noise. This models real-world data where a low-dimensional signal is corrupted by noise.
Since np_outer does not exist yet, we form the outer product via np_matmul with reshaped column and row vectors: $u v^T = (6 \times 1)(1 \times 4)$.
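For comparison, standard NumPy does provide an outer-product routine, so the same noisy rank-1 construction is a one-liner with np.outer (a sketch outside this notebook's np_* bindings; the noise draw will differ):

```python
import numpy as np

rng = np.random.default_rng(42)
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
v = np.array([1.0, 0.5, 0.3, 0.1])

# Rank-1 signal plus small Gaussian noise
A = np.outer(u, v) + 0.1 * rng.standard_normal((6, 4))
print(A.shape)  # -> (6, 4)
```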
/* Column vector u (6x1) and row vector v (1x4) */
u : np_reshape(ndarray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]), [6, 1])$
v : np_reshape(ndarray([1.0, 0.5, 0.3, 0.1]), [1, 4])$
/* Data = outer product + small noise */
A : np_add(np_matmul(u, v), np_scale(0.1, np_randn([6, 4])))$
print("Data matrix A (6x4):")$
np_to_matrix(A);
Data matrix A (6x4):
Computing the SVD
The SVD decomposes $A = U \Sigma V^T$ where:
- $U$ is $6 \times 4$ (left singular vectors; np_svd returns the thin form, matching the shapes printed below)
- $\Sigma$ is returned as a 1D vector of singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$
- $V^T$ is $4 \times 4$ (right singular vectors, transposed)
Since our matrix is approximately rank-1, we expect one large singular value and the rest near zero.
/* SVD decomposition */
[U, S, Vt] : np_svd(A)$
print("U shape:", np_shape(U))$
print("S (singular values):", np_to_list(S))$
print("Vt shape:", np_shape(Vt))$
U shape: [6,4]
S (singular values):
[11.019792611542938,0.32342622799894927,
0.19282611598483843,0.13168198012160148]
Vt shape: [4,4]
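The reported U shape is $6 \times 4$, i.e. the thin (reduced) SVD. NumPy's np.linalg.svd exposes both conventions through the full_matrices flag; a quick sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))

# Full SVD: U is 6x6, Vt is 4x4
U_full, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U_full.shape)  # -> (6, 6)

# Thin SVD: U is 6x4, the convention the shapes above correspond to
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # -> (6, 4) (4,) (4, 4)

# Singular values are returned sorted in descending order
print(np.all(np.diff(s) <= 0))  # -> True
```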
Singular Value Spectrum
The singular values reveal the effective rank of the matrix. Rapid decay means the matrix is well-approximated by a low-rank truncation. Here we expect $\sigma_1 \gg \sigma_2 \approx \sigma_3 \approx \sigma_4 \approx 0$ since the true signal is rank-1.
/* Bar chart of singular values */
ax_draw2d(
ax_bar(["s1", "s2", "s3", "s4"], np_to_list(S)),
title="Singular Value Spectrum",
ylabel="Singular value"
)$
The first singular value dominates -- the remaining values are small (noise level). This confirms that a rank-1 approximation should capture most of the structure in $A$.
Rank-1 Approximation
The best rank-1 approximation is $A_1 = \sigma_1 \, u_1 \, v_1^T$, where $u_1$ is the first column of $U$ and $v_1^T$ is the first row of $V^T$.
We extract these from the SVD factors, form the outer product, and measure the relative reconstruction error $\|A - A_1\| / \|A\|$.
/* Extract first singular triplet */
u1 : np_reshape(np_col(U, 0), [6, 1])$
v1 : np_reshape(np_row(Vt, 0), [1, 4])$
s1 : np_ref(S, 0)$
/* Rank-1 reconstruction: A1 = s1 * u1 * v1^T */
A1 : np_scale(s1, np_matmul(u1, v1))$
print("Rank-1 approximation A1:")$
np_to_matrix(A1);
/* Relative error */
err1 : np_norm(np_sub(A, A1)) / np_norm(A)$
print("Relative error ||A - A1|| / ||A|| =", err1)$
Rank-1 approximation A1:
matrix([1.0323508868061055,0.5041958462191255,0.29861370897403416,
0.08985057383990286],
[1.9768638014995312,0.965491995014111,0.5718197557117578,
0.17205617705971463],
[2.9743745417964234,1.4526720597038016,0.8603557425631305,
0.2588744416368203],
[4.011911816055098,1.9594009158170318,1.1604696453309429,
0.34917641227864865],
[4.919965492091362,2.402890525263633,1.4231296377949423,
0.4282087886848071],
[6.099055040190595,2.978752918631902,1.7641883879450349,
0.530830749744105])
Relative error ||A - A1|| / ||A|| = 0.03617542022096205
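The same extraction in plain NumPy, with column/row slicing standing in for np_col, np_row, and np_ref (a sketch; exact values differ because the noise draw differs):

```python
import numpy as np

rng = np.random.default_rng(42)
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
v = np.array([1.0, 0.5, 0.3, 0.1])
A = np.outer(u, v) + 0.1 * rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# First singular triplet: column 0 of U, row 0 of Vt, leading value s[0]
u1 = U[:, 0:1]          # shape (6, 1)
v1 = Vt[0:1, :]         # shape (1, 4)
A1 = s[0] * (u1 @ v1)   # rank-1 reconstruction

rel_err = np.linalg.norm(A - A1) / np.linalg.norm(A)
print(rel_err < 0.1)  # -> True: only the noise remains
```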
Reconstruction Error vs Rank
We now compute truncated SVD approximations for ranks $k = 1, 2, 3, 4$ and track how the relative Frobenius-norm error decreases.
The rank-$k$ approximation is: $$A_k = \sum_{i=1}^{k} \sigma_i \, u_i \, v_i^T = U_k \, \Sigma_k \, V_k^T$$
By the Eckart-Young theorem: $$\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$$
/* Compute truncated SVD reconstruction for each rank */
s_list : np_to_list(S)$
ranks : [1, 2, 3, 4]$
errors : []$
norm_A : np_norm(A)$
for k in ranks do block(
[Ak : np_zeros([6, 4])],
/* Accumulate rank-k approximation */
for i : 1 thru k do block(
[ui, vi, si],
ui : np_reshape(np_col(U, i-1), [6, 1]),
vi : np_reshape(np_row(Vt, i-1), [1, 4]),
si : s_list[i],
Ak : np_add(Ak, np_scale(si, np_matmul(ui, vi)))
),
errors : append(errors, [np_norm(np_sub(A, Ak)) / norm_A])
)$
print("Relative reconstruction error by rank:")$
for i : 1 thru length(ranks) do
print(sconcat(" rank ", ranks[i], ": ", errors[i]))$
Relative reconstruction error by rank:
  rank 1: 0.03617542022096205
  rank 2: 0.0211752393642924
  rank 3: 0.011941766330324058
  rank 4: 3.125000824323487e-16
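The loop above can be cross-checked against the Eckart-Young formula: at every rank $k$ the relative error equals the norm of the discarded singular-value tail divided by $\|A\|_F$. A NumPy sketch of the same check (on its own random matrix, outside this notebook's np_* library):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
norm_A = np.linalg.norm(A, "fro")

for k in range(1, 5):
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    err = np.linalg.norm(A - A_k, "fro") / norm_A
    tail = np.sqrt(np.sum(s[k:] ** 2)) / norm_A
    # Eckart-Young: the two quantities agree at every rank
    assert np.isclose(err, tail)
print("errors match the singular-value tail for k = 1..4")
```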
/* Plot error vs rank */
error_data : makelist([ranks[i], errors[i]], i, 1, length(ranks))$
ax_draw2d(
color=blue, marker_size=8,
points(error_data),
color=blue,
lines(error_data),
title="Reconstruction Error vs Rank",
xlabel="Rank k",
ylabel="Relative error ||A - Ak|| / ||A||",
grid=true
)$
The error drops sharply from rank 0 to rank 1 (capturing the dominant signal), then decreases gradually as additional components mop up the noise. At full rank ($k = 4$) the reconstruction is exact (error $\approx 0$ to machine precision).
Visualising with Heatmaps
Heatmaps provide an intuitive view of matrix structure. We compare the original matrix with its rank-1 and rank-2 approximations to see how successive SVD components refine the reconstruction.
/* Original matrix A */
ax_draw2d(
ax_heatmap(np_to_matrix(A)),
title="Original Matrix A"
)$
/* Rank-1 approximation */
ax_draw2d(
ax_heatmap(np_to_matrix(A1)),
title="Rank-1 Approximation"
)$
/* Rank-2 approximation */
u2 : np_reshape(np_col(U, 1), [6, 1])$
v2 : np_reshape(np_row(Vt, 1), [1, 4])$
s2 : np_ref(S, 1)$
A2 : np_add(A1, np_scale(s2, np_matmul(u2, v2)))$
ax_draw2d(
ax_heatmap(np_to_matrix(A2)),
title="Rank-2 Approximation"
)$
The rank-1 heatmap already captures the overall gradient pattern of the original. The rank-2 approximation adds finer detail. The difference between rank-2 and the original is barely visible -- it consists only of noise.
Application: Data Compression
Low-rank approximation is a form of lossy compression. For an $m \times n$ matrix:
| Representation | Storage (values) |
|---|---|
| Full matrix | $m \cdot n$ |
| Rank-$k$ SVD | $k \cdot (m + n + 1)$ |
The rank-$k$ form stores $k$ singular values, $k$ columns of $U$, and $k$ rows of $V^T$. When $k \ll \min(m, n)$, this is much smaller than the original.
/* Compression ratio for our 6x4 example */
m : 6$ n : 4$
full_storage : m * n$
print("Full storage:", full_storage, "values")$
print("")$
for k in [1, 2, 3] do block(
[rank_storage, ratio],
rank_storage : k * (m + n + 1),
ratio : float(full_storage) / rank_storage,
print(sconcat(" Rank-", k, " storage: ", rank_storage,
" values (compression ratio: ",
ratio, "x, error: ", errors[k], ")"))
)$
Full storage: 24 values
 Rank-1 storage: 11 values (compression ratio: 2.1818181818181817x, error: 0.03617542022096205)
 Rank-2 storage: 22 values (compression ratio: 1.0909090909090908x, error: 0.0211752393642924)
 Rank-3 storage: 33 values (compression ratio: 0.7272727272727273x, error: 0.011941766330324058)
For this small example the savings are modest, but the principle scales dramatically. For a $1000 \times 1000$ matrix, a rank-50 approximation stores $50 \times 2001 = 100{,}050$ values instead of $1{,}000{,}000$ -- a 10$\times$ compression -- while often retaining $> 99\%$ of the signal.
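The storage arithmetic generalises to a one-line helper (a Python sketch; the function name is ours, not part of the notebook library):

```python
def svd_storage(m: int, n: int, k: int) -> int:
    """Values stored by a rank-k SVD of an m x n matrix:
    k columns of U, k singular values, and k rows of V^T."""
    return k * (m + n + 1)

# The 6x4 example above
print(svd_storage(6, 4, 1))         # -> 11
# The 1000x1000 example: rank 50 keeps about a tenth of the values
print(svd_storage(1000, 1000, 50))  # -> 100050
print(1_000_000 / svd_storage(1000, 1000, 50))  # roughly 10x
```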
This idea underpins image compression (keeping dominant SVD modes of pixel blocks), recommender systems (low-rank user-item matrices), and dimensionality reduction (PCA is SVD of the centred data matrix).
Summary:

- np_svd computes the SVD via LAPACK
- The singular value spectrum reveals the effective rank
- Truncating to rank $k$ gives the optimal low-rank approximation (Eckart-Young)
- Reconstruction error is fully determined by the discarded singular values
- Low-rank structure enables significant data compression