SVD and Low-Rank Approximation¶

The singular value decomposition (SVD) factors any $m \times n$ matrix as $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with non-negative entries (the singular values). Truncating to the $k$ largest singular values gives the best rank-$k$ approximation to $A$ in both the Frobenius and spectral norms (Eckart-Young theorem).
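For reference, the factorization and its basic properties can be sketched with NumPy's `numpy.linalg.svd` (a standalone illustration in plain NumPy, separate from this notebook's np_* library):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Thin SVD: U is 6x4, s is a length-4 vector, Vt is 4x4
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The factors have orthonormal columns/rows and reproduce A exactly
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(4))
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Singular values are non-negative and sorted in decreasing order
assert np.all(s >= 0) and np.all(s[:-1] >= s[1:])
```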

This notebook demonstrates:

  1. Computing the SVD with np_svd
  2. Inspecting the singular value spectrum
  3. Building rank-$k$ approximations and measuring reconstruction error
  4. Visualising approximations with heatmaps
  5. Data compression interpretation

Key functions: np_svd, np_matmul, np_reshape, np_diag, np_scale, np_sub, np_norm, np_col, np_row, ax_bar, ax_heatmap.

In [89]:
load("numerics")$
load("ax-plots")$

Building a Data Matrix¶

We construct a $6 \times 4$ matrix that is approximately rank-1: a clean outer product $u v^T$ plus a small amount of Gaussian noise. This models real-world data where a low-dimensional signal is corrupted by noise.

Since np_outer does not exist yet, we form the outer product via np_matmul with reshaped column and row vectors: $u v^T = (6 \times 1)(1 \times 4)$.

In [90]:
/* Column vector u (6x1) and row vector v (1x4) */
u : np_reshape(ndarray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]), [6, 1])$
v : np_reshape(ndarray([1.0, 0.5, 0.3, 0.1]), [1, 4])$

/* Data = outer product + small noise */
A : np_add(np_matmul(u, v), np_scale(0.1, np_randn([6, 4])))$

print("Data matrix A (6x4):")$
np_to_matrix(A);
Data matrix A (6x4):
Out[90]:

Computing the SVD¶

The SVD decomposes $A = U \Sigma V^T$ where:

  • $U$ is $6 \times 4$ (left singular vectors; np_svd returns the thin/economy SVD)
  • $\Sigma$ is returned as a 1D vector of singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$
  • $V^T$ is $4 \times 4$ (right singular vectors, transposed)

Since our matrix is approximately rank-1, we expect one large singular value and the rest near zero.

In [91]:
/* SVD decomposition */
[U, S, Vt] : np_svd(A)$

print("U shape:", np_shape(U))$
print("S (singular values):", np_to_list(S))$
print("Vt shape:", np_shape(Vt))$
U shape: [6,4]
S (singular values):
                    [11.019792611542938,0.32342622799894927,
                     0.19282611598483843,0.13168198012160148]
Vt shape: [4,4]

Singular Value Spectrum¶

The singular values reveal the effective rank of the matrix. Rapid decay means the matrix is well-approximated by a low-rank truncation. Here we expect $\sigma_1 \gg \sigma_2 \approx \sigma_3 \approx \sigma_4 \approx 0$ since the true signal is rank-1.

In [92]:
/* Bar chart of singular values */
ax_draw2d(
  ax_bar(["s1", "s2", "s3", "s4"], np_to_list(S)),
  title="Singular Value Spectrum",
  ylabel="Singular value"
)$
[figure: "Singular Value Spectrum" bar chart]

The first singular value ($\sigma_1 \approx 11$) dominates -- the remaining values are at the noise level (roughly 0.1 to 0.3). This confirms that a rank-1 approximation should capture most of the structure in $A$.

Rank-1 Approximation¶

The best rank-1 approximation is $A_1 = \sigma_1 \, u_1 \, v_1^T$, where $u_1$ and $v_1^T$ are the first left and right singular vectors.

We extract these from the SVD factors, form the outer product, and measure the relative reconstruction error $\|A - A_1\| / \|A\|$.

In [93]:
/* Extract first singular triplet */
u1 : np_reshape(np_col(U, 0), [6, 1])$
v1 : np_reshape(np_row(Vt, 0), [1, 4])$
s1 : np_ref(S, 0)$

/* Rank-1 reconstruction: A1 = s1 * u1 * v1^T */
A1 : np_scale(s1, np_matmul(u1, v1))$

print("Rank-1 approximation A1:")$
np_to_matrix(A1);

/* Relative error */
err1 : np_norm(np_sub(A, A1)) / np_norm(A)$
print("Relative error ||A - A1|| / ||A|| =", err1)$
Rank-1 approximation A1:
matrix([1.0323508868061055,0.5041958462191255,0.29861370897403416,
        0.08985057383990286],
       [1.9768638014995312,0.965491995014111,0.5718197557117578,
        0.17205617705971463],
       [2.9743745417964234,1.4526720597038016,0.8603557425631305,
        0.2588744416368203],
       [4.011911816055098,1.9594009158170318,1.1604696453309429,
        0.34917641227864865],
       [4.919965492091362,2.402890525263633,1.4231296377949423,
        0.4282087886848071],
       [6.099055040190595,2.978752918631902,1.7641883879450349,
        0.530830749744105])
Relative error ||A - A1|| / ||A|| = 0.03617542022096205

Reconstruction Error vs Rank¶

We now compute truncated SVD approximations for ranks $k = 1, 2, 3, 4$ and track how the relative Frobenius-norm error decreases.

The rank-$k$ approximation is: $$A_k = \sum_{i=1}^{k} \sigma_i \, u_i \, v_i^T = U_k \, \Sigma_k \, V_k^T$$

By the Eckart-Young theorem: $$\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$$
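The Eckart-Young error identity is easy to check numerically. A minimal NumPy sketch (again outside the notebook's np_* library) confirms that the Frobenius error of each truncation equals the root-sum-of-squares of the discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for k in range(1, 5):
    # Truncated reconstruction A_k = U_k Sigma_k V_k^T
    # (broadcasting s[:k] scales the columns of U[:, :k])
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # ||A - A_k||_F == sqrt(sigma_{k+1}^2 + ... + sigma_r^2)
    err = np.linalg.norm(A - A_k, "fro")
    assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```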

In [94]:
/* Compute truncated SVD reconstruction for each rank */
s_list : np_to_list(S)$
ranks : [1, 2, 3, 4]$
errors : []$
norm_A : np_norm(A)$

for k in ranks do block(
  [Ak : np_zeros([6, 4])],
  /* Accumulate rank-k approximation */
  for i : 1 thru k do block(
    [ui, vi, si],
    ui : np_reshape(np_col(U, i-1), [6, 1]),
    vi : np_reshape(np_row(Vt, i-1), [1, 4]),
    si : s_list[i],
    Ak : np_add(Ak, np_scale(si, np_matmul(ui, vi)))
  ),
  errors : append(errors, [np_norm(np_sub(A, Ak)) / norm_A])
)$

print("Relative reconstruction error by rank:")$
for i : 1 thru length(ranks) do
  print(sconcat("  rank ", ranks[i], ": ", errors[i]))$
Relative reconstruction error by rank:
  rank 1: 0.03617542022096205
  rank 2: 0.0211752393642924
  rank 3: 0.011941766330324058
  rank 4: 3.125000824323487e-16
In [95]:
/* Plot error vs rank */
error_data : makelist([ranks[i], errors[i]], i, 1, length(ranks))$

ax_draw2d(
  color=blue, marker_size=8,
  points(error_data),
  color=blue,
  lines(error_data),
  title="Reconstruction Error vs Rank",
  xlabel="Rank k",
  ylabel="Relative error ||A - Ak|| / ||A||",
  grid=true
)$
[figure: "Reconstruction Error vs Rank" plot]

The rank-1 error is already small (about 3.6%), since the dominant component captures the signal; the error then decreases gradually as additional components mop up the noise. At full rank ($k = 4$) the reconstruction is exact (error $\approx 0$ to machine precision).

Visualising with Heatmaps¶

Heatmaps provide an intuitive view of matrix structure. We compare the original matrix with its rank-1 and rank-2 approximations to see how successive SVD components refine the reconstruction.

In [96]:
/* Original matrix A */
ax_draw2d(
  ax_heatmap(np_to_matrix(A)),
  title="Original Matrix A"
)$
[figure: heatmap "Original Matrix A"]
In [97]:
/* Rank-1 approximation */
ax_draw2d(
  ax_heatmap(np_to_matrix(A1)),
  title="Rank-1 Approximation"
)$
[figure: heatmap "Rank-1 Approximation"]
In [98]:
/* Rank-2 approximation */
u2 : np_reshape(np_col(U, 1), [6, 1])$
v2 : np_reshape(np_row(Vt, 1), [1, 4])$
s2 : np_ref(S, 1)$
A2 : np_add(A1, np_scale(s2, np_matmul(u2, v2)))$

ax_draw2d(
  ax_heatmap(np_to_matrix(A2)),
  title="Rank-2 Approximation"
)$
[figure: heatmap "Rank-2 Approximation"]

The rank-1 heatmap already captures the overall gradient pattern of the original. The rank-2 approximation adds finer detail. The difference between rank-2 and the original is barely visible -- it consists only of noise.

Application: Data Compression¶

Low-rank approximation is a form of lossy compression. For an $m \times n$ matrix:

| Representation | Storage (values) |
| --- | --- |
| Full matrix | $m \cdot n$ |
| Rank-$k$ SVD | $k \cdot (m + n + 1)$ |

The rank-$k$ form stores $k$ singular values, $k$ columns of $U$, and $k$ rows of $V^T$. When $k \ll \min(m, n)$, this is much smaller than the original.

In [99]:
/* Compression ratio for our 6x4 example */
m : 6$ n : 4$
full_storage : m * n$

print("Full storage:", full_storage, "values")$
print("")$
for k in [1, 2, 3] do block(
  [rank_storage, ratio],
  rank_storage : k * (m + n + 1),
  ratio : float(full_storage) / rank_storage,
  print(sconcat("  Rank-", k, " storage: ", rank_storage,
                " values  (compression ratio: ",
                ratio, "x, error: ", errors[k], ")"))
)$
Full storage: 24 values
  Rank-1 storage: 11 values  (compression ratio: 2.1818181818181817x, error: 0.03617542022096205)
  Rank-2 storage: 22 values  (compression ratio: 1.0909090909090908x, error: 0.0211752393642924)
  Rank-3 storage: 33 values  (compression ratio: 0.7272727272727273x, error: 0.011941766330324058)

For this small example the savings are modest, but the principle scales dramatically. For a $1000 \times 1000$ matrix, a rank-50 approximation stores $50 \times 2001 = 100{,}050$ values instead of $1{,}000{,}000$ -- a 10$\times$ compression -- while often retaining $> 99\%$ of the signal.

This idea underpins image compression (keeping dominant SVD modes of pixel blocks), recommender systems (low-rank user-item matrices), and dimensionality reduction (PCA is SVD of the centred data matrix).
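The PCA connection can be made concrete in a few lines of NumPy (an illustrative aside, not part of this notebook's library): after centring the data matrix, the rows of $V^T$ are the principal directions and $\sigma_i^2 / (n - 1)$ are the explained variances, matching the eigenvalues of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
# 100 samples, 3 features, with deliberately anisotropic scales
X = rng.standard_normal((100, 3)) @ np.diag([3.0, 1.0, 0.2])

Xc = X - X.mean(axis=0)  # centre each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Explained variance per principal component
explained_var = s**2 / (Xc.shape[0] - 1)

# Same values as the eigenvalues of the sample covariance matrix
cov_eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
assert np.allclose(explained_var, cov_eigvals)
```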

Summary:

  1. np_svd computes the full SVD via LAPACK
  2. The singular value spectrum reveals effective rank
  3. Truncating to rank $k$ gives the optimal low-rank approximation (Eckart-Young)
  4. Reconstruction error is fully determined by the discarded singular values
  5. Low-rank structure enables significant data compression