Correlation Heatmap¶

Correlation measures the linear relationship between pairs of variables. The Pearson correlation coefficient $r_{xy}$ ranges from $-1$ (perfect anti-correlation) through $0$ (no linear relationship) to $+1$ (perfect positive correlation):

$$r_{xy} = \frac{\text{cov}(x, y)}{\sigma_x \, \sigma_y}$$
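
The formula can be evaluated by hand on a few points. The following is a minimal sketch that uses only core Maxima list arithmetic rather than the numerics package; `pearson` is a hypothetical helper name introduced here for illustration. The $1/(n-1)$ factors in the covariance and the standard deviations cancel, so they are omitted.

```maxima
/* Pearson r for two equal-length lists (illustrative helper, not part of numerics) */
pearson(x, y) := block([n, dx, dy],
  n  : length(x),
  dx : x - lsum(v, v, x) / n,   /* deviations of x from its mean */
  dy : y - lsum(v, v, y) / n,   /* deviations of y from its mean */
  (dx . dy) / sqrt((dx . dx) * (dy . dy))
)$

pearson([1, 2, 3, 4], [2, 4, 6, 8]);   /* exactly linear, increasing: 1 */
pearson([1, 2, 3, 4], [8, 6, 4, 2]);   /* exactly linear, decreasing: -1 */
pearson([1, 2, 3, 4], [1, 3, 2, 4]);   /* noisy increasing trend: 4/5 */
```

Because the inputs are exact integers, Maxima returns exact rationals; with floating-point data the result is a float in $[-1, 1]$ up to rounding.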

A correlation matrix collects all pairwise correlations into one table, and a heatmap makes the structure immediately visible. In this notebook we generate synthetic data with a known correlation structure, compute the correlation matrix with np_corrcoef, and visualize it with ax_heatmap.

In [1]:
load("numerics")$
load("ax-plots")$

Generating Correlated Data¶

We build four variables from three independent standard-normal sources $z_1, z_2, z_3$. By constructing linear combinations we control the correlation structure:

| Variable | Construction | Expected correlations |
|----------|--------------|-----------------------|
| $x_1$ | $z_1$ | baseline |
| $x_2$ | $0.8\,z_1 + 0.6\,z_2$ | positively correlated with $x_1$ |
| $x_3$ | $z_3$ | independent of $x_1$ and $x_2$ |
| $x_4$ | $-0.7\,z_1 + 0.714\,z_3$ | anti-correlated with $x_1$ |

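
The population correlations implied by this construction can be worked out directly, assuming the $z_k$ are independent with unit variance (sample values from 200 draws will scatter around these). Since the sources are independent, the variance of each combination is the sum of squared coefficients, and the covariance of two combinations is the sum of products of their coefficients on shared sources:

$$\text{var}(x_2) = 0.8^2 + 0.6^2 = 1, \qquad \text{var}(x_4) = 0.7^2 + 0.714^2 \approx 0.9998$$

$$r_{x_1 x_2} = \frac{0.8}{\sqrt{1 \cdot 1}} = 0.8, \qquad r_{x_1 x_4} = \frac{-0.7}{\sqrt{0.9998}} \approx -0.70$$

$$r_{x_2 x_4} = \frac{0.8 \cdot (-0.7)}{\sqrt{1 \cdot 0.9998}} \approx -0.56, \qquad r_{x_3 x_4} = \frac{0.714}{\sqrt{0.9998}} \approx 0.71$$

The coefficients are chosen so that every variable has (almost exactly) unit variance, which is why the correlations nearly coincide with the raw coefficients.
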
In [2]:
n : 200$

/* Independent standard-normal sources */
z1 : np_randn([n])$
z2 : np_randn([n])$
z3 : np_randn([n])$

/* Build variables with known correlation structure */
x1 : z1$
x2 : np_add(np_scale(0.8, z1), np_scale(0.6, z2))$
x3 : z3$
x4 : np_add(np_scale(-0.7, z1), np_scale(0.714, z3))$

print("Generated", n, "observations of 4 variables")$
Generated 200 observations of 4 variables
In [3]:
/* Stack the four 1D vectors into an n-by-4 matrix (each column is a variable) */
data : np_hstack(
  np_hstack(
    np_reshape(x1, [n, 1]),
    np_reshape(x2, [n, 1])
  ),
  np_hstack(
    np_reshape(x3, [n, 1]),
    np_reshape(x4, [n, 1])
  )
)$
print("Data shape:", np_shape(data))$

/* Compute the correlation matrix */
R : np_corrcoef(data)$
print("Correlation matrix:")$
np_to_matrix(R);
Data shape: [200,4]
Correlation matrix:
Out[3]:

Visualizing the Correlation Matrix¶

A heatmap with a diverging color scale (red-white-blue) makes positive and negative correlations easy to spot at a glance. The diagonal is always $1$ (each variable is perfectly correlated with itself).

In [4]:
ax_draw2d(
  ax_heatmap(
    np_to_matrix(R),
    ["x1", "x2", "x3", "x4"],
    ["x1", "x2", "x3", "x4"]
  ),
  colorscale="RdBu",
  title="Pearson Correlation Matrix"
)$
[Figure: heatmap "Pearson Correlation Matrix", rows and columns labeled x1, x2, x3, x4]

Interpreting Correlations¶

The heatmap should show:

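  • x1 vs x2: strong positive correlation ($r \approx 0.8$) -- $x_2 = 0.8\,z_1 + 0.6\,z_2$ has unit variance ($0.8^2 + 0.6^2 = 1$), so $r$ equals its coefficient on $z_1$
  • x1 vs x4: strong negative correlation ($r \approx -0.7$) -- $x_4$ contains $-0.7\,z_1$ and also has almost exactly unit variance
  • x1 vs x3 and x2 vs x3: near zero -- $x_3 = z_3$ is independent of $z_1$ and $z_2$
  • x2 vs x4: moderate negative correlation ($r \approx 0.8 \times (-0.7) = -0.56$) -- both share $z_1$ but with opposite signs
  • x3 vs x4: strong positive correlation ($r \approx 0.71$) -- $x_4$ contains $0.714\,z_3$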

Let us verify these relationships with scatter plots of selected pairs.

In [5]:
/* Most correlated pair: x1 vs x2 */
ax_draw2d(
  marker_size=4, opacity=0.6,
  points(x1, x2),
  title="x1 vs x2 (positive correlation)",
  xlabel="x1", ylabel="x2",
  aspect_ratio=true
)$
[Figure: scatter plot "x1 vs x2 (positive correlation)", x-axis x1, y-axis x2]
In [6]:
/* Anti-correlated pair: x1 vs x4 */
ax_draw2d(
  color="red", marker_size=4, opacity=0.6,
  points(x1, x4),
  title="x1 vs x4 (negative correlation)",
  xlabel="x1", ylabel="x4",
  aspect_ratio=true
)$
[Figure: scatter plot "x1 vs x4 (negative correlation)", x-axis x1, y-axis x4]
In [7]:
/* Uncorrelated pair: x1 vs x3 */
ax_draw2d(
  color="green", marker_size=4, opacity=0.6,
  points(x1, x3),
  title="x1 vs x3 (uncorrelated)",
  xlabel="x1", ylabel="x3",
  aspect_ratio=true
)$
[Figure: scatter plot "x1 vs x3 (uncorrelated)", x-axis x1, y-axis x3]

Covariance vs Correlation¶

The covariance matrix $\Sigma$ from np_cov contains the raw covariances $\text{cov}(x_i, x_j)$, which depend on the scale of each variable. The correlation matrix $R$ from np_corrcoef normalizes each entry by the product of standard deviations:

$$R_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii} \, \Sigma_{jj}}}$$

This normalization maps all values to $[-1, 1]$, making them comparable across variables with different units or scales.
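
To illustrate the scale argument, here is a minimal sketch in core Maxima. The helper `cov_to_corr` and the small matrices `S1`, `S2`, and `D` are made up for this example and are not part of the numerics package. Rescaling the first variable by a factor of 100 (say, metres to centimetres) changes the covariance matrix but leaves the correlation matrix untouched.

```maxima
/* Normalize a covariance matrix: R[i,j] = S[i,j] / sqrt(S[i,i] * S[j,j]) */
cov_to_corr(S) := genmatrix(
  lambda([i, j], S[i, j] / sqrt(S[i, i] * S[j, j])),
  length(S), length(S)
)$

/* Toy covariance matrix: variances 4 and 9, covariance 3 */
S1 : matrix([4, 3], [3, 9])$

/* Rescale the first variable by 100: S2 = D . S1 . D */
D  : matrix([100, 0], [0, 1])$
S2 : D . S1 . D$

S2;                /* matrix([40000, 300], [300, 9]) */
cov_to_corr(S1);   /* matrix([1, 1/2], [1/2, 1]) */
cov_to_corr(S2);   /* matrix([1, 1/2], [1/2, 1]), unchanged by the rescaling */
```

The off-diagonal covariance grows from 3 to 300 under the change of units, while the correlation stays at $1/2$.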

In [8]:
/* Covariance matrix -- entries depend on variable scales */
print("Covariance matrix:")$
np_to_matrix(np_cov(data));

print("Correlation matrix (normalized to [-1, 1]):")$
np_to_matrix(np_corrcoef(data));
Covariance matrix:
matrix([1.0980767688991973,0.9228626572630175,-0.023322556823519025,
        -0.7853060438014303],
       [0.9228626572630175,1.1342293018123484,-0.057796496231947714,
        -0.6872705583937225],
       [-0.023322556823519025,-0.057796496231947714,1.045244577214172,
        0.762630417907382],
       [-0.7853060438014303,-0.6872705583937225,0.762630417907382,
        1.094232349046872])
Correlation matrix (normalized to [-1, 1]):
Out[8]:
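
The entries of this correlation matrix can be recomputed from the covariance matrix printed above by applying the normalization formula. The following is a rough sketch in core Maxima, with the covariances copied from the printed output and rounded to four decimals; `S` and `R_from_S` are illustrative names only.

```maxima
/* Covariance matrix from the printed output, rounded to 4 decimals */
S : matrix(
  [ 1.0981,  0.9229, -0.0233, -0.7853],
  [ 0.9229,  1.1342, -0.0578, -0.6873],
  [-0.0233, -0.0578,  1.0452,  0.7626],
  [-0.7853, -0.6873,  0.7626,  1.0942]
)$

/* R[i,j] = S[i,j] / sqrt(S[i,i] * S[j,j]) */
R_from_S : genmatrix(
  lambda([i, j], S[i, j] / sqrt(S[i, i] * S[j, j])),
  4, 4
)$

fpprintprec : 3$   /* print floats with 3 significant digits */
R_from_S;
```

To two decimals this gives $r \approx 0.83$ for x1 vs x2, $-0.72$ for x1 vs x4, $-0.62$ for x2 vs x4, and $0.71$ for x3 vs x4, while the two pairs built to be independent (x1 vs x3, x2 vs x3) come out at about $-0.02$ and $-0.05$. These sample values sit close to the design values $0.8$, $-0.7$, $-0.56$, and $0.714$, as expected for 200 observations.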

Summary¶

Correlation heatmaps reveal the pairwise linear relationships between variables at a glance:

  • np_corrcoef computes the full Pearson correlation matrix from an observation matrix (rows = observations, columns = variables).
  • ax_heatmap with a diverging color scale ("RdBu") makes positive and negative correlations visually distinct.
  • Scatter plots confirm what the heatmap shows: elongated clouds for correlated pairs, roughly circular clouds for uncorrelated pairs of equal variance.
  • np_cov gives the raw covariances; np_corrcoef normalizes them to the dimensionless $[-1, 1]$ range.