Correlation Heatmap
Correlation measures the linear relationship between pairs of variables. The Pearson correlation coefficient $r_{xy}$ ranges from $-1$ (perfect anti-correlation) through $0$ (no linear relationship) to $+1$ (perfect positive correlation):
$$r_{xy} = \frac{\text{cov}(x, y)}{\sigma_x \, \sigma_y}$$
A correlation matrix collects all pairwise correlations into one table,
and a heatmap makes the structure immediately visible. In this notebook we
generate synthetic data with a known correlation structure, compute the
correlation matrix with np_corrcoef, and visualize it with ax_heatmap.
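To make the formula concrete, here is a minimal sketch in plain Maxima (no extra packages) that evaluates $r_{xy}$ for a small made-up data set. The lists xs and ys and all helper names are hypothetical and are not used elsewhere in this notebook. The normalizing factors in the covariance and in the two standard deviations cancel, so only sums of centered products are needed.

```maxima
/* Hypothetical toy data (not part of the notebook's data set) */
xs : [1, 2, 3, 4, 5]$
ys : [2, 4, 5, 4, 5]$
m : length(xs)$

/* Sample means */
mean_x : apply("+", xs) / m$
mean_y : apply("+", ys) / m$

/* Centered values (list arithmetic is elementwise) */
dx : xs - mean_x$
dy : ys - mean_y$

/* r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)) */
r_xy : apply("+", dx * dy) / sqrt(apply("+", dx^2) * apply("+", dy^2))$
print("r_xy =", float(r_xy))$   /* approximately 0.775 */
```

For real data, np_corrcoef does this for every pair of columns at once.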
load("numerics")$
load("ax-plots")$
Generating Correlated Data
We build four variables from three independent standard-normal sources $z_1, z_2, z_3$. By constructing linear combinations we control the correlation structure:
| Variable | Construction | Expected correlations |
|---|---|---|
| $x_1$ | $z_1$ | baseline |
| $x_2$ | $0.8\,z_1 + 0.6\,z_2$ | positively correlated with $x_1$ |
| $x_3$ | $z_3$ | independent of $x_1$ and $x_2$ |
| $x_4$ | $-0.7\,z_1 + 0.714\,z_3$ | anti-correlated with $x_1$, positively correlated with $x_3$ |
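The expected values follow from a short calculation, sketched here under the assumption that the sources are exactly independent with unit variance (sample estimates will deviate slightly). Writing each variable as $x_i = \sum_k a_{ik} z_k$, where $a_{ik}$ is notation introduced here for the coefficients in the table, gives $\text{cov}(x_i, x_j) = \sum_k a_{ik} a_{jk}$, so:

$$\text{Var}(x_2) = 0.8^2 + 0.6^2 = 1, \qquad \text{Var}(x_4) = 0.7^2 + 0.714^2 \approx 1.000$$

$$r_{x_1 x_2} = 0.8, \quad r_{x_1 x_4} \approx -0.70, \quad r_{x_2 x_4} \approx 0.8 \times (-0.7) = -0.56, \quad r_{x_3 x_4} \approx 0.71, \quad r_{x_1 x_3} = r_{x_2 x_3} = 0$$

The coefficient $0.714 \approx \sqrt{1 - 0.7^2}$ keeps $x_4$ at unit variance, so the mixing coefficients can be read directly as correlations.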
n : 200$
/* Independent standard-normal sources */
z1 : np_randn([n])$
z2 : np_randn([n])$
z3 : np_randn([n])$
/* Build variables with known correlation structure */
x1 : z1$
x2 : np_add(np_scale(0.8, z1), np_scale(0.6, z2))$
x3 : z3$
x4 : np_add(np_scale(-0.7, z1), np_scale(0.714, z3))$
print("Generated", n, "observations of 4 variables")$
Generated 200 observations of 4 variables
/* Stack the four 1D vectors into an n-by-4 matrix (each column is a variable) */
data : np_hstack(
    np_hstack(
        np_reshape(x1, [n, 1]),
        np_reshape(x2, [n, 1])
    ),
    np_hstack(
        np_reshape(x3, [n, 1]),
        np_reshape(x4, [n, 1])
    )
)$
print("Data shape:", np_shape(data))$
/* Compute the correlation matrix */
R : np_corrcoef(data)$
print("Correlation matrix:")$
np_to_matrix(R);
Data shape: [200,4]
Correlation matrix:
Visualizing the Correlation Matrix
A heatmap with a diverging color scale (red-white-blue) makes positive and negative correlations easy to spot at a glance. The diagonal is always $1$ (each variable is perfectly correlated with itself).
ax_draw2d(
    ax_heatmap(
        np_to_matrix(R),
        ["x1", "x2", "x3", "x4"],
        ["x1", "x2", "x3", "x4"]
    ),
    colorscale="RdBu",
    title="Pearson Correlation Matrix"
)$
Interpreting Correlations
The heatmap should show:
- x1 vs x2: strong positive correlation ($r \approx 0.8$) -- $x_2$ has weight $0.8$ on $z_1$ and unit variance ($0.8^2 + 0.6^2 = 1$)
- x1 vs x4: strong negative correlation ($r \approx -0.7$) -- $x_4$ contains $-0.7\,z_1$
- x1 vs x3: near zero -- $x_3 = z_3$ is independent of $z_1$
- x2 vs x4: moderate negative correlation ($r \approx 0.8 \times (-0.7) = -0.56$) -- both share $z_1$ but with opposite signs
- x3 vs x4: strong positive correlation ($r \approx 0.71$) -- $x_4$ contains $0.714\,z_3$
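With only 200 observations, the sample correlations will not match these targets exactly. As a rough guide, the sketch below evaluates the large-sample approximation $\text{se}(r) \approx (1 - \rho^2)/\sqrt{n}$ for jointly normal data. This formula and the helper se_r are assumptions introduced here for illustration; they are not part of the numerics package.

```maxima
/* Assumption: large-sample approximation se(r) ~ (1 - rho^2)/sqrt(n)
   for jointly normal data; se_r and n_obs are hypothetical names */
n_obs : 200$
se_r(rho) := (1 - rho^2) / sqrt(n_obs)$

/* Target correlations from the construction above */
for rho in [0.8, -0.7, -0.56, 0.0] do
    print("rho =", rho, " approx. standard error of r =", float(se_r(rho)))$
```

Under this approximation, an uncorrelated pair has a standard error of about $0.07$, so a sample value within roughly $\pm 0.14$ of zero is consistent with no correlation.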
Let us verify these relationships with scatter plots of selected pairs.
/* Most correlated pair: x1 vs x2 */
ax_draw2d(
    marker_size=4, opacity=0.6,
    points(x1, x2),
    title="x1 vs x2 (positive correlation)",
    xlabel="x1", ylabel="x2",
    aspect_ratio=true
)$
/* Anti-correlated pair: x1 vs x4 */
ax_draw2d(
    color="red", marker_size=4, opacity=0.6,
    points(x1, x4),
    title="x1 vs x4 (negative correlation)",
    xlabel="x1", ylabel="x4",
    aspect_ratio=true
)$
/* Uncorrelated pair: x1 vs x3 */
ax_draw2d(
    color="green", marker_size=4, opacity=0.6,
    points(x1, x3),
    title="x1 vs x3 (uncorrelated)",
    xlabel="x1", ylabel="x3",
    aspect_ratio=true
)$
Covariance vs Correlation
The covariance matrix $\Sigma$ from np_cov contains the raw
covariances $\text{cov}(x_i, x_j)$, which depend on the scale of each
variable. The correlation matrix $R$ from np_corrcoef normalizes
each entry by the product of standard deviations:
$$R_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii} \, \Sigma_{jj}}}$$
This normalization maps all values to $[-1, 1]$, making them comparable across variables with different units or scales.
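The normalization can be reproduced by hand in plain Maxima. The sketch below applies the formula to covariance values copied (and rounded to four decimals) from one run of this notebook; a fresh run draws different random numbers, so the figures will differ. The names S and R_manual are hypothetical.

```maxima
/* Covariance values rounded from one run's output; a re-run gives different numbers */
S : matrix(
    [ 1.0981,  0.9229, -0.0233, -0.7853],
    [ 0.9229,  1.1342, -0.0578, -0.6873],
    [-0.0233, -0.0578,  1.0452,  0.7626],
    [-0.7853, -0.6873,  0.7626,  1.0942]
)$

/* R[i,j] = S[i,j] / sqrt(S[i,i] * S[j,j]) */
R_manual : genmatrix(
    lambda([i, j], S[i, j] / sqrt(S[i, i] * S[j, j])),
    4, 4
)$
print(R_manual)$
```

The diagonal comes out as $1$ because each $\Sigma_{ii}$ is divided by itself, and the off-diagonal entries should agree with np_corrcoef up to the rounding of the inputs.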
/* Covariance matrix -- entries depend on variable scales */
print("Covariance matrix:")$
np_to_matrix(np_cov(data));
print("Correlation matrix (normalized to [-1, 1]):")$
np_to_matrix(np_corrcoef(data));
Covariance matrix:
matrix([1.0980767688991973,0.9228626572630175,-0.023322556823519025,
-0.7853060438014303],
[0.9228626572630175,1.1342293018123484,-0.057796496231947714,
-0.6872705583937225],
[-0.023322556823519025,-0.057796496231947714,1.045244577214172,
0.762630417907382],
[-0.7853060438014303,-0.6872705583937225,0.762630417907382,
1.094232349046872])
Correlation matrix (normalized to [-1, 1]):
Summary
Correlation heatmaps reveal the pairwise linear relationships between variables at a glance:
- np_corrcoef computes the full Pearson correlation matrix from an observation matrix (rows = observations, columns = variables).
- ax_heatmap with a diverging color scale ("RdBu") makes positive and negative correlations visually distinct.
- Scatter plots confirm what the heatmap shows: elongated clouds for correlated pairs, circular clouds for independent ones.
- np_cov gives the raw covariances; np_corrcoef normalizes them to the dimensionless $[-1, 1]$ range.