Monte Carlo Methods¶
Monte Carlo methods use random simulation to estimate quantities that may be difficult or impossible to compute analytically. This notebook demonstrates several classic applications: estimating $\pi$, studying convergence rates, verifying the Central Limit Theorem, computing integrals, and constructing bootstrap confidence intervals.
We use the numerics package for vectorized random number generation
and array operations, and ax-plots for visualization.
load("numerics")$
load("ax-plots")$
Estimating Pi¶
The classic Monte Carlo estimate of $\pi$: scatter $N$ random points uniformly in the unit square $[0,1]^2$ and count how many fall inside the quarter unit circle ($x^2 + y^2 \le 1$). The ratio of inside points to total points approximates $\pi/4$, so
$$\hat{\pi} = 4 \cdot \frac{\text{count inside}}{N}$$
/* Generate N random points in [0,1)^2 */
N : 10000$
pts : np_rand([N, 2])$
x : np_col(pts, 0)$
y : np_col(pts, 1)$
/* Distance from origin for each point */
d : np_sqrt(np_add(np_pow(x, 2), np_pow(y, 2)))$
/* Count points inside the quarter unit circle */
inside_mask : np_less_equal(d, 1.0)$
count_inside : np_sum(inside_mask)$
pi_estimate : 4.0 * count_inside / N$
print("Monte Carlo pi estimate:", pi_estimate)$
print("True pi:", float(%pi))$
print("Error:", abs(pi_estimate - float(%pi)))$
Monte Carlo pi estimate: 3.1704
True pi: 3.141592653589793
Error: 0.02880734641020677
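As a cross-check, the same estimator can be sketched in plain NumPy, whose API the `np_*` wrappers used above appear to mirror. The seed is an arbitrary choice for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed for reproducibility
N = 100_000
pts = rng.random((N, 2))            # uniform points in [0, 1)^2
# Boolean mask: points inside the quarter unit circle
inside = (pts[:, 0]**2 + pts[:, 1]**2) <= 1.0
pi_hat = 4.0 * inside.sum() / N
print(pi_hat)
```

With $N = 100000$ the estimate should land within a few hundredths of $\pi$ on essentially every run.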
/* Scatter plot: blue = inside circle, red = outside */
x_in : np_to_list(np_extract(inside_mask, x))$
y_in : np_to_list(np_extract(inside_mask, y))$
outside_mask : np_logical_not(inside_mask)$
x_out : np_to_list(np_extract(outside_mask, x))$
y_out : np_to_list(np_extract(outside_mask, y))$
ax_draw2d(
color=blue, name="Inside", marker_size=2,
points(x_in, y_in),
color=red, name="Outside", marker_size=2,
points(x_out, y_out),
title=sconcat("Pi Estimate: ", string(pi_estimate), " (N=", string(N), ")"),
xlabel="x", ylabel="y",
aspect_ratio=true,
grid=true
)$
Convergence of the Pi Estimate¶
As $N$ increases, the Monte Carlo estimate converges to the true value. The standard error decreases as $O(1/\sqrt{N})$, which is the hallmark convergence rate of Monte Carlo methods -- independent of dimension.
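To make the rate concrete: each point is an independent Bernoulli trial with success probability $p = \pi/4$, so the estimator $\hat{\pi} = 4\bar{X}$ has standard error $4\sqrt{p(1-p)/N} \approx 1.64/\sqrt{N}$. A short NumPy sketch of the predicted error scale at each sample size used below:

```python
import numpy as np

p = np.pi / 4                            # probability a point lands inside
sizes = np.array([100, 500, 1000, 5000, 10000, 50000])
se = 4 * np.sqrt(p * (1 - p) / sizes)    # standard error of the pi estimate
for n, s in zip(sizes, se):
    print(f"N = {n:6d}  predicted SE = {s:.4f}")
```

These predicted standard errors should be comparable in magnitude to the observed errors from the runs, and increasing $N$ by a factor of 100 shrinks the SE by a factor of 10.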
/* Run the pi estimation for increasing sample sizes */
sample_sizes : [100, 500, 1000, 5000, 10000, 50000]$
pi_estimates : makelist(
block([pts, xx, yy, dd, cc],
pts : np_rand([sample_sizes[i], 2]),
xx : np_col(pts, 0),
yy : np_col(pts, 1),
dd : np_sqrt(np_add(np_pow(xx, 2), np_pow(yy, 2))),
cc : np_sum(np_less_equal(dd, 1.0)),
4.0 * cc / sample_sizes[i]
),
i, 1, length(sample_sizes)
)$
for i: 1 thru length(sample_sizes) do
print(sconcat("N = ", string(sample_sizes[i]),
" estimate = ", string(pi_estimates[i]),
" error = ", string(abs(pi_estimates[i] - float(%pi)))))$
N = 100 estimate = 3.12 error = 0.02159265358979301
N = 500 estimate = 3.032 error = 0.10959265358979309
N = 1000 estimate = 3.132 error = 0.009592653589792999
N = 5000 estimate = 3.1584 error = 0.016807346410206758
N = 10000 estimate = 3.1516 error = 0.010007346410207063
N = 50000 estimate = 3.1404 error = 0.0011926535897930357
/* Plot convergence: estimate vs N, with true pi as reference */
ax_draw2d(
color=blue, name="MC estimate",
points(sample_sizes, pi_estimates),
color=red, dash="dash", name="true pi",
lines([[first(sample_sizes), float(%pi)],
[last(sample_sizes), float(%pi)]]),
title="Convergence of Pi Estimate",
xlabel="N (sample size)", ylabel="Estimate",
grid=true
)$
Central Limit Theorem¶
The Central Limit Theorem states that the distribution of the sample mean of $n$ i.i.d. random variables with finite variance approaches a normal distribution as $n \to \infty$, regardless of the shape of the underlying distribution.
We demonstrate this with the uniform distribution on $[0,1)$: take many samples of size 30, compute the mean of each, and observe that the histogram of sample means is approximately Gaussian -- even though the individual values are uniformly distributed.
/* Generate 2000 samples, each of size 30 */
n_samples : 2000$
sample_size : 30$
/* Each row is one sample of 30 uniform values */
samples : np_rand([n_samples, sample_size])$
/* Compute mean of each row (sample) */
sample_means : np_mean(samples, 1)$
print("Mean of sample means:", np_mean(sample_means))$
print("Std dev of sample means:", np_std(sample_means))$
print("Theoretical std dev: 1/sqrt(12*30) =", float(1/sqrt(12*30)))$
Mean of sample means: 0.49922271176190636
Std dev of sample means: 0.052802166326814684
Theoretical std dev: 1/sqrt(12*30) = 0.05270462766947299
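The same experiment in NumPy, for comparison: row-wise means correspond to `np_mean(samples, 1)` above, and the seed is arbitrary. For Uniform$[0,1)$ the theoretical values are mean $1/2$ and standard deviation of the sample mean $1/\sqrt{12 \cdot 30}$:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.random((2000, 30))     # 2000 samples, each of 30 uniform values
sample_means = samples.mean(axis=1)  # mean of each row (sample)
print(sample_means.mean())           # near 0.5
print(sample_means.std())            # near 1/sqrt(12*30) ~= 0.0527
```

A histogram of `sample_means` would show the same bell shape as the plot below.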
/* Histogram of sample means -- should look Gaussian */
ax_draw2d(
color="steelblue", nbins=40,
ax_histogram(sample_means),
title="Central Limit Theorem: Distribution of Sample Means",
xlabel="Sample Mean", ylabel="Count",
grid=true
)$
The histogram is bell-shaped even though each individual observation comes from a flat uniform distribution. This is the CLT in action.
Monte Carlo Integration¶
To estimate $\int_0^{\pi} \sin(x)\,dx$ by Monte Carlo, draw $N$ random points uniformly from $[0,\pi]$ and compute
$$\hat{I} = \pi \cdot \frac{1}{N} \sum_{i=1}^N \sin(X_i)$$
The exact answer is 2.
/* Monte Carlo estimate of integral of sin(x) from 0 to pi */
N_int : 50000$
/* Generate uniform random values in [0, pi] */
u : np_rand([N_int])$
x_vals : np_mul(u, float(%pi))$
/* Evaluate sin at each point, take mean, multiply by interval length */
sin_vals : np_sin(x_vals)$
mc_integral : float(%pi) * np_mean(sin_vals)$
/* Symbolic exact answer */
exact : integrate(sin(t), t, 0, %pi)$
print("Monte Carlo estimate:", mc_integral)$
print("Exact value:", exact)$
print("Error:", abs(mc_integral - exact))$
Monte Carlo estimate: 1.997170043391759
Exact value: 2
Error: 0.0028299566082410355
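A NumPy sketch of the same integral estimate, for cross-checking (seed arbitrary): scale uniform draws to $[0, \pi]$, average $\sin$, and multiply by the interval length.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50_000
x = rng.random(N) * np.pi            # uniform samples on [0, pi]
I_hat = np.pi * np.sin(x).mean()     # interval length times mean of sin(x)
print(I_hat)                         # exact value is 2
```

At this sample size the standard error is roughly $0.004$, so the estimate reliably agrees with the exact answer to about two decimal places.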
Bootstrap Confidence Intervals¶
The bootstrap is a resampling method for estimating uncertainty. Given a dataset of $n$ observations, we:
- Resample $n$ values with replacement (using random integer indices)
- Compute the statistic of interest (here, the mean)
- Repeat many times to build a distribution of the statistic
- Use percentiles of this distribution as confidence interval bounds
We generate a synthetic dataset from a standard normal, then bootstrap a 95% confidence interval for the mean.
/* Generate a dataset of 50 observations from standard normal */
n_obs : 50$
dataset : np_randn([n_obs])$
print("Dataset mean:", np_mean(dataset))$
print("Dataset std:", np_std(dataset))$
Dataset mean: -0.16492631652427606
Dataset std: 0.9974140971578545
/* Bootstrap: resample with replacement 2000 times */
n_boot : 2000$
boot_means : makelist(
block([idx, resample],
/* Generate random indices in [0, n_obs-1] */
idx : np_map(lambda([z], floor(z * n_obs)), np_rand([n_obs])),
/* Build resample by picking elements at those indices */
resample : ndarray(makelist(np_ref(dataset, round(np_ref(idx, k))),
k, 0, n_obs - 1)),
np_mean(resample)
),
i, 1, n_boot
)$
/* Convert to ndarray and sort for percentile extraction */
boot_arr : ndarray(boot_means)$
boot_sorted : np_sort(boot_arr)$
/* 2.5th and 97.5th percentiles for 95% CI */
lo_idx : round(0.025 * n_boot)$
hi_idx : round(0.975 * n_boot) - 1$
ci_lo : np_ref(boot_sorted, lo_idx)$
ci_hi : np_ref(boot_sorted, hi_idx)$
print("95% Bootstrap CI for the mean:", [ci_lo, ci_hi])$
print("Bootstrap mean of means:", np_mean(boot_arr))$
95% Bootstrap CI for the mean: [-0.45124134609434313, 0.09918584286816276]
Bootstrap mean of means: -0.16839913304545026
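The resampling loop above can be written more compactly in NumPy: `rng.integers` draws the random indices with replacement, one row per bootstrap replicate, and `np.percentile` extracts the interval bounds directly. A sketch under those assumptions (seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal(50)       # synthetic dataset, true mean 0
n_boot = 2000
# One row of resampling indices per bootstrap replicate, with replacement
idx = rng.integers(0, data.size, size=(n_boot, data.size))
boot_means = data[idx].mean(axis=1)  # mean of each resample
ci_lo, ci_hi = np.percentile(boot_means, [2.5, 97.5])
print(ci_lo, ci_hi)
```

Vectorizing the index generation avoids the explicit loop entirely; the bootstrap distribution is centered on the observed sample mean, so the interval always contains it.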
/* Histogram of bootstrap means with CI boundary lines */
ax_draw2d(
color="steelblue", nbins=40, name="Bootstrap means",
ax_histogram(boot_arr),
color=red, dash="dash", line_width=2, name="95% CI",
lines([[ci_lo, 0], [ci_lo, n_boot/10]]),
color=red, dash="dash", line_width=2,
lines([[ci_hi, 0], [ci_hi, n_boot/10]]),
title="Bootstrap Distribution of the Mean",
xlabel="Sample Mean", ylabel="Count",
grid=true
)$
The red dashed lines mark the 2.5th and 97.5th percentiles of the bootstrap distribution, forming the 95% confidence interval for the population mean. Since the true mean is 0 (we sampled from a standard normal), the CI should contain 0 in most runs.