Monte Carlo Methods

Monte Carlo methods use random simulation to estimate quantities that may be difficult or impossible to compute analytically. This notebook demonstrates several classic applications: estimating $\pi$, studying convergence rates, verifying the Central Limit Theorem, computing integrals, and constructing bootstrap confidence intervals.

We use the numerics package for vectorized random number generation and array operations, and ax-plots for visualization.

In [89]:
load("numerics")$
load("ax-plots")$

Estimating Pi

The classic Monte Carlo estimate of $\pi$: scatter $N$ random points uniformly in the unit square $[0,1]^2$ and count how many fall inside the quarter unit circle ($x^2 + y^2 \le 1$). The ratio of inside points to total points approximates $\pi/4$, so

$$\hat{\pi} = 4 \cdot \frac{\text{count inside}}{N}$$

In [90]:
/* Generate N random points in [0,1)^2 */
N : 10000$
pts : np_rand([N, 2])$
x : np_col(pts, 0)$
y : np_col(pts, 1)$

/* Distance from origin for each point */
d : np_sqrt(np_add(np_pow(x, 2), np_pow(y, 2)))$

/* Count points inside the quarter unit circle */
inside_mask : np_less_equal(d, 1.0)$
count_inside : np_sum(inside_mask)$

pi_estimate : 4.0 * count_inside / N$
print("Monte Carlo pi estimate:", pi_estimate)$
print("True pi:", float(%pi))$
print("Error:", abs(pi_estimate - float(%pi)))$
Monte Carlo pi estimate: 3.1704
True pi: 3.141592653589793
Error: 0.02880734641020677
In [91]:
/* Scatter plot: blue = inside circle, red = outside */
x_in  : np_to_list(np_extract(inside_mask, x))$
y_in  : np_to_list(np_extract(inside_mask, y))$
outside_mask : np_logical_not(inside_mask)$
x_out : np_to_list(np_extract(outside_mask, x))$
y_out : np_to_list(np_extract(outside_mask, y))$

ax_draw2d(
  color=blue, name="Inside", marker_size=2,
  points(x_in, y_in),
  color=red, name="Outside", marker_size=2,
  points(x_out, y_out),
  title=sconcat("Pi Estimate: ", string(pi_estimate), " (N=", string(N), ")"),
  xlabel="x", ylabel="y",
  aspect_ratio=true,
  grid=true
)$
[Figure: scatter of the random points, blue inside the quarter circle, red outside]

Convergence of the Pi Estimate

As $N$ increases, the Monte Carlo estimate converges to the true value. The standard error decreases as $O(1/\sqrt{N})$, which is the hallmark convergence rate of Monte Carlo methods -- independent of dimension.
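The $O(1/\sqrt{N})$ claim can be checked directly by repeating the whole estimate many times at two sample sizes and comparing the empirical standard errors. The sketch below uses plain NumPy rather than the notebook's numerics package, purely for illustration: quadrupling $N$ should roughly halve the spread of the estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

def pi_estimate(n):
    """One Monte Carlo estimate of pi from n uniform points in [0,1)^2."""
    pts = rng.random((n, 2))
    inside = (pts[:, 0]**2 + pts[:, 1]**2 <= 1.0).sum()
    return 4.0 * inside / n

# Empirical standard error at two sample sizes, 200 repetitions each.
# With 1/sqrt(N) scaling, quadrupling N should roughly halve the spread.
se_small = np.std([pi_estimate(1000) for _ in range(200)])
se_large = np.std([pi_estimate(4000) for _ in range(200)])
print(se_small / se_large)  # expect a ratio near 2
```

The same experiment at a fixed $N$ but in higher dimension would show the ratio unchanged, which is what makes Monte Carlo attractive for high-dimensional integrals.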

In [92]:
/* Run the pi estimation for increasing sample sizes */
sample_sizes : [100, 500, 1000, 5000, 10000, 50000]$

pi_estimates : makelist(
  block([pts, xx, yy, dd, cc],
    pts : np_rand([sample_sizes[i], 2]),
    xx : np_col(pts, 0),
    yy : np_col(pts, 1),
    dd : np_sqrt(np_add(np_pow(xx, 2), np_pow(yy, 2))),
    cc : np_sum(np_less_equal(dd, 1.0)),
    4.0 * cc / sample_sizes[i]
  ),
  i, 1, length(sample_sizes)
)$

for i: 1 thru length(sample_sizes) do
  print(sconcat("N = ", string(sample_sizes[i]),
                "  estimate = ", string(pi_estimates[i]),
                "  error = ", string(abs(pi_estimates[i] - float(%pi)))))$
N = 100  estimate = 3.12  error = 0.02159265358979301
N = 500  estimate = 3.032  error = 0.10959265358979309
N = 1000  estimate = 3.132  error = 0.009592653589792999
N = 5000  estimate = 3.1584  error = 0.016807346410206758
N = 10000  estimate = 3.1516  error = 0.010007346410207063
N = 50000  estimate = 3.1404  error = 0.0011926535897930357
In [93]:
/* Plot convergence: estimate vs N, with true pi as reference */
ax_draw2d(
  color=blue, name="MC estimate",
  points(sample_sizes, pi_estimates),
  color=red, dash="dash", name="true pi",
  lines([[first(sample_sizes), float(%pi)],
         [last(sample_sizes), float(%pi)]]),
  title="Convergence of Pi Estimate",
  xlabel="N (sample size)", ylabel="Estimate",
  grid=true
)$
[Figure: pi estimate vs. N, with the true value of pi as a dashed reference line]

Central Limit Theorem

The Central Limit Theorem states that the distribution of the mean of $n$ i.i.d. random variables with finite variance approaches a normal distribution as $n \to \infty$, regardless of the underlying distribution.

We demonstrate this with the uniform distribution on $[0,1)$: take many samples of size 30, compute the mean of each, and observe that the histogram of sample means is approximately Gaussian -- even though the individual values are uniformly distributed.
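The "regardless of the underlying distribution" part is worth stressing: the same experiment works even for a heavily skewed distribution. A plain-NumPy sketch (illustrative, outside the notebook's numerics package) using Exponential(1) samples, whose mean and variance are both 1, so the CLT predicts sample means near 1 with standard deviation $1/\sqrt{30} \approx 0.183$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Means of 2000 samples of size 30 from Exponential(1) -- a skewed
# distribution, unlike the symmetric uniform used below.
means = rng.exponential(scale=1.0, size=(2000, 30)).mean(axis=1)

# CLT prediction: mean near 1, std near 1/sqrt(30) ~ 0.183,
# and a histogram of `means` is already close to bell-shaped.
print(means.mean())
print(means.std())
```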

In [94]:
/* Generate 2000 samples, each of size 30 */
n_samples : 2000$
sample_size : 30$

/* Each row is one sample of 30 uniform values */
samples : np_rand([n_samples, sample_size])$

/* Compute mean of each row (sample) */
sample_means : np_mean(samples, 1)$

print("Mean of sample means:", np_mean(sample_means))$
print("Std dev of sample means:", np_std(sample_means))$
print("Theoretical std dev: 1/sqrt(12*30) =", float(1/sqrt(12*30)))$
Mean of sample means: 0.49922271176190636
Std dev of sample means: 0.052802166326814684
Theoretical std dev: 1/sqrt(12*30) = 0.05270462766947299
In [95]:
/* Histogram of sample means -- should look Gaussian */
ax_draw2d(
  color="steelblue", nbins=40,
  ax_histogram(sample_means),
  title="Central Limit Theorem: Distribution of Sample Means",
  xlabel="Sample Mean", ylabel="Count",
  grid=true
)$
[Figure: histogram of the 2000 sample means, approximately bell-shaped]

The histogram is bell-shaped even though each individual observation comes from a flat uniform distribution. This is the CLT in action.

Monte Carlo Integration

To estimate $\int_0^{\pi} \sin(x)\,dx$ by Monte Carlo, draw $N$ random points uniformly from $[0,\pi]$ and compute

$$\hat{I} = \pi \cdot \frac{1}{N} \sum_{i=1}^N \sin(X_i)$$

The exact answer is 2.

In [96]:
/* Monte Carlo estimate of integral of sin(x) from 0 to pi */
N_int : 50000$

/* Generate uniform random values in [0, pi] */
u : np_rand([N_int])$
x_vals : np_mul(u, float(%pi))$

/* Evaluate sin at each point, take mean, multiply by interval length */
sin_vals : np_sin(x_vals)$
mc_integral : float(%pi) * np_mean(sin_vals)$

/* Symbolic exact answer */
exact : integrate(sin(t), t, 0, %pi)$

print("Monte Carlo estimate:", mc_integral)$
print("Exact value:", exact)$
print("Error:", abs(mc_integral - exact))$
Monte Carlo estimate: 1.997170043391759
Exact value: 2
Error: 0.0028299566082410355
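A convenient feature of Monte Carlo integration is that the same draws provide an error estimate: the standard error of $\hat{I}$ is $\pi \cdot \mathrm{std}(\sin X)/\sqrt{N}$, computable from the sample itself. A plain-NumPy sketch (illustrative, not the notebook's numerics package):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50_000

# Uniform draws on [0, pi]; estimator I_hat = pi * mean(sin(X))
x = rng.random(N) * np.pi
fx = np.sin(x)
i_hat = np.pi * fx.mean()

# Standard error of the estimator: pi * std(f) / sqrt(N), computed
# from the same sample -- a built-in accuracy gauge.
se = np.pi * fx.std(ddof=1) / np.sqrt(N)
print(i_hat, se)  # estimate near 2, SE around 0.004
```

The observed error of about 0.003 in the notebook run above is consistent with a standard error of this size.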

Bootstrap Confidence Intervals

The bootstrap is a resampling method for estimating uncertainty. Given a dataset of $n$ observations, we:

  1. Resample $n$ values with replacement (using random integer indices)
  2. Compute the statistic of interest (here, the mean)
  3. Repeat many times to build a distribution of the statistic
  4. Use percentiles of this distribution as confidence interval bounds

We generate a synthetic dataset from a standard normal, then bootstrap a 95% confidence interval for the mean.
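For comparison, the four steps above can be written compactly in plain NumPy (an illustrative standalone sketch, not the notebook's numerics package): `rng.integers` draws all the random indices at once, and a single fancy-indexing step builds every resample.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic dataset of 50 standard-normal observations, as in the notebook
data = rng.standard_normal(50)
n_boot = 2000

# Steps 1-3: an (n_boot, n) matrix of random indices, resample by fancy
# indexing, then row means -- all n_boot bootstrap statistics in one call
idx = rng.integers(0, data.size, size=(n_boot, data.size))
boot_means = data[idx].mean(axis=1)

# Step 4: percentile bounds of the bootstrap distribution give the 95% CI
ci_lo, ci_hi = np.percentile(boot_means, [2.5, 97.5])
print(ci_lo, ci_hi)
```

With $n = 50$ observations of unit standard deviation, the CI width should come out near $2 \cdot 1.96/\sqrt{50} \approx 0.55$.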

In [97]:
/* Generate a dataset of 50 observations from standard normal */
n_obs : 50$
dataset : np_randn([n_obs])$

print("Dataset mean:", np_mean(dataset))$
print("Dataset std:", np_std(dataset))$
Dataset mean: -0.16492631652427606
Dataset std: 0.9974140971578545
In [98]:
/* Bootstrap: resample with replacement 2000 times */
n_boot : 2000$

boot_means : makelist(
  block([idx, resample],
    /* Generate random indices in [0, n_obs-1] */
    idx : np_map(lambda([z], floor(z * n_obs)), np_rand([n_obs])),
    /* Build resample by picking elements at those indices */
    resample : ndarray(makelist(np_ref(dataset, round(np_ref(idx, k))),
                               k, 0, n_obs - 1)),
    np_mean(resample)
  ),
  i, 1, n_boot
)$

/* Convert to ndarray and sort for percentile extraction */
boot_arr : ndarray(boot_means)$
boot_sorted : np_sort(boot_arr)$

/* 2.5th and 97.5th percentiles for 95% CI */
lo_idx : round(0.025 * n_boot)$
hi_idx : round(0.975 * n_boot) - 1$
ci_lo : np_ref(boot_sorted, lo_idx)$
ci_hi : np_ref(boot_sorted, hi_idx)$

print("95% Bootstrap CI for the mean:", [ci_lo, ci_hi])$
print("Bootstrap mean of means:", np_mean(boot_arr))$
95% Bootstrap CI for the mean: [-0.45124134609434313, 0.09918584286816276]
Bootstrap mean of means: -0.16839913304545026
In [99]:
/* Histogram of bootstrap means with CI boundary lines */
ax_draw2d(
  color="steelblue", nbins=40, name="Bootstrap means",
  ax_histogram(boot_arr),
  color=red, dash="dash", line_width=2, name="95% CI",
  lines([[ci_lo, 0], [ci_lo, n_boot/10]]),
  color=red, dash="dash", line_width=2,
  lines([[ci_hi, 0], [ci_hi, n_boot/10]]),
  title="Bootstrap Distribution of the Mean",
  xlabel="Sample Mean", ylabel="Count",
  grid=true
)$
[Figure: histogram of bootstrap means, with dashed lines marking the 95% CI bounds]

The red dashed lines mark the 2.5th and 97.5th percentiles of the bootstrap distribution, forming the 95% confidence interval for the population mean. Since the true mean is 0 (we sampled from a standard normal), the CI should contain 0 in most runs.