Exploratory Data Analysis Pipeline

This notebook demonstrates a complete EDA workflow in Maxima: load a dataset, inspect its structure, compute summary statistics, and visualize distributions and relationships — all using the dataframes and ax-plots packages.

In [1]:
load("numerics")$
load("dataframes")$
load("dataframes-duckdb")$
load("ax-plots")$

Loading Data

In [2]:
T : df_read_csv("../../data/sales.csv")$
print("Shape:", df_table_shape(T))$
print("Columns:", df_table_names(T))$
Shape: [100,6]
Columns: ["date","region","product","units","price","revenue"]
In [3]:
df_table_head(T)
Out[3]:

Summary Statistics

df_describe computes count, mean, standard deviation, min, quartiles (25%, 50%, 75%), and max for every numeric column.

In [4]:
df_describe(T)
Out[4]:
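The following sketch makes those eight figures concrete by computing them for a plain Maxima list using only core functions. It assumes two conventions that the text above does not specify: the sample standard deviation (dividing by n - 1), and quartiles by linear interpolation between sorted values (the pandas default). `describe_list` and `quantile_lin` are hypothetical helpers, not part of the dataframes package, and df_describe may use different conventions.

```maxima
/* Hypothetical helper: p-th quantile (0 <= p <= 1) by linear interpolation
   between the sorted values at 1-based position (n - 1) p + 1 */
quantile_lin(xs, p) := block([s, n, h, lo],
  s : sort(xs),
  n : length(s),
  h : (n - 1) * p + 1,
  lo : floor(h),
  if lo >= n then s[n]
  else s[lo] + (h - lo) * (s[lo + 1] - s[lo]))$

/* Hypothetical helper: describe-style summary of a numeric list.
   Uses the sample standard deviation, so it needs at least two values. */
describe_list(xs) := block([n, m],
  n : length(xs),
  m : apply("+", xs) / n,
  ["count" = n,
   "mean" = m,
   "std" = sqrt(apply("+", map(lambda([x], (x - m)^2), xs)) / (n - 1)),
   "min" = lmin(xs),
   "25%" = quantile_lin(xs, 1/4),
   "50%" = quantile_lin(xs, 1/2),
   "75%" = quantile_lin(xs, 3/4),
   "max" = lmax(xs)])$

describe_list([1, 2, 3, 4]);
/* quartiles come out as 7/4, 5/2 and 13/4 */
```

With exact integer input Maxima keeps the results as rationals; wrap the call in `float(...)` for decimal output.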

Distribution of Revenue

A histogram with 15 bins shows how the 100 revenue values are distributed: where they cluster, how widely they spread, and whether the distribution is skewed.

In [5]:
ax_draw2d(
  ax_histogram(df_table_column(T, "revenue")),
  title="Revenue Distribution",
  xlabel="Revenue", ylabel="Count",
  grid=true, nbins=15
)$
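One common way to build a histogram with a fixed number of bins is to split the interval [min, max] into equal-width bins and count how many values fall into each. The sketch below shows that convention as an assumption; it is not a description of how ax_histogram bins its data. `bin_counts` is a hypothetical helper, and it places the largest value in the last bin so that no observation is dropped.

```maxima
/* Hypothetical helper: equal-width bin counts over [min, max].
   Assumes xs has at least two distinct values (otherwise the width is 0). */
bin_counts(xs, nbins) := block([lo, hi, w, counts, i],
  lo : lmin(xs),
  hi : lmax(xs),
  w : (hi - lo) / nbins,
  counts : makelist(0, j, 1, nbins),
  for x in xs do (
    /* 1-based bin index; the maximum would land in bin nbins + 1, so clamp it */
    i : min(floor((x - lo) / w) + 1, nbins),
    counts[i] : counts[i] + 1),
  counts)$

bin_counts([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5);  /* → [2, 2, 2, 2, 3] */
```

The bin counts always sum to the number of values, which is a quick sanity check on any binning rule.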

Sales by Region

Group the data by region and compute total revenue and average units sold per region.

In [6]:
by_region : df_group_by(T, "region")$
region_totals : df_summarize(by_region,
  "total_revenue", lambda([revenue], np_sum(revenue)),
  "avg_units", lambda([units], np_mean(units))
)$
region_totals;
Out[6]:
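Grouping followed by a summary is the split-apply-combine pattern: partition the rows by key, reduce each partition, and collect one result row per key. The sketch below illustrates that idea on a made-up list of [region, units, revenue] rows using only core Maxima. `group_summary` is a hypothetical helper, and this is not how dataframes implements df_group_by or df_summarize.

```maxima
/* Made-up rows of the form [region, units, revenue] */
rows : [["North", 4, 400], ["South", 2, 150], ["North", 6, 900], ["West", 3, 300]]$

/* Hypothetical helper: one [region, total_revenue, avg_units] row per region */
group_summary(rows) := block([keys, g, out : []],
  /* distinct region names */
  keys : unique(map(first, rows)),
  for k in keys do (
    /* split: the rows belonging to region k */
    g : sublist(rows, lambda([r], is(first(r) = k))),
    /* apply and combine: total revenue and mean units for this region */
    out : endcons([k,
                   apply("+", map(lambda([r], r[3]), g)),
                   apply("+", map(lambda([r], r[2]), g)) / length(g)],
                  out)),
  out)$

group_summary(rows);
/* each region appears once; North has total revenue 1300 and average units 5 */
```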
In [7]:
ax_draw2d(
  ax_bar(
    df_to_string_list(df_table_column(region_totals, "region")),
    np_to_list(df_table_column(region_totals, "total_revenue"))
  ),
  title="Total Revenue by Region",
  ylabel="Revenue ($)", grid=true
)$

Product Analysis

How many sales records does each product have, and what are its average price and total units sold?

In [8]:
df_value_counts(df_table_column(T, "product"))
Out[8]:
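A value count is a tally of how often each distinct value occurs in a column. The core-Maxima sketch below computes that tally for a plain list. `value_counts` is a hypothetical name, and it lists values in the order returned by `unique`, which may differ from the ordering df_value_counts uses.

```maxima
/* Hypothetical helper: [value, count] pairs, one per distinct value */
value_counts(xs) :=
  map(lambda([v], [v, length(sublist(xs, lambda([x], is(x = v))))]),
      unique(xs))$

value_counts(["A", "B", "A", "C", "A"]);
/* "A" is counted 3 times, "B" and "C" once each */
```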
In [9]:
by_product : df_group_by(T, "product")$
product_stats : df_summarize(by_product,
  "avg_price", lambda([price], np_mean(price)),
  "total_units", lambda([units], np_sum(units))
)$
product_stats;
Out[9]:
In [10]:
ax_draw2d(
  ax_bar(
    df_to_string_list(df_table_column(product_stats, "product")),
    np_to_list(df_table_column(product_stats, "avg_price"))
  ),
  title="Average Price by Product",
  ylabel="Price ($)", grid=true
)$

Scatter: Units vs Revenue

Do more units sold correspond to higher revenue? A scatter plot of units against revenue shows whether, and how closely, the two move together.

In [11]:
ax_draw2d(
  color=blue, marker_size=5,
  points(np_to_list(df_table_column(T, "units")),
         np_to_list(df_table_column(T, "revenue"))),
  title="Units Sold vs Revenue",
  xlabel="Units", ylabel="Revenue",
  grid=true
)$
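A scatter plot shows the shape of a relationship; the Pearson correlation coefficient summarizes its linear part as a single number between -1 and 1. This goes beyond the original analysis and is sketched here in core Maxima with a hypothetical `pearson_r` helper. To apply it to this dataset, the two columns would first be converted to lists, for example `pearson_r(np_to_list(df_table_column(T, "units")), np_to_list(df_table_column(T, "revenue")))`.

```maxima
/* Hypothetical helper: Pearson correlation of two equal-length numeric lists.
   Undefined (division by zero) if either list is constant. */
pearson_r(xs, ys) := block([n, mx, my, sxy, sxx, syy],
  n : length(xs),
  mx : apply("+", xs) / n,
  my : apply("+", ys) / n,
  /* sum of co-deviations and sums of squared deviations from the means */
  sxy : apply("+", map(lambda([x, y], (x - mx) * (y - my)), xs, ys)),
  sxx : apply("+", map(lambda([x], (x - mx)^2), xs)),
  syy : apply("+", map(lambda([y], (y - my)^2), ys)),
  sxy / sqrt(sxx * syy))$

pearson_r([1, 2, 3], [2, 4, 6]);  /* → 1 */
pearson_r([1, 2, 3], [6, 4, 2]);  /* → -1 */
```

A value near 1 would support the visual impression that more units go with more revenue, but it measures only linear association and says nothing about cause.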

Filtering: High-Value Sales

Filter the dataset to keep only rows where revenue exceeds 3000.

In [12]:
high_value : df_filter(T, lambda([revenue], is(revenue > 3000)))$
print("High-value sales:", df_table_shape(high_value))$
df_table_head(high_value);
High-value sales: [26,6]
Out[12]:
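Row filtering keeps every row for which a predicate returns true. In plain Maxima the same idea can be expressed with `sublist` and a predicate over row lists. The rows below are made up, and the sketch makes one detail explicit: `>` is strict, so a sale of exactly 3000 does not count as high-value.

```maxima
/* Made-up rows of the form [region, product, revenue] */
rows : [["North", "A", 3500], ["South", "B", 1200],
        ["West", "A", 4100], ["East", "C", 3000]]$

/* is() turns the comparison into true/false; the strict > drops the 3000 row */
high : sublist(rows, lambda([r], is(r[3] > 3000)))$

length(high);  /* → 2 */
```

Use `>=` instead if sales of exactly 3000 should be included.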

Summary

With dataframes for tabular manipulation and ax-plots for visualization, Maxima provides a complete exploratory data analysis workflow: load data, inspect structure, compute statistics, visualize distributions and relationships, and filter for subsets of interest.