Exploratory Data Analysis Pipeline
This notebook demonstrates a complete EDA workflow in Maxima: load a dataset, inspect its structure, compute summary statistics, and visualize distributions and relationships. Tabular work uses the dataframes package, plotting uses ax-plots, and the np_ array helpers that appear in the aggregations come from numerics.
load("numerics")$
load("dataframes")$
load("dataframes-duckdb")$
load("ax-plots")$
Loading Data
T : df_read_csv("../../data/sales.csv")$
print("Shape:", df_table_shape(T))$
print("Columns:", df_table_names(T))$
Shape: [100,6]
Columns: ["date","region","product","units","price","revenue"]
df_table_head(T);
Summary Statistics
df_describe computes count, mean, standard deviation, min, quartiles
(25%, 50%, 75%), and max for every numeric column.
df_describe(T);
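To make the statistics listed above concrete, here is a minimal sketch that computes the same quantities for a single column with Maxima's standard descriptive package. The list xs is hypothetical toy data, not taken from sales.csv, and the sketch assumes a sample standard deviation (std1). Whether df_describe uses the sample or the population form is not stated here, nor is its quantile interpolation rule, so its output may differ slightly.

```maxima
load("descriptive")$

/* Hypothetical toy column, not taken from sales.csv */
xs : [10, 20, 30, 40, 50]$

/* One [label, value] pair per statistic reported by a describe-style summary */
stats : [
  ["count", length(xs)],
  ["mean",  mean(xs)],
  ["std",   std1(xs)],          /* assumption: sample form, divides by n-1 */
  ["min",   smin(xs)],
  ["25%",   quantile(xs, 1/4)],
  ["50%",   median(xs)],
  ["75%",   quantile(xs, 3/4)],
  ["max",   smax(xs)]
]$
stats;
```

Appending `, numer` to the last line converts any exact results (such as square roots) to floating point.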
Distribution of Revenue
A histogram shows how revenue values are spread across the dataset.
ax_draw2d(
ax_histogram(df_table_column(T, "revenue")),
title="Revenue Distribution",
xlabel="Revenue", ylabel="Count",
grid=true, nbins=15
)$
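The counts behind a histogram can also be inspected numerically. The sketch below uses continuous_freq from Maxima's descriptive package on a hypothetical toy list (not the revenue column) to split the range into equal-width classes and count the values in each. It illustrates the binning idea only; the bin edges that ax_histogram chooses for nbins=15 are not documented here and may differ.

```maxima
load("descriptive")$

/* Hypothetical toy revenue values, not taken from sales.csv */
rev : [120, 340, 560, 610, 980, 1500, 1520, 2900, 3100, 4800]$

/* Split the range of rev into 5 classes; the result is [class limits, counts] */
freq   : continuous_freq(rev, 5)$
limits : first(freq)$    /* 6 boundaries for 5 classes */
counts : second(freq)$   /* number of values falling in each class */

[limits, counts];
```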
Sales by Region
Group the data by region and compute total revenue and average units sold per region.
by_region : df_group_by(T, "region")$
region_totals : df_summarize(by_region,
"total_revenue", lambda([revenue], np_sum(revenue)),
"avg_units", lambda([units], np_mean(units))
)$
region_totals;
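As a rough illustration of what the group-and-summarize step computes, the sketch below reproduces it on plain lists using only core Maxima functions. The rows are hypothetical toy data, and this is not how the dataframes package is implemented; it only shows the split, apply, combine logic.

```maxima
/* Hypothetical toy rows of [region, revenue, units]; not taken from sales.csv */
rows : [["north", 100, 2], ["south", 250, 5],
        ["north",  50, 1], ["south", 150, 3]]$

/* Split: the distinct group keys */
regions : unique(map(first, rows))$

/* Apply and combine: for each key, select its rows, then
   sum the revenue column and average the units column */
summary : makelist(
  block([grp : sublist(rows, lambda([row], is(first(row) = r)))],
    [r,
     apply("+", map(second, grp)),
     apply("+", map(third, grp)) / length(grp)]),
  r, regions)$

summary;
```

Each entry of summary is a [region, total_revenue, avg_units] triple, mirroring the columns of region_totals.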
ax_draw2d(
ax_bar(
df_to_string_list(df_table_column(region_totals, "region")),
np_to_list(df_table_column(region_totals, "total_revenue"))
),
title="Total Revenue by Region",
ylabel="Revenue ($)", grid=true
)$
Product Analysis
How many sales does each product have, and what are the average price and total units?
df_value_counts(df_table_column(T, "product"));
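For comparison, a value count over a plain list can be sketched with discrete_freq from Maxima's descriptive package. The product names below are hypothetical toy data, and the output layout of df_value_counts may differ from the pair of lists shown here.

```maxima
load("descriptive")$

/* Hypothetical toy product column, not taken from sales.csv */
products : ["widget", "gadget", "widget", "gizmo", "widget", "gadget"]$

/* discrete_freq returns [distinct values, absolute frequencies] */
vc     : discrete_freq(products)$
labels : first(vc)$
counts : second(vc)$

[labels, counts];
```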
by_product : df_group_by(T, "product")$
product_stats : df_summarize(by_product,
"avg_price", lambda([price], np_mean(price)),
"total_units", lambda([units], np_sum(units))
)$
product_stats;
ax_draw2d(
ax_bar(
df_to_string_list(df_table_column(product_stats, "product")),
np_to_list(df_table_column(product_stats, "avg_price"))
),
title="Average Price by Product",
ylabel="Price ($)", grid=true
)$
Scatter: Units vs Revenue
Do more units sold correspond to higher revenue? A scatter plot reveals the relationship.
ax_draw2d(
color=blue, marker_size=5,
points(np_to_list(df_table_column(T, "units")),
np_to_list(df_table_column(T, "revenue"))),
title="Units Sold vs Revenue",
xlabel="Units", ylabel="Revenue",
grid=true
)$
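A scatter plot shows the relationship visually; one way to put a number on it is the Pearson correlation coefficient. The sketch below uses cor from Maxima's descriptive package on hypothetical toy columns (not the actual units and revenue data), so the value it prints says nothing about sales.csv.

```maxima
load("descriptive")$

/* Hypothetical toy columns, not taken from sales.csv */
units   : [1, 2, 3, 4, 5, 6]$
revenue : [110, 190, 320, 390, 480, 640]$

/* One row per observation, one column per variable */
m : transpose(matrix(units, revenue))$

/* cor returns the full correlation matrix; entry [1,2] pairs units with revenue */
c : cor(m)$
r : float(c[1, 2]);
```

Values of r near 1 indicate a strong positive linear relationship, values near 0 indicate little linear relationship, and values near -1 indicate a strong negative one.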
Filtering: High-Value Sales
Filter the dataset to keep only rows where revenue exceeds 3000.
high_value : df_filter(T, lambda([revenue], is(revenue > 3000)))$
print("High-value sales:", df_table_shape(high_value))$
df_table_head(high_value);
High-value sales: [26,6]
Summary
With dataframes for tabular manipulation and ax-plots for visualization, Maxima provides a complete exploratory data analysis workflow: load data, inspect structure, compute statistics, visualize distributions and relationships, and filter for subsets of interest.