DuckDB Integration

The dataframes-duckdb extension adds file I/O, SQL queries, and DuckDB-accelerated operations to the dataframes package. It requires libduckdb (brew install duckdb on macOS).

load("dataframes-duckdb")$

Type mapping: DuckDB DOUBLE → ndarray (f64), INTEGER/BIGINT → ndarray (coerced to f64), VARCHAR → string-column, NULL → NaN (numeric) or “” (string).

`df_read_csv` (path) — Function

Read a CSV file into a df_table. DuckDB auto-detects column types, delimiters, and encoding.

Examples

(%i1) T : df_read_csv("/path/to/data.csv")$
(%i2) df_table_shape(T);
(%o2)                        [100, 5]
(%i3) df_table_names(T);
(%o3)           [id, name, price, qty, region]

See also: df_read_parquet, df_read_json, df_write_csv

`df_read_parquet` (path) — Function

Read a Parquet file into a df_table.

Examples

(%i1) T : df_read_parquet("/path/to/data.parquet")$
(%i2) df_table_shape(T);
(%o2)                        [1000, 4]

See also: df_read_csv, df_write_parquet

`df_read_json` (path) — Function

Read a JSON file into a df_table. DuckDB auto-detects the JSON structure.

Examples

(%i1) T : df_read_json("/path/to/data.json")$
(%i2) df_table_shape(T);
(%o2)                        [50, 3]

See also: df_read_csv, df_read_parquet

`df_write_csv` (T, path) — Function

Write a df_table to a CSV file with header row.

Examples

(%i1) T : df_table(["name", "score"],
                    [df_string_column(["Alice", "Bob"]), ndarray([95.0, 87.0])])$
(%i2) df_write_csv(T, "/tmp/output.csv");
(%o2)                         done

See also: df_read_csv, df_write_parquet

`df_write_parquet` (T, path) — Function

Write a df_table to a Parquet file.

Examples

(%i1) df_write_parquet(T, "/tmp/output.parquet");
(%o1)                         done

See also: df_read_parquet, df_write_csv

`df_sql` (query) — Function

Execute an SQL query via DuckDB and return the result as a df_table. DuckDB can read files directly in SQL. Registered tables (via df_register) are available by name.

Examples

(%i1) T : df_sql("SELECT * FROM read_csv('data.csv') WHERE price > 10")$
(%i2) df_table_shape(T);
(%o2)                        [42, 5]
(%i3) df_register(T, "sales")$
(%i4) R : df_sql("SELECT region, sum(price) AS total FROM sales GROUP BY region")$
(%i5) df_table_names(R);
(%o5)                    [region, total]

See also: df_sql_run, df_register

`df_sql_run` (query) — Function

Execute a DuckDB SQL statement that produces no result (DDL, INSERT, etc.).

Examples

(%i1) df_sql_run("CREATE TABLE temp AS SELECT * FROM read_csv('data.csv')")$
(%i2) T : df_sql("SELECT * FROM temp WHERE x > 0")$

See also: df_sql

`df_register` (T, name) — Function

Register a df_table as a DuckDB table, making it available for SQL queries by name. Re-registering the same name replaces the previous registration.

Examples

(%i1) T : df_table(["x", "y"],
                    [ndarray([1.0, 2.0, 3.0]), ndarray([10.0, 20.0, 30.0])])$
(%i2) df_register(T, "mydata");
(%o2)                         done
(%i3) R : df_sql("SELECT * FROM mydata WHERE x > 1")$
(%i4) df_table_shape(R);
(%o4)                        [2, 2]

See also: df_unregister, df_sql

`df_unregister` (name) — Function

Remove a previously registered DuckDB table.

Examples

(%i1) df_unregister("mydata");
(%o1)                         done

See also: df_register

`df_duckdb_filter` (T, where_clause) — Function

Filter table rows using a DuckDB SQL WHERE clause. Faster than df_filter for large tables. Accepts SQL expressions, not Maxima lambdas.

Examples

(%i1) T : df_table(["name", "price"],
                    [df_string_column(["A", "B", "C"]), ndarray([10.0, 30.0, 20.0])])$
(%i2) T2 : df_duckdb_filter(T, "price > 15")$
(%i3) df_table_shape(T2);
(%o3)                        [2, 2]
(%i4) df_duckdb_filter(T, "price > 15 AND name = 'B'")$

See also: df_filter, df_duckdb_arrange

`df_duckdb_arrange` (T, column) / df_duckdb_arrange (T, column, descending) — Function

Sort table rows using DuckDB. Ascending by default; pass descending for reverse order.

Examples

(%i1) T2 : df_duckdb_arrange(T, "price")$
(%i2) df_to_string_list(df_table_column(T2, "name"));
(%o2)                      [A, C, B]
(%i3) T3 : df_duckdb_arrange(T, "price", descending)$
(%i4) df_to_string_list(df_table_column(T3, "name"));
(%o4)                      [B, C, A]

See also: df_arrange, df_duckdb_filter

`df_duckdb_join` (T1, T2, key) / df_duckdb_join (T1, T2, key, left) — Function

Join two tables using DuckDB. Inner join by default; pass left for a left join.

Examples

(%i1) T1 : df_table(["id", "name"],
                     [df_string_column(["1", "2", "3"]),
                      df_string_column(["Alice", "Bob", "Charlie"])])$
(%i2) T2 : df_table(["id", "score"],
                     [df_string_column(["1", "2"]), ndarray([90.0, 80.0])])$
(%i3) df_table_shape(df_duckdb_join(T1, T2, "id"));
(%o3)                        [2, 3]
(%i4) df_table_shape(df_duckdb_join(T1, T2, "id", left));
(%o4)                        [3, 3]

See also: df_inner_join, df_left_join

`df_duckdb_group_summarize` (T, group_col, name1, expr1, …) — Function

Group and aggregate using DuckDB SQL. Takes alternating result-name / SQL-expression string pairs. Much faster than df_group_by + df_summarize for large tables.

Examples

(%i1) T : df_table(["region", "revenue"],
                    [df_string_column(["N", "S", "N", "S", "N"]),
                     ndarray([10.0, 20.0, 30.0, 40.0, 50.0])])$
(%i2) R : df_duckdb_group_summarize(T, "region",
             "total", "sum(revenue)",
             "avg_rev", "avg(revenue)",
             "n", "count(*)")$
(%i3) df_table_shape(R);
(%o3)                        [2, 4]
(%i4) df_table_names(R);
(%o4)              [region, total, avg_rev, n]

See also: df_group_by, df_summarize

`df_duckdb_status` () — Function

Report whether the DuckDB connection is active. Returns true or false.

Examples

(%i1) df_duckdb_status();
(%o1)                        false
(%i2) df_sql("SELECT 1")$
(%i3) df_duckdb_status();
(%o3)                         true

See also: df_duckdb_close

`df_duckdb_close` () — Function

Close the session DuckDB connection and database. The connection is automatically re-created on the next DuckDB operation.

Examples

(%i1) df_duckdb_close();
(%o1)                         done
(%i2) df_duckdb_status();
(%o2)                        false

See also: df_duckdb_status