Kcud is a simplified library that extracts string columns from tables stored in a DuckDB database file. (UNDER CONSTRUCTION: MORE FEATURES COMING!)

Dependencies:
- DuckDB's C++ API (install it for your OS)
- DuckDB's CLI (refer to this page)
  - If that does not work, build DuckDB from source:
    - `cd` into `duckdb`.
    - Run `make`. A `build` folder is created.
    - To run DuckDB as an in-memory database, run `build/release/duckdb` and you are done. To run DuckDB with a native database file (i.e., a `.db` file), run `build/release/duckdb <filename>.db`. Refer to the How to Generate `.db` Files section for details on generating database files for testing.
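The "use the installed CLI, otherwise build from source" fallback above can be sketched in Python. This is a hypothetical helper, not part of Kcud: `find_duckdb_cli`, `duckdb_argv`, and the `source_dir` default of `duckdb` are illustrative names. It assumes a POSIX system and treats "not working" as "no `duckdb` executable found on `PATH`".

```python
# Hypothetical helper (not part of Kcud): locate a DuckDB CLI binary.
# Assumes a POSIX system and treats "not working" as "no `duckdb` on PATH".
import os
import shutil
from typing import List, Optional


def find_duckdb_cli(source_dir: str = "duckdb",
                    search_path: Optional[str] = None) -> Optional[str]:
    """Return a usable DuckDB CLI path, or None if none is found.

    Prefers an installed `duckdb` on `search_path` (PATH when None) and falls
    back to the binary that `make` leaves at <source_dir>/build/release/duckdb.
    """
    installed = shutil.which("duckdb", path=search_path)
    if installed is not None:
        return installed
    built = os.path.join(source_dir, "build", "release", "duckdb")
    if os.path.isfile(built) and os.access(built, os.X_OK):
        return built
    return None


def duckdb_argv(cli: str, db_file: Optional[str] = None) -> List[str]:
    """Arguments for an in-memory session (no file) or a native .db file."""
    return [cli] if db_file is None else [cli, db_file]
```

For example, `subprocess.run(duckdb_argv(cli, "comment_sf1.db"))` would open an interactive DuckDB shell on that file, while `duckdb_argv(cli)` with no file starts an in-memory session.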
How to Generate `.db` Files

- Run DuckDB as an in-memory database (i.e., without a database file argument).
- Ensure that the TPC-H extension is loaded in DuckDB. If it is not, run `INSTALL tpch;` and then `LOAD tpch;`.
- Run `CALL dbgen(sf=<sf>);`, where `<sf>` is the scale factor at which the TPC-H workload is generated.
- Run `COPY (SELECT l_comment FROM lineitem) TO 'comment_sf<sf>.csv' (header, delimiter ',');` to dump the `lineitem.l_comment` column to a CSV file.
- Run `.exit` to quit DuckDB, then re-run DuckDB with the database file as an argument, e.g., `build/release/duckdb <filename>.db`. There is no need to create the file in advance; DuckDB creates it if it does not exist.
- Run `PRAGMA force_compression='uncompressed';` to disable compression on the string columns. This is for benchmarking purposes.
- Load the rows of the dumped CSV file with `CREATE TABLE comment AS SELECT * FROM 'comment_sf<sf>.csv';`. You now have a table named `comment` that contains all strings from the `lineitem.l_comment` column.
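The steps above split into two DuckDB sessions: an in-memory one that generates TPC-H data and dumps `l_comment`, and a file-backed one that loads the CSV with compression disabled. The Python sketch below builds both sessions as SQL scripts so they can be piped into the CLI instead of typed by hand. The function name `tpch_comment_scripts` is hypothetical and not part of Kcud, and the sketch assumes that running `INSTALL tpch;` and `LOAD tpch;` unconditionally is harmless in your DuckDB build.

```python
# Hypothetical helper (not part of Kcud): turn the .db-generation steps into
# two SQL scripts, one per DuckDB session.
from typing import Tuple


def tpch_comment_scripts(sf: float) -> Tuple[str, str]:
    """Return (dump_sql, load_sql) for TPC-H scale factor `sf`."""
    csv_name = f"comment_sf{sf}.csv"
    # Session 1 (in-memory): generate TPC-H data and dump the string column.
    # Assumes INSTALL/LOAD are safe to run even if tpch is already loaded.
    dump_sql = "\n".join([
        "INSTALL tpch;",
        "LOAD tpch;",
        f"CALL dbgen(sf={sf});",
        f"COPY (SELECT l_comment FROM lineitem) TO '{csv_name}' (header, delimiter ',');",
    ])
    # Session 2 (file-backed): disable compression, then load the dumped CSV.
    load_sql = "\n".join([
        "PRAGMA force_compression='uncompressed';",
        f"CREATE TABLE comment AS SELECT * FROM '{csv_name}';",
    ])
    return dump_sql + "\n", load_sql + "\n"
```

For example, write the two strings returned by `tpch_comment_scripts(1)` to `dump.sql` and `load.sql`, then run `build/release/duckdb < dump.sql` followed by `build/release/duckdb comment_sf1.db < load.sql` from the same directory, so the relative CSV path resolves in both sessions. Reaching the end of each piped script plays the role of `.exit`.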