teachdb - A free, in-memory database for facilitating hands-on, basic SQL instruction in a notebook environment
Install teachdb:
pip install git+https://github.com/freestackinitiative/teachingdb.git
teachdb is an in-memory micro relational database, powered by duckdb. It was made with two types of users in mind: instructors who want to teach SQL concepts, and students who want to learn and practice the fundamentals. Combined with a Jupyter Notebook, teachdb provides a database that can be used to demonstrate fundamental SQL concepts such as select queries, filtering, aggregations, and joins. It can even be used to introduce more advanced topics such as analytical/window functions, common table expressions (CTEs), data definition language (DDL) commands, etc. For students, it provides a safe environment to learn and experiment with a SQL database without the need for setting up their own server or downloading additional software.
Here is all that's needed to use teachdb in your Jupyter Notebook:
from teachdb import TeachDB
# Initialize the database
db = TeachDB()
# Set up the notebook environment
db.setup_notebook()
# Connect to SQL magic
connection = db.connection
%sql connectionAnd then, to query your data, create a new cell and use the %%sql SQL magic function like so:
%%sql
SELECT *
FROM companyThe main class for creating and managing TeachDB connections. This class automatically loads sample databases and sets up the environment for SQL instruction.
Parameters:
connection(Optional[duckdb.DuckDBPyConnection]): An existing DuckDB connection. If None, creates a new in-memory connection.database(Union[List[str], str]): The database(s) to load. Defaults to "core". Can be a single database name or a list of database names.include_schemas(bool): Whether to create separate schemas for each database. Defaults to False.
Basic Usage:
from teachdb import TeachDB
# Load the default "core" database
db = TeachDB()
db.setup_notebook()
connection = db.connection
%sql connectionLoading Multiple Databases:
from teachdb import TeachDB
# Load multiple databases
db = TeachDB(database=["sales_cog_opex", "ds_salaries"])
db.setup_notebook()
connection = db.connection
%sql connectionUsing with Existing DuckDB Connection:
import duckdb
from teachdb import TeachDB
# Create your own connection
conn = duckdb.connect(":memory:")
db = TeachDB(connection=conn, database="core")
db.setup_notebook()
%sql connConfigures the notebook environment for optimal SQL instruction, including setting up SQL magic and pandas display options.
Parameters:
sqlmagic_autopandas(bool): Enable automatic pandas DataFrame output. Defaults to True.sqlmagic_feedback(bool): Show SQL magic feedback. Defaults to False.sqlmagic_displaycon(bool): Display connection info. Defaults to False.display_max_rows(Optional[int]): Maximum rows to display in pandas output.display_max_columns(Optional[int]): Maximum columns to display in pandas output.display_width(Optional[int]): Display width for pandas output.display_max_colwidth(Optional[int]): Maximum column width. Defaults to 99.**pandas_opts: Additional pandas display options.
Example:
from teachdb import TeachDB
db = TeachDB()
db.setup_notebook(
display_max_rows=50,
display_max_columns=10,
sqlmagic_feedback=True
)
connection = db.connection
%sql connectionExecute a SQL query directly and return results as a list of tuples.
Parameters:
query(str): The SQL query to execute.
Returns:
- List of tuples representing the query results.
Example:
from teachdb import TeachDB
db = TeachDB()
results = db.execute_query("SELECT * FROM company LIMIT 5")
print(results)Get metadata about tables in the database by querying DuckDB system tables.
Parameters:
schema(Optional[str]): Filter results for a specific schema.
Returns:
- List of tuples containing table metadata.
Example:
from teachdb import TeachDB
db = TeachDB()
schema_info = db.get_database_schema()
for table_info in schema_info:
print(table_info)Load custom data into the database from a dictionary of CSV data.
Parameters:
schema(Dict[str, bytes]): Dictionary with table names as keys and CSV data as bytes as values.overwrite(bool): Whether or not to overwrite an existing schema/database. Default isFalse.schema_name(Optional[str]): Name for the database schema.
Example:
import pandas as pd
from teachdb import TeachDB
# Create sample data
df = pd.DataFrame({"ColA": list(range(10)), "ColB": list(range(10))})
csv_data = df.to_csv(index=False).encode('utf-8')
# Load into TeachDB
db = TeachDB()
db.load_db(schema={"sample_table": csv_data}, schema_name="custom_data")
db.setup_notebook()
connection = db.connection
%sql connectionNow you can query your custom data:
%%sql
SELECT *
FROM sample_table;Get information about all available databases in the TeachDB schema.
Returns:
- Dictionary containing database schemas and their metadata.
Example:
from teachdb import get_available_schemas
schemas = get_available_schemas()
print("Available databases:")
for db_name in schemas.keys():
print(f"- {db_name}")TeachDB comes with several pre-built databases for different learning scenarios:
- core: The default database with basic tables for learning SQL fundamentals
- sales_cog_opex: Sales and operational data for business analytics
- ds_salaries: Data science salary information for analytical queries
Use get_available_schemas() to see all available databases and their contents.
TeachDB includes custom exception classes for better error handling:
TeachDBConnectionError: Raised when there are connection issuesTeachDBSchemaError: Raised when there are problems with database schemaTeachDBQueryExecutionError: Raised when query execution fails
These exceptions provide detailed error messages to help with debugging and learning.