- Introduction
- Installing pgEdge Document Loader
- Using pgEdge Document Loader
- Supported Formats
- Troubleshooting
- Licence
pgEdge Document Loader is a command-line tool for loading documents from various formats into PostgreSQL databases. Full documentation is available here.
The pgEdge Document Loader automatically converts documents (HTML, Markdown, reStructuredText, and DocBook SGML/XML) to Markdown format and loads them into a PostgreSQL database with extracted metadata.
Features
- Multiple Format Support: HTML, Markdown, reStructuredText, and DocBook SGML/XML
- Git Repository Support: Clone and process docs directly from Git repositories
- Automatic Conversion: All formats converted to Markdown
- Metadata Extraction: Titles, filenames, timestamps
- Flexible Input: Single file, directory, glob patterns, or Git repository URL
- Database Flexibility: Configurable column mappings
- Custom Metadata Columns: Add fixed values to custom columns for every row
- Update Mode: Update existing rows or insert new ones
- Transactional: All-or-nothing processing with automatic rollback
- Secure: Password from environment, .pgpass, or interactive prompt
- Configuration Files: Reusable YAML configuration
Before installing and using pgEdge Document Loader, download and install:
- Go 1.23 or later
- PostgreSQL 14 or later
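The version requirements above can be checked before building. The following is a minimal sketch, not part of pgedge-docloader: the helper names `version_ge`, `check_go`, and `check_psql` are hypothetical, and the script assumes a `sort` that supports `-V`. It inspects only the local `go` and `psql` binaries, so the PostgreSQL server version still needs to be confirmed separately.

```shell
#!/bin/sh
# Hypothetical helpers for checking the prerequisites listed above.

# version_ge A B: succeed when dotted version A is at least version B.
# Assumes a sort(1) that supports -V (GNU coreutils and recent BSD/macOS).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_go: compare the installed Go toolchain against the 1.23 minimum.
check_go() {
    if ! command -v go >/dev/null 2>&1; then
        echo "go: not found on PATH"
        return 1
    fi
    # `go version` prints a line such as: go version go1.23.4 linux/amd64
    go_ver=$(go version | awk '{print $3}' | sed 's/^go//')
    if version_ge "$go_ver" 1.23; then
        echo "go $go_ver: ok"
    else
        echo "go $go_ver: older than 1.23"
        return 1
    fi
}

# check_psql: compare the PostgreSQL client tools against the 14 minimum.
# This inspects the local client only; the server may run another version.
check_psql() {
    if ! command -v psql >/dev/null 2>&1; then
        echo "psql: not found on PATH"
        return 1
    fi
    # `psql --version` prints a line such as: psql (PostgreSQL) 16.2
    pg_ver=$(psql --version | awk '{print $3}')
    if version_ge "$pg_ver" 14; then
        echo "psql $pg_ver: ok"
    else
        echo "psql $pg_ver: older than 14"
        return 1
    fi
}

# Report both results without aborting when a tool is missing.
check_go || true
check_psql || true
```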
Getting started with pgEdge Document Loader involves three steps:
- Install the tool.
- Create a table in your Postgres database to hold the loaded content.
- Run the pgedge-docloader executable.
Installing pgEdge Document Loader
Use the following commands to download and build pgedge-docloader:
git clone https://github.com/pgedge/pgedge-docloader.git
cd pgedge-docloader
make build
make install
Creating a Postgres Table
Before invoking Document Loader, you must configure a Postgres database and create a table with the appropriate columns to hold the extracted documentation content:
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT NOT NULL,
    source BYTEA,
    filename TEXT UNIQUE,
    modified TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Invoking pgedge-docloader
When invoking pgedge-docloader, you can specify configuration preferences on the command line, or with a configuration file.
The following command invokes Document Loader on the command line:
# Load Markdown files into PostgreSQL
pgedge-docloader \
  --source ./docs \
  --db-host localhost \
  --db-name mydb \
  --db-user myuser \
  --db-table documents \
  --col-doc-content content \
  --col-file-name filename
To manage deployment preferences in a configuration file, save your deployment details in the file, and then pass the --config option when invoking pgedge-docloader:
# Create config.yml
cat > config.yml <<EOF
source: "./docs"
db-host: localhost
db-name: mydb
db-user: myuser
db-table: documents
col-doc-content: content
col-file-name: filename
update: true
EOF
# Run with a configuration file
export PGPASSWORD=mypassword
pgedge-docloader --config config.yml
For a comprehensive Quickstart Guide, visit here.
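When several documentation sets are loaded into the same database, the configuration file can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: `write_config` and `run_loader` are hypothetical helper names, the `DB_HOST`, `DB_NAME`, and `DB_USER` variables are illustrative, and only the configuration keys and the `--config` option shown above are used.

```shell
#!/bin/sh
# Hypothetical helpers; not part of pgedge-docloader. Only the keys and the
# --config option shown in this README are used. Connection values fall
# back to the placeholder values from the example above.

# write_config FILE SOURCE TABLE: write a config file for one document set.
write_config() {
    cat > "$1" <<EOF
source: "$2"
db-host: ${DB_HOST:-localhost}
db-name: ${DB_NAME:-mydb}
db-user: ${DB_USER:-myuser}
db-table: $3
col-doc-content: content
col-file-name: filename
update: true
EOF
}

# run_loader FILE: invoke the loader with FILE when it is installed.
run_loader() {
    if ! command -v pgedge-docloader >/dev/null 2>&1; then
        echo "pgedge-docloader not found on PATH; skipping $1" >&2
        return 1
    fi
    pgedge-docloader --config "$1"
}
```

Calling `write_config config.yml ./docs documents` followed by `run_loader config.yml` reproduces the file-based invocation above. The password is left out of the generated file on purpose, so it can still come from the environment, .pgpass, or the interactive prompt.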
This project is under active development. See the documentation for the latest features and updates.
The pgEdge Document Loader Makefile includes targets that run the test suite and invoke the Go linter. Use the following commands:
Running Tests
make test
Linting
make lint
Your contributions are welcome! Please feel free to submit issues and pull requests.
- Documentation: pgEdge Docloader
- Issues: GitHub Issues
This project is licensed under the PostgreSQL License.