4 changes: 2 additions & 2 deletions .github/copilot-instructions.md
@@ -16,7 +16,7 @@ Primary outcomes: reproducible structured feature/value extraction and structure
- Linting/Formatting: Ruff (configured in `pyproject.toml`). Line length 79 (72 for docstrings). Ruff also runs format check.
- Testing: Pytest (unit/integration separation), Coverage (HTML + XML). Rust uses Cargo tests + Clippy + fmt.
- CI: GitHub Actions workflows for Python (`ci-python.yml`), Rust (`ci-rust.yml`), docs, PyPI publishing.
- Docs: Sphinx with `pydata-sphinx-theme`. Build via Pixi task `python-doc` or Makefile (`make html`). Fail on warnings.
- Docs: Sphinx with `pydata-sphinx-theme`. Build via Pixi task `python-docs` or Makefile (`make html`). Fail on warnings.
- Crawling/Search: Playwright + supplemental packages (rebrowser-playwright, camoufox, crawl4ai, tavily, ddgs, etc.).
- LLM Providers: OpenAI (core), Anthropic optional (`anthropic` extras). Configurable via API key environment variables.

@@ -59,7 +59,7 @@ pixi run -e rdev cargo check --workspace --locked
pixi run -e rdev tests-r

# Build docs
pixi run -e pdoc python-doc
pixi run -e pdoc python-docs

# Demo (requires OPENAI_API_KEY)
pixi run openai-solar-demo <api_key>
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -42,7 +42,7 @@ jobs:
environments: pdoc

- name: Build Docs
run: pixi run -e pdoc python-doc # This errors on warnings
run: pixi run -e pdoc python-docs # This errors on warnings

- name: deploy
uses: peaceiris/actions-gh-pages@v4.0.0
8 changes: 5 additions & 3 deletions docs/Makefile
@@ -6,20 +6,22 @@ SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = _build
CLEAN_TARGETS = $(BUILDDIR) $(SOURCEDIR)/_autosummary

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile
.PHONY: help Makefile clean

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
rm -rf _build
rm -rf source/_autosummary
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

clean:
rm -rf $(CLEAN_TARGETS)

github: html
-git branch -D gh-pages
-git push origin --delete gh-pages
2 changes: 1 addition & 1 deletion docs/source/dev/README.rst
@@ -211,7 +211,7 @@ To check your docstring additions/updates, you can build a local version of the

.. code-block:: shell

pixi r -e pdoc python-doc
pixi r -e pdoc python-docs

After running this command, simply open ``docs/_build/html/index.html`` using your favorite browser, e.g.:

144 changes: 144 additions & 0 deletions docs/source/glossary.rst
@@ -0,0 +1,144 @@
.. _glossary:

Glossary
========

.. glossary::
:sorted:

INFRA-COMPASS
End-to-end pipeline that discovers, parses, and validates energy
infrastructure ordinances with LLM tooling.

LLM
Large Language Model that interprets ordinance text, classifies
features, and answers structured extraction questions.

OCR
Optical Character Recognition stage powered by ``pytesseract``
that converts scanned ordinance PDFs into searchable text.

Pixi
Environment manager used to install dependencies, run tasks, and
maintain reproducible shells for COMPASS.

Playwright
Browser automation framework used to crawl web portals and
download ordinance documents reliably.

analysis run
Complete invocation of ``compass process`` that ingests a
configuration file, processes jurisdictions, and writes results to
the run directory.

clean directory
Intermediate folder storing cleaned ordinance text used for LLM
prompting during feature extraction.

clean text file
Plain-text excerpt derived from ordinance documents that isolates
relevant sections for prompts and validation.

compass process
CLI command that executes the end-to-end pipeline using the inputs
defined in the configuration file.

configuration file
JSON or JSON5 document that declares inputs, model assignments,
concurrency, and output directories for a run.
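
A minimal sketch of such a file, written here via Python's ``json`` module. The ``tech``, ``out_dir``, ``log_dir``, and ``ordinance_file_dir`` keys appear elsewhere in this glossary; the remaining key names are illustrative assumptions, not the documented schema:

```python
import json

# Hypothetical run configuration. "tech", "out_dir", "log_dir", and
# "ordinance_file_dir" are glossary terms; the other keys are assumed.
config = {
    "out_dir": "./runs/solar",        # root for structured results
    "log_dir": "./runs/solar/logs",   # run-level logs, prompt archives
    "ordinance_file_dir": "./runs/solar/ordinance_files",
    "tech": "solar",                  # target infrastructure domain
    "jurisdiction_file": "./jurisdictions.csv",  # assumed key name
}

with open("config.json", "w") as fh:
    json.dump(config, fh, indent=2)
```

JSON5 additionally allows comments and trailing commas, so the same structure could also be written by hand as a ``.json5`` file.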

decision tree prompt
Structured prompt template that guides the LLM through branching
questions to extract quantitative and qualitative ordinance data.

decision tree
Hierarchical rubric of questions and outcomes that organizes how
ordinance features are extracted and validated.

extraction pipeline
Crawlers, parsers, and feature detectors that transform raw
ordinance text into structured records.

jurisdiction
County or municipality defined in the jurisdiction CSV that
frames the geographic scope of an analysis run.

jurisdiction CSV
Input spreadsheet whose ``County`` and ``State`` columns list the
locations processed in a run.
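
To make the expected layout concrete, a hedged sketch: the ``County`` and ``State`` headers come from this glossary, while the example rows and in-memory file handling are purely illustrative.

```python
import csv
import io

# Two-column jurisdiction CSV: "County" and "State" are the documented
# headers; the rows below are made-up examples.
sample = """County,State
Box Elder,Utah
Larimer,Colorado
"""

jurisdictions = list(csv.DictReader(io.StringIO(sample)))
for row in jurisdictions:
    # Each (County, State) pair defines one location in the run.
    print(row["County"], row["State"])
```
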

location
Combination of county and state identifiers that maps to one row
in the jurisdiction CSV and produces a single output bundle.

location file log
Per-location structured log that aggregates runtime diagnostics
and JSON exception summaries.

location manifest
JSON metadata file emitted per location summarizing source
documents, extraction status, and validation outcomes.

log directory
Folder defined by ``log_dir`` that stores run-level logs, prompt
archives, and timing summaries.

llm cost tracker
Runtime utility that multiplies token usage by configured pricing
to report estimated spend per model.
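
The arithmetic is simple enough to sketch. Everything below (class name, method names, pricing figures) is an illustrative assumption, not the COMPASS implementation:

```python
# Per-1,000-token prices are placeholders, not real provider rates.
PRICING_PER_1K = {
    "gpt-4o": {"prompt": 0.0025, "completion": 0.0100},
}

class CostTracker:
    """Accumulates estimated dollar spend per model from token counts."""

    def __init__(self, pricing):
        self.pricing = pricing
        self.spend = {}  # model name -> running dollar total

    def record(self, model, prompt_tokens, completion_tokens):
        rates = self.pricing[model]
        cost = (prompt_tokens / 1000 * rates["prompt"]
                + completion_tokens / 1000 * rates["completion"])
        self.spend[model] = self.spend.get(model, 0.0) + cost
        return cost

tracker = CostTracker(PRICING_PER_1K)
tracker.record("gpt-4o", prompt_tokens=2000, completion_tokens=500)
# 2.0 * 0.0025 + 0.5 * 0.0100 = 0.01 dollars estimated for this call
```
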

llm service
Abstraction over providers such as OpenAI or Azure OpenAI that
enforces authentication, rate limits, and retry policies.

llm service rate limit
Configuration value that caps tokens per minute for a model to
avoid provider throttling.

llm task
Logical label assigned to prompt templates that maps to a specific
model entry within the configuration.

ordinance
Legal text that governs energy infrastructure within a
jurisdiction and feeds the extraction workflows.

ordinance document
Source PDF or HTML retrieved during crawling that contains the
legal language for the targeted technology.

ordinance file directory
Folder defined by ``ordinance_file_dir`` that caches downloaded
ordinance PDFs and HTML files.

out directory
Root folder defined by ``out_dir`` where structured results,
cleaned text, and logs for each run are written.

``pytesseract``
Python wrapper for the Tesseract OCR engine used to enable text
extraction from scanned ordinance documents.

rate limiter
Token-based throttle that keeps LLM requests within provider
quotas while maximizing throughput.
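
A sketch of the idea only, using a rolling 60-second window: the real limiter's design is not shown here, and this version assumes no single request exceeds the full budget.

```python
import time

class TokenRateLimiter:
    """Illustrative tokens-per-minute throttle, not the COMPASS class."""

    def __init__(self, tokens_per_minute):
        self.budget = tokens_per_minute
        self.events = []  # (timestamp, token_count) within the window

    def _used(self, now):
        # Forget usage older than the rolling 60-second window.
        self.events = [(t, n) for t, n in self.events if now - t < 60]
        return sum(n for _, n in self.events)

    def acquire(self, tokens):
        # Block until the request fits under the per-minute budget.
        while self._used(time.monotonic()) + tokens > self.budget:
            time.sleep(0.05)
        self.events.append((time.monotonic(), tokens))

limiter = TokenRateLimiter(tokens_per_minute=30_000)
limiter.acquire(1_200)  # fits under budget, so this returns immediately
```
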

structured record
Tabular representation of ordinance features, thresholds, and
metadata exported for downstream analysis.

technology
``tech`` configuration key that defines the target infrastructure
domain, such as solar or wind.

text splitter
Utility that chunks ordinance text into overlapping segments sized
for LLM context windows.
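
A character-based sketch of overlapping chunking; the actual splitter may count tokens and respect document structure, and the sizes here are arbitrary:

```python
def split_text(text, chunk_size=400, overlap=50):
    """Chunk ``text`` into windows that share ``overlap`` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = split_text("x" * 1000, chunk_size=400, overlap=50)
# Consecutive chunks repeat 50 characters, so text near a chunk
# boundary is always seen with context in the neighboring chunk.
assert chunks[0][-50:] == chunks[1][:50]
```
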

validation pipeline
Post-processing stage that verifies extracted features, resolves
conflicts, and confirms location metadata.

web search
Search-and-crawl phase that discovers ordinance links using
providers such as Tavily, DuckDuckGo Search, or custom engines.
3 changes: 3 additions & 0 deletions docs/source/index.rst
@@ -8,6 +8,7 @@
CLI reference <_cli/cli>
Validation <val/validation>
Development <dev/index>
Glossary <glossary>


INFRA-COMPASS documentation
@@ -20,3 +21,5 @@ What is INFRA-COMPASS?
.. include:: ../../README.rst
:start-after: inclusion-intro
:end-before: Installing INFRA-COMPASS

:ref:`genindex` | :ref:`modindex` | :ref:`glossary`