4 changes: 2 additions & 2 deletions .github/copilot-instructions.md
@@ -16,7 +16,7 @@ Primary outcomes: reproducible structured feature/value extraction and structure
- Linting/Formatting: Ruff (configured in `pyproject.toml`). Line length 79 (72 for docstrings). Ruff also runs format check.
- Testing: Pytest (unit/integration separation), Coverage (HTML + XML). Rust uses Cargo tests + Clippy + fmt.
- CI: GitHub Actions workflows for Python (`ci-python.yml`), Rust (`ci-rust.yml`), docs, PyPI publishing.
- Docs: Sphinx with `pydata-sphinx-theme`. Build via Pixi task `python-doc` or Makefile (`make html`). Fail on warnings.
- Docs: Sphinx with `pydata-sphinx-theme`. Build via Pixi task `python-docs` or Makefile (`make html`). Fail on warnings.
- Crawling/Search: Playwright + supplemental packages (rebrowser-playwright, camoufox, crawl4ai, tavily, ddgs, etc.).
- LLM Providers: OpenAI (core), Anthropic optional (`anthropic` extras). Configurable via API key environment variables.

@@ -59,7 +59,7 @@ pixi run -e rdev cargo check --workspace --locked
pixi run -e rdev tests-r

# Build docs
pixi run -e pdoc python-doc
pixi run -e pdoc python-docs

# Demo (requires OPENAI_API_KEY)
pixi run openai-solar-demo <api_key>
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -42,7 +42,7 @@ jobs:
environments: pdoc

- name: Build Docs
run: pixi run -e pdoc python-doc # This errors on warnings
run: pixi run -e pdoc python-docs # This errors on warnings

- name: deploy
uses: peaceiris/actions-gh-pages@v4.0.0
8 changes: 5 additions & 3 deletions docs/Makefile
@@ -6,20 +6,22 @@ SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = _build
CLEAN_TARGETS = $(BUILDDIR) $(SOURCEDIR)/_autosummary

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile
.PHONY: help Makefile clean

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
rm -rf _build
rm -rf source/_autosummary
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

clean:
rm -rf $(CLEAN_TARGETS)

github: html
-git branch -D gh-pages
-git push origin --delete gh-pages
2 changes: 1 addition & 1 deletion docs/source/dev/README.rst
@@ -211,7 +211,7 @@ To check your docstring additions/updates, you can build a local version of the

.. code-block:: shell

pixi r -e pdoc python-doc
pixi r -e pdoc python-docs

After running this command, simply open ``docs/_build/html/index.html`` using your favorite browser, e.g.:

144 changes: 144 additions & 0 deletions docs/source/glossary.rst
@@ -0,0 +1,144 @@
.. _glossary:

Glossary
========

.. glossary::
:sorted:

INFRA-COMPASS
End-to-end pipeline that discovers, parses, and validates energy
infrastructure ordinances with LLM tooling.

LLM
Large Language Model that interprets ordinance text, classifies
features, and answers structured extraction questions.

OCR
Optical Character Recognition stage powered by ``pytesseract``
that converts scanned ordinance PDFs into searchable text.

Pixi
Environment manager used to install dependencies, run tasks, and
maintain reproducible shells for COMPASS.

Playwright
Browser automation framework used to crawl web portals and
download ordinance documents reliably.

analysis run
Complete invocation of ``compass process`` that ingests a
configuration file, processes jurisdictions, and writes results to
the run directory.

clean directory
Intermediate folder storing cleaned ordinance text used for LLM
prompting during feature extraction.

clean text file
Plain-text excerpt derived from ordinance documents that isolates
relevant sections for prompts and validation.

compass process
CLI command that executes the end-to-end pipeline using the inputs
defined in the configuration file.

configuration file
JSON or JSON5 document that declares inputs, model assignments,
concurrency, and output directories for a run.
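
A minimal sketch of such a file, written here via Python's ``json`` module. The ``tech``, ``out_dir``, ``log_dir``, and ``ordinance_file_dir`` keys appear elsewhere in this glossary; the remaining key names are illustrative assumptions, not the documented schema:

```python
import json

# Hypothetical run configuration. "tech", "out_dir", "log_dir", and
# "ordinance_file_dir" are glossary terms; the other keys are assumed.
config = {
    "out_dir": "./runs/solar",        # root for structured results
    "log_dir": "./runs/solar/logs",   # run-level logs, prompt archives
    "ordinance_file_dir": "./runs/solar/ordinance_files",
    "tech": "solar",                  # target infrastructure domain
    "jurisdiction_file": "./jurisdictions.csv",  # assumed key name
}

with open("config.json", "w") as fh:
    json.dump(config, fh, indent=2)
```

JSON5 additionally allows comments and trailing commas, so the same structure could also be written by hand as a ``.json5`` file.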

decision tree prompt
Structured prompt template that guides the LLM through branching
questions to extract quantitative and qualitative ordinance data.

decision tree
Hierarchical rubric of questions and outcomes that organizes how
ordinance features are extracted and validated.

extraction pipeline
Crawlers, parsers, and feature detectors that transform raw
ordinance text into structured records.

jurisdiction
County or municipality defined in the jurisdiction CSV that
frames the geographic scope of an analysis run.

jurisdiction CSV
Input spreadsheet whose ``County`` and ``State`` columns list the
locations processed in a run.
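
To make the expected layout concrete, a hedged sketch: the ``County`` and ``State`` headers come from this glossary, while the example rows and in-memory file handling are purely illustrative.

```python
import csv
import io

# Two-column jurisdiction CSV: "County" and "State" are the documented
# headers; the rows below are made-up examples.
sample = """County,State
Box Elder,Utah
Larimer,Colorado
"""

jurisdictions = list(csv.DictReader(io.StringIO(sample)))
for row in jurisdictions:
    # Each (County, State) pair defines one location in the run.
    print(row["County"], row["State"])
```
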

location
Combination of county and state identifiers that maps to one row
in the jurisdiction CSV and produces a single output bundle.

location file log
Per-location structured log that aggregates runtime diagnostics
and JSON exception summaries.

location manifest
JSON metadata file emitted per location summarizing source
documents, extraction status, and validation outcomes.

log directory
Folder defined by ``log_dir`` that stores run-level logs, prompt
archives, and timing summaries.

llm cost tracker
Runtime utility that multiplies token usage by configured pricing
to report estimated spend per model.
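
The arithmetic is simple enough to sketch. Everything below (class name, method names, pricing figures) is an illustrative assumption, not the COMPASS implementation:

```python
# Per-1,000-token prices are placeholders, not real provider rates.
PRICING_PER_1K = {
    "gpt-4o": {"prompt": 0.0025, "completion": 0.0100},
}

class CostTracker:
    """Accumulates estimated dollar spend per model from token counts."""

    def __init__(self, pricing):
        self.pricing = pricing
        self.spend = {}  # model name -> running dollar total

    def record(self, model, prompt_tokens, completion_tokens):
        rates = self.pricing[model]
        cost = (prompt_tokens / 1000 * rates["prompt"]
                + completion_tokens / 1000 * rates["completion"])
        self.spend[model] = self.spend.get(model, 0.0) + cost
        return cost

tracker = CostTracker(PRICING_PER_1K)
tracker.record("gpt-4o", prompt_tokens=2000, completion_tokens=500)
# 2.0 * 0.0025 + 0.5 * 0.0100 = 0.01 dollars estimated for this call
```
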

llm service
Abstraction over providers such as OpenAI or Azure OpenAI that
enforces authentication, rate limits, and retry policies.

llm service rate limit
Configuration value that caps tokens per minute for a model to
avoid provider throttling.

llm task
Logical label assigned to prompt templates that maps to a specific
model entry within the configuration.

ordinance
Legal text that governs energy infrastructure within a
jurisdiction and feeds the extraction workflows.

ordinance document
Source PDF or HTML retrieved during crawling that contains the
legal language for the targeted technology.

ordinance file directory
Folder defined by ``ordinance_file_dir`` that caches downloaded
ordinance PDFs and HTML files.

out directory
Root folder defined by ``out_dir`` where structured results,
cleaned text, and logs for each run are written.

``pytesseract``
Python wrapper for the Tesseract OCR engine used to enable text
extraction from scanned ordinance documents.

rate limiter
Token-based throttle that keeps LLM requests within provider
quotas while maximizing throughput.
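
A sketch of the idea only, using a rolling 60-second window: the real limiter's design is not shown here, and this version assumes no single request exceeds the full budget.

```python
import time

class TokenRateLimiter:
    """Illustrative tokens-per-minute throttle, not the COMPASS class."""

    def __init__(self, tokens_per_minute):
        self.budget = tokens_per_minute
        self.events = []  # (timestamp, token_count) within the window

    def _used(self, now):
        # Forget usage older than the rolling 60-second window.
        self.events = [(t, n) for t, n in self.events if now - t < 60]
        return sum(n for _, n in self.events)

    def acquire(self, tokens):
        # Block until the request fits under the per-minute budget.
        while self._used(time.monotonic()) + tokens > self.budget:
            time.sleep(0.05)
        self.events.append((time.monotonic(), tokens))

limiter = TokenRateLimiter(tokens_per_minute=30_000)
limiter.acquire(1_200)  # fits under budget, so this returns immediately
```
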

structured record
Tabular representation of ordinance features, thresholds, and
metadata exported for downstream analysis.

technology
``tech`` configuration key that defines the target infrastructure
domain, such as solar or wind.

text splitter
Utility that chunks ordinance text into overlapping segments sized
for LLM context windows.
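
A character-based sketch of overlapping chunking; the actual splitter may count tokens and respect document structure, and the sizes here are arbitrary:

```python
def split_text(text, chunk_size=400, overlap=50):
    """Chunk ``text`` into windows that share ``overlap`` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = split_text("x" * 1000, chunk_size=400, overlap=50)
# Consecutive chunks repeat 50 characters, so text near a chunk
# boundary is always seen with context in the neighboring chunk.
assert chunks[0][-50:] == chunks[1][:50]
```
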

validation pipeline
Post-processing stage that verifies extracted features, resolves
conflicts, and confirms location metadata.

web search
Search-and-crawl phase that discovers ordinance links using
providers such as Tavily, DuckDuckGo Search, or custom engines.
3 changes: 3 additions & 0 deletions docs/source/index.rst
@@ -8,6 +8,7 @@
CLI reference <_cli/cli>
Validation <val/validation>
Development <dev/index>
Glossary <glossary>


INFRA-COMPASS documentation
@@ -20,3 +21,5 @@ What is INFRA-COMPASS?
.. include:: ../../README.rst
:start-after: inclusion-intro
:end-before: Installing INFRA-COMPASS

:ref:`genindex` | :ref:`modindex` | :ref:`glossary`