Misc Documentation updates #355
Merged
9 commits
9270605 Update command name (ppinchuk)
20530d9 More flexible makefile (ppinchuk)
39a5622 Command now clean builds docs (ppinchuk)
4f453ef Add glossary (ppinchuk)
b0e2e3b Add links (ppinchuk)
fcc529b Update a few deps (ppinchuk)
9dee2a1 PR review updates (ppinchuk)
76b68a4 Merge remote-tracking branch 'origin/main' into pp/misc (ppinchuk)
8587137 Update lockfile (ppinchuk)
@@ -0,0 +1,144 @@
.. _glossary:

Glossary
========

.. glossary::
    :sorted:

    INFRA-COMPASS
        End-to-end pipeline that discovers, parses, and validates energy
        infrastructure ordinances with LLM tooling.

    LLM
        Large Language Model that interprets ordinance text, classifies
        features, and answers structured extraction questions.

    OCR
        Optical Character Recognition stage powered by ``pytesseract``
        that converts scanned ordinance PDFs into searchable text.

    Pixi
        Environment manager used to install dependencies, run tasks, and
        maintain reproducible shells for COMPASS.

    Playwright
        Browser automation framework used to crawl web portals and
        download ordinance documents reliably.

    analysis run
        Complete invocation of ``compass process`` that ingests a
        configuration file, processes jurisdictions, and writes results to
        the run directory.

    clean directory
        Intermediate folder storing cleaned ordinance text used for LLM
        prompting during feature extraction.

    clean text file
        Plain-text excerpt derived from ordinance documents that isolates
        relevant sections for prompts and validation.

    compass process
        CLI command that executes the end-to-end pipeline using the inputs
        defined in the configuration file.

    configuration file
        JSON or JSON5 document that declares inputs, model assignments,
        concurrency, and output directories for a run.

    decision tree prompt
        Structured prompt template that guides the LLM through branching
        questions to extract quantitative and qualitative ordinance data.

    decision tree
        Hierarchical rubric of questions and outcomes that organizes how
        ordinance features are extracted and validated.

    extraction pipeline
        Crawlers, parsers, and feature detectors that transform raw
        ordinance text into structured records.

    jurisdiction
        County or municipality defined in the jurisdiction CSV that
        frames the geographic scope of an analysis run.

    jurisdiction CSV
        Input spreadsheet whose ``County`` and ``State`` columns list the
        locations processed in a run.

    location
        Combination of county and state identifiers that maps to one row
        in the jurisdiction CSV and produces a single output bundle.

    location file log
        Per-location structured log that aggregates runtime diagnostics
        and JSON exception summaries.

    location manifest
        JSON metadata file emitted per location summarizing source
        documents, extraction status, and validation outcomes.

    log directory
        Folder defined by ``log_dir`` that stores run-level logs, prompt
        archives, and timing summaries.

    llm cost tracker
        Runtime utility that multiplies token usage by configured pricing
        to report estimated spend per model.

    llm service
        Abstraction over providers such as OpenAI or Azure OpenAI that
        enforces authentication, rate limits, and retry policies.

    llm service rate limit
        Configuration value that caps tokens per minute for a model to
        avoid provider throttling.

    llm task
        Logical label assigned to prompt templates that maps to a specific
        model entry within the configuration.

    ordinance
        Legal text that governs energy infrastructure within a
        jurisdiction and feeds the extraction workflows.

    ordinance document
        Source PDF or HTML retrieved during crawling that contains the
        legal language for the targeted technology.

    ordinance file directory
        Folder defined by ``ordinance_file_dir`` that caches downloaded
        ordinance PDFs and HTML files.

    out directory
        Root folder defined by ``out_dir`` where structured results,
        cleaned text, and logs for each run are written.

    ``pytesseract``
        Python wrapper for the Tesseract OCR engine used to enable text
        extraction from scanned ordinance documents.

    rate limiter
        Token-based throttle that keeps LLM requests within provider
        quotas while maximizing throughput.

    structured record
        Tabular representation of ordinance features, thresholds, and
        metadata exported for downstream analysis.

    technology
        ``tech`` configuration key that defines the target infrastructure
        domain, such as solar or wind.

    text splitter
        Utility that chunks ordinance text into overlapping segments sized
        for LLM context windows.

    validation pipeline
        Post-processing stage that verifies extracted features, resolves
        conflicts, and confirms location metadata.

    web search
        Search-and-crawl phase that discovers ordinance links using
        providers such as Tavily, DuckDuckGo Search, or custom engines.
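
Two of the glossary entries describe small computations: the text splitter (overlapping segments) and the llm cost tracker (token usage multiplied by pricing). The sketch below illustrates both ideas in plain Python. It is not the COMPASS implementation: the function names `split_text` and `estimate_cost`, their parameters, and the per-1,000-token pricing unit are hypothetical, and the splitter counts characters rather than tokens to stay dependency-free.

```python
def split_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks that share `overlap` characters with their neighbor.

    Hypothetical helper: sizes are in characters here, whereas a real
    splitter for LLM context windows would likely count tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    price_per_1k_prompt: float,
    price_per_1k_completion: float,
) -> float:
    """Estimate spend as token usage multiplied by configured pricing.

    Hypothetical helper: prices are assumed to be quoted per 1,000 tokens.
    """
    return (
        prompt_tokens / 1000 * price_per_1k_prompt
        + completion_tokens / 1000 * price_per_1k_completion
    )


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
    print(estimate_cost(1000, 500, 0.01, 0.03))
```

The overlap means that a sentence cut at a chunk boundary still appears whole in one of the two neighboring chunks, at the price of sending some text to the model twice. That extra token usage is what a cost estimate of this kind would capture.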