Mark up amounts and fees in SFS as data tags #56
Open
…centages
Implements a function to identify and wrap Swedish currency amounts (kronor, kr, SEK) and percentages (%, procent) with semantic <data> tags containing:
- id: context-aware slug (e.g., "avgift-1500-kr", "ranta-8-5-procent")
- type: "amount" or "percentage"
- value: normalized numeric value
The function:
- Handles Swedish number formats (space separators, decimal comma)
- Supports multipliers (miljoner, miljarder, tusen)
- Extracts context words for descriptive slugs
- Skips markdown headers and XML/HTML tags
- Includes 48 unit tests
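For reviewers, the number handling described above can be sketched roughly as follows. This is not the code in this PR: the names `normalize_swedish_number`, `wrap_amounts`, `AMOUNT_RE` and `MULTIPLIERS` are hypothetical, the id attribute is left out, and the skipping of markdown headers and XML/HTML tags is not shown.

```python
import re
from decimal import Decimal

# Hypothetical multiplier table; the PR names these words but not how it stores them.
MULTIPLIERS = {"tusen": 1_000, "miljoner": 1_000_000, "miljarder": 1_000_000_000}

# Assumed pattern: a number with optional space-separated thousands and a
# decimal comma, an optional multiplier word, then a currency or percent unit.
AMOUNT_RE = re.compile(
    r"(?<![\d,])"
    r"(?P<num>\d{1,3}(?:[ \u00a0]\d{3})+(?:,\d+)?|\d+(?:,\d+)?)"
    r"(?:\s+(?P<mult>tusen|miljoner|miljarder))?"
    r"\s*(?P<unit>kronor|kr|SEK|procent|%)(?!\w)"
)


def normalize_swedish_number(text, multiplier=None):
    """Turn '1 500' into '1500' and '8,5' into '8.5', applying any multiplier."""
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    value = Decimal(cleaned) * MULTIPLIERS.get(multiplier, 1)
    rendered = format(value, "f")
    # Drop a trailing '.0' so whole numbers stay whole.
    if "." in rendered:
        rendered = rendered.rstrip("0").rstrip(".")
    return rendered


def wrap_amounts(text):
    """Wrap each matched amount or percentage in a <data> tag (id omitted here)."""
    def replace(match):
        kind = "percentage" if match.group("unit") in ("%", "procent") else "amount"
        value = normalize_swedish_number(match.group("num"), match.group("mult"))
        return f'<data type="{kind}" value="{value}">{match.group(0)}</data>'

    return AMOUNT_RE.sub(replace, text)


print(wrap_amounts("Avgiften är 1 500 kronor."))
```

Using `Decimal` rather than `float` is a choice made for this sketch so that values such as 8,5 round-trip exactly.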
Remove numeric value and unit from id attribute, keeping only the context-derived identifier (e.g., "avgift", "ranta", "moms"). This allows tracking the same data point across law amendments, since the id stays constant while only the value changes.
Before: id="avgift-1500-kr"
After: id="avgift"
Replace context-based slug generation with positional ids that can be
mapped to descriptive slugs via a reference table.
Changes:
- Generate positional ids like "kap5.2-belopp-1" based on section + type + position
- Add reference table support (data/amount-references.json) for custom slugs
- Section tags in the text automatically set the current section id
- Counters reset when entering a new section
This approach allows:
- Consistent ids across law amendments (same position = same id)
- Human/LLM curation of descriptive slugs like "riksbankens-referensranta"
- Tracking value changes over time using the stable id
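A minimal sketch of the positional scheme described above, assuming a small stateful helper. The class name `PositionalIdGenerator` and its methods are hypothetical, and the type label "procent" for percentages is an assumption (the commit only shows "belopp").

```python
from collections import defaultdict


class PositionalIdGenerator:
    """Hands out ids of the form '<section>-<type>-<position>' (hypothetical helper)."""

    def __init__(self, reference_table=None):
        # Maps positional ids to curated slugs, e.g. loaded from the reference table file.
        self.reference_table = reference_table or {}
        self.section = None
        self.counters = defaultdict(int)

    def enter_section(self, section_id):
        """Called when a section tag is seen; counters restart in each section."""
        self.section = section_id
        self.counters.clear()

    def next_id(self, kind):
        """Return the curated slug if one is mapped, else the positional id."""
        self.counters[kind] += 1
        positional = f"{self.section}-{kind}-{self.counters[kind]}"
        return self.reference_table.get(positional, positional)


gen = PositionalIdGenerator({"kap5.2-belopp-1": "tillstandsavgift"})
gen.enter_section("kap5.2")
print(gen.next_id("belopp"))  # mapped in the table, so the slug is returned
print(gen.next_id("belopp"))  # unmapped, so the positional id is returned
```

Falling back to the positional id keeps every tag addressable before anyone has curated a slug for it.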
Example with reference table:
{"kap5.2-belopp-1": "tillstandsavgift"}
Output:
<data id="tillstandsavgift" type="amount" value="1500">1 500 kronor</data>
Include SFS designation (e.g., "2024:123") in positional ids to enable:
- Unique identification across different laws
- Tracking value changes when same slug maps to multiple SFS versions
New id format: sfs-2024-123-kap5.2-belopp-1
Reference table now supports tracking changes over time:
{
"sfs-2020-100-kap5.2-belopp-1": "tillstandsavgift",
"sfs-2024-123-kap5.2-belopp-1": "tillstandsavgift"
}
Both resolve to id="tillstandsavgift" but with different values,
allowing comparison of the same data point across amendments.
Also extracts SFS id from <article selex:id="lag-2024-123"> tags.
Change positional id format
from: sfs-2024-123-kap5.2-belopp-1
to: sfs-2024-123/kap5.2-belopp-1
The "/" creates clearer visual hierarchy:
- Before slash: the law (SFS designation)
- After slash: position within the document
Added test for reference table slug resolution.
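To illustrate the resulting id format, here is a rough sketch of how the SFS designation could be pulled from the article tag and combined with the position. The helper names `extract_sfs` and `positional_id` and the regular expression are assumptions, not taken from this PR.

```python
import re

# Assumed pattern for tags such as <article selex:id="lag-2024-123">.
ARTICLE_RE = re.compile(r'<article\b[^>]*\bselex:id="lag-(\d{4})-(\d+)"')


def extract_sfs(text):
    """Return the SFS designation (e.g. '2024:123') found in an article tag, or None."""
    match = ARTICLE_RE.search(text)
    return f"{match.group(1)}:{match.group(2)}" if match else None


def positional_id(sfs, section, kind, index):
    """Build '<law>/<position>': the SFS part before the slash, the position after it."""
    return f"sfs-{sfs.replace(':', '-')}/{section}-{kind}-{index}"


sfs = extract_sfs('<article selex:id="lag-2024-123">')
print(positional_id(sfs, "kap5.2", "belopp", 1))
```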
New function to find amounts/percentages that need slugs in the reference table.
Returns list of dicts with:
- positional_id: the id that needs mapping
- type: "amount" or "percentage"
- value: normalized numeric value
- matched_text: original text matched
- context: surrounding text for understanding
Useful for batch curation of slugs with LLM assistance.
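The filtering step could look roughly like the sketch below. The commit does not name the function, so `find_unmapped` is a hypothetical name, and the sample entries are made up for illustration only.

```python
def find_unmapped(detected, reference_table):
    """Return the detected items whose positional_id has no slug in the table yet."""
    return [
        item for item in detected
        if item["positional_id"] not in reference_table
    ]


# Illustrative input only; these values are not taken from any real law text.
detected = [
    {
        "positional_id": "sfs-2024-123/kap5.2-belopp-1",
        "type": "amount",
        "value": "1500",
        "matched_text": "1 500 kronor",
        "context": "en avgift om 1 500 kronor ska betalas",
    },
    {
        "positional_id": "sfs-2024-123/kap5.2-belopp-2",
        "type": "amount",
        "value": "300",
        "matched_text": "300 kr",
        "context": "en avgift om 300 kr tas ut",
    },
]
reference_table = {"sfs-2024-123/kap5.2-belopp-1": "tillstandsavgift"}

for item in find_unmapped(detected, reference_table):
    print(item["positional_id"], "needs a slug:", item["context"])
```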
Add comprehensive reference table entries covering:
- Socialtjänstlagen (2025:400) - sanction fees
- Inkomstskattelagen (1999:1229) - base amounts, deductions, tax rates
- Socialförsäkringsbalken (2010:110) - sickness benefit, parental benefit
- Brottsbalken (1962:700) - penal provisions
- Aktiebolagslagen (2005:551) - capital requirements
- Räntelagen (1975:635) - reference rate
- And 27 more Swedish laws
This enables tracking of amount changes across law amendments using stable descriptive slugs like "prisbasbelopp", "referensranta", etc.
YAML supports inline comments, making it easier to document and organize the reference table with section headers and annotations.
Changes:
- Convert data/amount-references.json to data/amount-references.yaml
- Update load_reference_table() to use yaml.safe_load()
- Replace json import with yaml import
Each entry now includes an inline comment with:
- The actual value (e.g., "57 300 kr", "80%")
- A text excerpt showing context from the law
Example:
"sfs-1999-1229/kap2.1-belopp-1": prisbasbelopp # 57 300 kr - "prisbasbeloppet enligt 2 kap."
This makes it easier to understand and verify each mapping.