diff --git a/README.md b/README.md
index 26284d7..a61650e 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,14 @@
 # Worldview
 
-A compact notation format for encoding and maintaining conceptual worldviews over time, designed for LLM context persistence.
+A constrained notation format for encoding conceptual worldviews, where the structure itself enforces what can and cannot be stored.
 
 ## Overview
 
-The **Worldview format** (file extension: `.wvf`) is a declarative notation for storing beliefs, stances, and understanding about concepts. Unlike retrieval-augmented generation (RAG) which selectively includes information, Worldview documents are designed to remain *entirely in context* across all LLM interactions.
+The **Worldview format** (file extension: `.wvf`) is a declarative notation for storing beliefs, stances, and understanding about concepts. Its primary value is structural constraint—the rigid hierarchy of Concepts, Facets, and Claims makes it impossible to encode inappropriate content types.
 
-This solves a fundamental problem: LLMs have foundational beliefs and stances that should inform all reasoning, not just topically-matched queries. The Worldview format is dense enough that an entire belief system can remain in context permanently.
+This solves a fundamental problem: when beliefs are stored in unstructured formats like Markdown, documents grow with repetitious statements, narrative tangents, and content that strays from the intended purpose. The Worldview format prevents this by requiring that every piece of information fit into a strict hierarchy—if something doesn't fit the structure, it doesn't belong.
+
+Token efficiency is a consequence, not the goal: when you can only store structured beliefs, documents stay focused and compact.
 
 ## Quick Start
 
@@ -58,18 +60,14 @@ Concept (unindented, bare text)
 
 ### Brief Forms
 
-Compact operators for common relationships:
+Minimal operators for common relationships (less common relationships use natural language):
 
 | Symbol | Meaning | Example |
 |--------|---------|---------|
 | `=>` | causes, leads to | `power => corruption` |
-| `<=` | caused by, results from | `trust <= consistency` |
-| `<>` | mutual, bidirectional | `accountability <> trust` |
-| `><` | tension, conflicts with | `efficiency >< thoroughness` |
 | `~` | similar to, resembles | `authority ~ influence` |
 | `=` | equivalent to, means | `formal = official` |
-| `vs` | in contrast to | `asymmetric vs formation` |
-| `//` | regardless of | `self-perpetuate // original purpose` |
+| `vs` | contrasts with, in tension with | `efficiency vs thoroughness` |
 
 ### Modifiers
 
@@ -203,21 +201,37 @@ uv run python -m evals --help
 
 The Worldview format follows five core principles:
 
-1. **State over narrative**: Capture what is believed, not how it came to be believed
-2. **Predictability allows omission**: If structure makes something inferable, don't write it
+1. **Structure as enforcement**: The rigid hierarchy prevents inappropriate content—if it doesn't fit, it doesn't belong
+2. **State over narrative**: Capture what is believed, not how it came to be believed
 3. **Conflict tolerance**: Real worldviews contain tensions—hold them without forcing resolution
-4. **Freeform vocabulary**: Structure is defined; content remains unconstrained
-5. **LLM-native, human-inspectable**: Optimized for machine parsing while remaining readable
+4. **Minimal notation**: Only universally intuitive symbols; natural language for uncommon relationships
+5. **Freeform vocabulary**: Structure is defined; content remains unconstrained
+
+## LLM-Native Tokens
+
+The notation uses symbols that LLMs already understand intuitively—tokens whose semantics are well-established in training data:
+
+- `?` for uncertainty — LLMs reliably interpret `?` as questioning or tentative
+- `!` for emphasis — strongly associated with assertion and importance
+- `=>` for causation — arrow notation for "leads to" is universal
+- `~` for similarity — mathematical approximation notation
+- `@` for attribution — familiar from email and social media
+- `&` for reference — established linking/joining semantics
+
+This isn't arbitrary shorthand—it's leveraging semantic associations that already exist in model weights. When an LLM sees `collapse?`, it understands uncertainty without needing explicit instruction.
+
+Symbols that *don't* have clear pre-existing semantics (like `><` for tension or `//` for "regardless") are avoided. If a relationship requires explanation to understand, natural language is clearer than a novel symbol.
 
 ## Why "Worldview"?
 
 The name emphasizes the core use case: encoding *how concepts are understood* rather than just storing facts. This is different from:
 
+- **Markdown files**: No structural constraints; content drifts and duplicates
 - **Knowledge bases**: Store facts, not interpretations
 - **RAG systems**: Retrieve relevant content per query
 - **Memory systems**: Log events chronologically
 
-Worldview documents capture the *lens* through which all subsequent reasoning should be filtered.
+Worldview documents capture the *lens* through which all subsequent reasoning should be filtered—and the format's constraints ensure documents stay focused on that purpose.
 
 ## File Extension
 
diff --git a/SPEC.md b/SPEC.md
index 96d400e..20df35b 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -6,56 +6,53 @@
 
 ## Abstract
 
-The Worldview format is a compact, declarative notation for encoding and maintaining conceptual worldviews over time. It provides a structured format for storing beliefs, stances, and understanding about concepts—designed to be included in full within every interaction context rather than retrieved selectively.
+The Worldview format is a constrained notation for encoding conceptual worldviews. Its primary purpose is to provide a rigid structure that enforces what can and cannot be stored—the syntax itself prevents drift into inappropriate content types.
 
-The Worldview format is not a general-purpose communication language. It is a specialized format for preserving *how concepts are understood*, optimized for semantic density without sacrificing clarity. The notation is intended to be intuitive for large language models to parse, reason about, and autonomously maintain, while remaining human-inspectable. Beliefs are stored as structured claims with conditions and sources, enabling an LLM to hold persistent context about a user, domain, or system across extended interactions.
+The format is not optimized for token efficiency as a primary goal; density is a *consequence* of the constraints. By requiring that all content fit into a strict hierarchy of Concepts, Facets, and Claims, the format naturally excludes narratives, logs, predictions, and other content that doesn't belong in a worldview document.
 
 ---
 
 ## Motivation
 
-Large language models operate within fixed context windows. Existing approaches to persistent memory—such as retrieval-augmented generation (RAG)—selectively include information based on relevance to the current query. This works for factual lookup but fails for *worldview*: the foundational beliefs and stances that should inform all reasoning, not just topically-matched queries.
+The problem Worldview solves is not context length—it's content discipline.
 
-The Worldview format solves this by defining a notation dense enough that an entire belief system (potentially tens of thousands of tokens) can remain in context permanently. Rather than retrieving relevant memories, the LLM carries its complete understanding forward into every interaction.
+When beliefs and stances are stored in unstructured formats like Markdown, documents tend to grow with repetitious statements, narrative tangents, and content that strays from the intended purpose. There is no enforcement of what belongs versus what doesn't.
 
-The format prioritizes:
-- **Semantic density** — Strip prose, keep meaning
-- **Structural consistency** — Predictable hierarchy for reliable parsing
-- **Evolutionary tracking** — Beliefs change; the notation accommodates revision
-- **Autonomous maintenance** — The LLM updates the document without user intervention
+The Worldview format solves this through structural constraint:
+
+1. **Every piece of information has a designated place** — Concepts organize by subject, Facets organize by aspect, Claims make assertions. If something doesn't fit this hierarchy, it doesn't belong.
+
+2. **The syntax prevents duplication** — Because content is categorized into a clear hierarchy, there's always one canonical location for any given belief. Adding a duplicate requires navigating to the same place, making redundancy obvious.
+
+3. **Inappropriate content is structurally excluded** — Predictions, evaluations, event logs, and narratives cannot be encoded in the Concept → Facet → Claim structure. The format only accepts statements of understanding.
+
+Token efficiency is a side effect: when you can only store structured beliefs, documents stay focused and compact.
 
 ---
 
 ## Design Principles
 
-**State over narrative**
-The Worldview format captures what is believed, not the story of how it came to be believed. History is preserved compactly when relevant, but the primary representation is current state.
+**Structure as enforcement**
+The rigid hierarchy of Concept → Facet → Claim is the primary mechanism for keeping documents focused. If content doesn't fit this structure, it doesn't belong.
 
-**Predictability allows omission**
-Borrowed from stenographic shorthand: if structure or context makes something inferable, don't write it. No articles, no copulas, no filler.
+**State over narrative**
+The format captures what is believed, not the story of how it came to be believed. History is preserved compactly when relevant, but the primary representation is current state.
 
 **Conflict tolerance**
-Real worldviews contain tensions and contradictions. The Worldview format holds conflicting claims without forcing resolution.
+Real worldviews contain tensions and contradictions. The format holds conflicting claims without forcing resolution.
 
-**Freeform vocabulary**
-No predefined concept names, facet labels, or claim terms. The notation defines structure and relationships; content remains unconstrained.
+**Minimal notation**
+Symbols are used sparingly—only where they are universally intuitive (`=>`, `~`, `=`, `vs`). Less common relationships use natural language within claims.
 
-**LLM-native, human-inspectable**
-Optimized for machine parsing and reasoning. Human readability is a secondary benefit, not a design constraint.
+**Freeform vocabulary**
+No predefined concept names, facet labels, or claim terms. The notation defines structure; content remains unconstrained.
 
 ---
 
 ## Inspirations
 
-### Stenographic Shorthand
-Systems like Gregg, Pitman, and Teeline informed the Worldview format's approach to density:
-- **Omission of predictable elements** — Common words and inferable structure are dropped
-- **Brief forms** — High-frequency relationships get compact symbols
-- **Positional grammar** — Location within a line implies role
-- **Affix modification** — Small markers inflect meaning
-
 ### Belief Representation
-The Worldview format draws on concepts from epistemology and knowledge representation:
+The format draws on concepts from epistemology and knowledge representation:
 - Beliefs as claims with conditions (contextualism)
 - Sources as grounding for confidence (evidentialism)
 - Tolerance of contradiction (paraconsistent approaches)
@@ -63,6 +60,30 @@ The Worldview format draws on concepts from epistemology and knowledge represent
 ### Configuration Languages
 The hierarchical structure echoes YAML and similar formats, using indentation for nesting while avoiding syntactic overhead like quotes and brackets.
 
+### Constraint-Based Design
+The format takes inspiration from systems where limitations enhance focus: structured data schemas, controlled vocabularies, and formats where the inability to express certain things is a feature rather than a limitation.
+
+---
+
+## LLM-Native Tokens
+
+The notation deliberately uses symbols that LLMs already understand intuitively—tokens whose semantics are well-established in training data:
+
+| Symbol | Why It Works |
+|--------|--------------|
+| `?` | Universally associated with uncertainty and questioning |
+| `!` | Strongly associated with emphasis and assertion |
+| `=>` | Arrow notation for causation is universal across programming and logic |
+| `~` | Mathematical approximation; resemblance |
+| `@` | Attribution and source reference (email, social media, programming) |
+| `&` | Joining and linking semantics |
+| `^` | Upward direction/increase (caret, superscript) |
+| `v` | Downward direction/decrease (shaped like down arrow) |
+
+This approach leverages semantic associations that already exist in model weights. When an LLM sees `collapse?`, it understands uncertainty without explicit instruction.
+
+Critically, symbols that lack clear pre-existing semantics are avoided. If a relationship requires explanation to understand, natural language is clearer than a novel symbol. This is why the format uses only four brief form operators (`=>`, `~`, `=`, `vs`) rather than a larger set that would require learning new meanings.
+
 ---
 
 ## Structure
@@ -134,26 +155,24 @@ Position implies role—no labels needed:
 
 ## Brief Forms
 
-Common relationships use compact symbols:
+A minimal set of universally intuitive symbols for common relationships:
 
 | Symbol | Meaning |
 |--------|---------|
 | `=>` | causes, leads to |
-| `<=` | caused by, results from |
-| `<>` | mutual, bidirectional |
-| `><` | tension, conflicts with |
 | `~` | similar to, resembles |
 | `=` | equivalent to, means |
-| `vs` | in contrast to |
-| `//` | regardless of |
+| `vs` | contrasts with, in tension with |
+
+Less common relationships should use natural language within claims rather than forcing additional symbols.
 
 ### Examples
 
 ```
 - power => corruption | unchecked
-- trust <= consistency | over time
-- efficiency >< thoroughness
 - formal-authority ~ informal-influence
+- efficiency vs thoroughness
+- mutual accountability with trust
 ```
 
 ---
@@ -242,7 +261,7 @@ Power
     - concentration^ => abuse^ @historical-pattern
   .institutional
     - self-preserving
-    - accountability <> trust &Trust.institutional
+    - mutual accountability with trust &Trust.institutional
     - diffusion => dilution-of-responsibility
 
 Trust
@@ -282,7 +301,7 @@ Institutions
     - coordinate action @game-theory
   .dysfunction
     - ossify | over time
-    - self-perpetuate // original purpose
+    - self-perpetuates despite original purpose
     - capture-by-interests^ @public-choice-theory
 ```
 
@@ -290,13 +309,14 @@ Institutions
 
 ## Non-Goals
 
-The Worldview format explicitly does not attempt to:
+The format explicitly does not attempt to:
 
 - **Prove logical consistency** — Contradictions are permitted
 - **Enforce ontology** — No required categories or hierarchies beyond structure
-- **Replace natural language** — The Worldview format is for belief state, not communication
+- **Replace natural language** — The format is for belief state, not communication
 - **Assert objective truth** — Claims represent understanding, not facts
 - **Store predictions, evaluations, or identity** — These are derived from beliefs, not stored directly
+- **Maximize symbol density** — Notation is minimal; natural language is preferred for uncommon relationships
 
 ---
 
@@ -312,7 +332,7 @@ The Worldview format explicitly does not attempt to:
 
 ## Summary
 
-The Worldview format is a notation for meaning, not conversation. It exists to preserve how concepts are understood—compactly enough to remain always in context, structured enough to reason about reliably, and flexible enough to evolve as understanding changes.
+The Worldview format is a constrained notation for meaning, not conversation. Its rigid structure enforces that only appropriate content is stored—the syntax itself prevents documents from drifting into narratives, logs, or other inappropriate content types.
 
 The format encodes:
 - **What** is believed (claims)
@@ -321,10 +341,10 @@ The format encodes:
 - **How** beliefs connect (references)
 - **That** beliefs change (evolution markers)
 
-It deliberately omits:
-- Prose and filler
-- Explicit confidence scores (derived from conditions and sources)
-- Detailed history (supersession markers suffice)
-- Evaluative or predictive statements (derived at runtime)
+It structurally excludes:
+- Narratives and prose
+- Event logs and timelines
+- Predictions and evaluations
+- Duplicate information (hierarchy makes canonical location obvious)
 
-The Worldview format is designed to be carried forward—a persistent lens through which all subsequent reasoning is filtered.
+The format's value is constraint: by limiting what can be expressed, it keeps documents focused on their intended purpose—a persistent lens through which all subsequent reasoning is filtered.
diff --git a/cli/Cargo.lock b/cli/Cargo.lock
index 122c39d..2be0966 100644
--- a/cli/Cargo.lock
+++ b/cli/Cargo.lock
@@ -247,7 +247,7 @@ checksum = "c3e64b0cc0439b12df2fa678eae89a1c56a529fd067a9115f7827f1fffd22b32"
 
 [[package]]
 name = "codey"
-version = "0.1.0-rc.6"
+version = "0.1.0-rc.7"
 dependencies = [
  "anyhow",
  "async-stream",
diff --git a/example.wvf b/example.wvf
index b36f997..87f6321 100644
--- a/example.wvf
+++ b/example.wvf
@@ -5,7 +5,7 @@ Power
     - concentration^ => abuse^ @historical-pattern
   .institutional
     - self-preserving
-    - accountability <> trust &Trust.institutional
+    - mutual accountability with trust &Trust.institutional
     - diffusion => dilution-of-responsibility
 
 Trust
@@ -46,7 +46,7 @@ Institutions
     - coordinate action @game-theory
   .dysfunction
     - ossify | over time
-    - self-perpetuate // original purpose
+    - self-perpetuates despite original purpose
     - capture-by-interests^ @public-choice-theory
 
 Python-development
diff --git a/spec/generate.py b/spec/generate.py
index 44e3a09..bce09ac 100644
--- a/spec/generate.py
+++ b/spec/generate.py
@@ -74,7 +74,7 @@ def generate_language_spec(tokens: dict) -> str:
 
     # Brief forms
     output.append("### Brief Forms\n")
-    output.append("Compact operators for common relationships:\n")
+    output.append("Minimal operators for common relationships (less common relationships use natural language):\n")
     output.append("| Symbol | Meaning | Example |")
     output.append("|--------|---------|---------|")
     for bf in tokens["brief_forms"]:
@@ -346,7 +346,7 @@ def generate_markdown_tables(tokens: dict) -> str:
 
     # Brief forms table
     output.append("## Brief Forms\n")
-    output.append("Common relationships use compact symbols:\n")
+    output.append("Minimal operators for common relationships (less common relationships use natural language):\n")
     output.append("| Symbol | Meaning | Example |")
     output.append("|--------|---------|---------|")
     for bf in tokens["brief_forms"]:
diff --git a/spec/grammar.pest b/spec/grammar.pest
index 399024f..dd554d3 100644
--- a/spec/grammar.pest
+++ b/spec/grammar.pest
@@ -71,16 +71,13 @@ reference_target = @{ identifier ~ "." ~ identifier }
 
 /// Brief form: compact relationship notation
 /// Two operands connected by an operator
+/// Only universally intuitive operators are supported; use natural language for others
 brief_form = { operand ~ " "? ~ operator ~ " "? ~ operand }
 
-/// Brief form operators (ordered by length for proper matching)
+/// Brief form operators (minimal set, ordered by length for proper matching)
 operator = @{
     "=>" | // causes, leads to
-    "<=" | // caused by, results from (NOTE: not inside [...])
-    "<>" | // mutual, bidirectional
-    "><" | // tension, conflicts with
-    "//" | // regardless of
-    "vs" | // in contrast to
+    "vs" | // contrasts with, in tension with
     "~" | // similar to, resembles
     "=" // equivalent to, means
 }
diff --git a/spec/tokens.yaml b/spec/tokens.yaml
index 9b9c17c..0727c40 100644
--- a/spec/tokens.yaml
+++ b/spec/tokens.yaml
@@ -47,23 +47,13 @@ inline_elements:
     example: "&Trust.formation"
 
 # Brief forms - compact relationship operators
+# Intentionally minimal: only the most universally intuitive symbols are included.
+# Less common relationships should use natural language within claims.
 brief_forms:
   - symbol: "=>"
     meaning: "causes, leads to"
     example: "power => corruption"
 
-  - symbol: "<="
-    meaning: "caused by, results from"
-    example: "trust <= consistency"
-
-  - symbol: "<>"
-    meaning: "mutual, bidirectional"
-    example: "accountability <> trust"
-
-  - symbol: "><"
-    meaning: "tension, conflicts with"
-    example: "efficiency >< thoroughness"
-
   - symbol: "~"
     meaning: "similar to, resembles"
     example: "authority ~ influence"
@@ -73,12 +63,8 @@ brief_forms:
     example: "formal = official"
 
   - symbol: "vs"
-    meaning: "in contrast to"
-    example: "asymmetric vs formation"
-
-  - symbol: "//"
-    meaning: "regardless of"
-    example: "self-perpetuate // original purpose"
+    meaning: "contrasts with, in tension with"
+    example: "efficiency vs thoroughness"
 
 # Modifiers - suffix markers that inflect meaning
 modifiers:
diff --git a/system.md b/system.md
index 545c7bc..5eede98 100644
--- a/system.md
+++ b/system.md
@@ -20,13 +20,9 @@ Every concept has facets. Every facet has claims. Claims may include conditions,
 | `@` | source (basis for belief) | `@historical-pattern` |
 | `&` | reference (links to other concept.facet) | `&Trust.formation` |
 | `=>` | causes, leads to | `power => corruption` |
-| `<=` | caused by, results from | `trust <= consistency` |
-| `<>` | mutual, bidirectional | `accountability <> trust` |
-| `><` | tension, conflicts with | `efficiency >< thoroughness` |
 | `~` | similar to, resembles | `authority ~ influence` |
 | `=` | equivalent to, means | `formal = official` |
-| `vs` | in contrast to | `asymmetric vs formation` |
-| `//` | regardless of | `self-perpetuate // original purpose` |
+| `vs` | contrasts with, in tension with | `efficiency vs thoroughness` |
 | `^` | increasing, trending up | `concentration^` |
 | `v` | decreasing, trending down | `trust v` |
 | `!` | strong, emphatic, high confidence | `fast !` |
diff --git a/validator/src/lib.rs b/validator/src/lib.rs
index aa04bb1..3270141 100644
--- a/validator/src/lib.rs
+++ b/validator/src/lib.rs
@@ -424,26 +424,22 @@ fn extract_brief_forms(text: &str) -> Vec {
 
     // Check for each brief form operator
     // Order matters: check longer operators first to avoid partial matches
-    let operators_by_length: &[&str] = &["=>", "<=", "<>", "><", "//", "vs", "~", "="];
+    // Minimal set: =>, vs, ~, =
+    let operators_by_length: &[&str] = &["=>", "vs", "~", "="];
 
     let remaining = text.to_string();
 
     for &op in operators_by_length {
-        // Skip <= if it's part of [<= (evolution marker)
-        if op == "<=" && remaining.contains("[<=") {
-            continue;
-        }
-
-        // Special handling for = to avoid matching => or <=
+        // Special handling for = to avoid matching =>
         if op == "=" {
-            // Look for standalone = not part of => or <=
+            // Look for standalone = not part of =>
             let mut i = 0;
             let chars: Vec<char> = remaining.chars().collect();
             while i < chars.len() {
                 if chars[i] == '=' {
                     let prev = if i > 0 { Some(chars[i - 1]) } else { None };
                     let next = chars.get(i + 1);
-                    // Check it's not part of =>, <=, or <>
+                    // Check it's not part of =>
                     if prev != Some('<') && prev != Some('>') && next != Some(&'>') {
                         // Found standalone =
                         let before: String = chars[..i].iter().collect();
@@ -953,57 +949,6 @@ Trust
         }
     }
 
-    #[test]
-    fn test_brief_form_caused_by() {
-        let input = r#"Trust
-  .formation
-    - trust <= consistency"#;
-
-        let result = validate(input);
-        assert!(result.is_valid(), "Expected valid: {:?}", result.errors);
-
-        if let Some(line) = result.lines.iter().find(|l| matches!(l.line_type, LineType::Claim(_))) {
-            if let LineType::Claim(claim) = &line.line_type {
-                let bf = claim.brief_forms.iter().find(|b| b.operator == "<=");
-                assert!(bf.is_some(), "Expected <= operator");
-            }
-        }
-    }
-
-    #[test]
-    fn test_brief_form_mutual() {
-        let input = r#"Trust
-  .institutional
-    - accountability <> trust"#;
-
-        let result = validate(input);
-        assert!(result.is_valid(), "Expected valid: {:?}", result.errors);
-
-        if let Some(line) = result.lines.iter().find(|l| matches!(l.line_type, LineType::Claim(_))) {
-            if let LineType::Claim(claim) = &line.line_type {
-                let bf = claim.brief_forms.iter().find(|b| b.operator == "<>");
-                assert!(bf.is_some(), "Expected <> operator");
-            }
-        }
-    }
-
-    #[test]
-    fn test_brief_form_tension() {
-        let input = r#"Work
-  .balance
-    - efficiency >< thoroughness"#;
-
-        let result = validate(input);
-        assert!(result.is_valid(), "Expected valid: {:?}", result.errors);
-
-        if let Some(line) = result.lines.iter().find(|l| matches!(l.line_type, LineType::Claim(_))) {
-            if let LineType::Claim(claim) = &line.line_type {
-                let bf = claim.brief_forms.iter().find(|b| b.operator == "><");
-                assert!(bf.is_some(), "Expected >< operator");
-            }
-        }
-    }
-
     #[test]
     fn test_brief_form_similar() {
         let input = r#"Authority
@@ -1038,23 +983,6 @@ Trust
         }
     }
 
-    #[test]
-    fn test_brief_form_regardless() {
-        let input = r#"Institutions
-  .dysfunction
-    - self-perpetuate // original purpose"#;
-
-        let result = validate(input);
-        assert!(result.is_valid(), "Expected valid: {:?}", result.errors);
-
-        if let Some(line) = result.lines.iter().find(|l| matches!(l.line_type, LineType::Claim(_))) {
-            if let LineType::Claim(claim) = &line.line_type {
-                let bf = claim.brief_forms.iter().find(|b| b.operator == "//");
-                assert!(bf.is_some(), "Expected // operator");
-            }
-        }
-    }
-
     #[test]
     fn test_brief_form_missing_left_operand() {
         let input = r#"Power
@@ -1262,7 +1190,7 @@ Trust
     - concentration^ => abuse^ @historical-pattern
   .institutional
     - self-preserving
-    - accountability <> trust &Trust.institutional
+    - mutual accountability with trust &Trust.institutional
     - diffusion => dilution-of-responsibility
 
 Trust
@@ -1302,7 +1230,7 @@ Institutions
     - coordinate action @game-theory
   .dysfunction
     - ossify | over time
-    - self-perpetuate // original purpose
+    - self-perpetuates despite original purpose
     - capture-by-interests^ @public-choice-theory"#;
 
         let result = validate(input);
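
Reviewer note on the reduced operator set: the `validator/src/lib.rs` hunk above keeps the "longer operators first" ordering and the standalone `=` check. The sketch below restates that idea in Python as a reading aid only. It is not the validator's code: `OPERATORS` and `split_brief_form` are hypothetical names, and requiring a space on both sides of the operator is a simplifying assumption (the grammar in `spec/grammar.pest` makes those spaces optional).

```python
# Hypothetical sketch, not the validator's implementation.
# The four operators kept by this diff, with the two-character ones first
# so that "=" is only tried after "=>" has had its chance to match.
OPERATORS = ["=>", "vs", "~", "="]


def split_brief_form(claim: str):
    """Return (left, operator, right) for the first operator found, else None.

    Assumption: the operator is surrounded by single spaces. This keeps "vs"
    from matching inside a word and "=" from matching inside "=>" or "<=".
    """
    for op in OPERATORS:
        left, sep, right = claim.partition(f" {op} ")
        # sep is empty when the spaced operator does not occur in the claim
        if sep and left.strip() and right.strip():
            return left.strip(), op, right.strip()
    return None
```

Under these assumptions, a claim that still uses a removed operator, such as `accountability <> trust`, yields `None`, which matches the diff's intent that such relationships be written in natural language instead (`mutual accountability with trust`).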