Most RAG systems treat generation as an effect of retrieval.
They assume:
- retrieval success → answer should be generated
- hallucination is primarily a retrieval failure
- generation behavior is opaque and uninspectable
These assumptions hold in demos and fail in production systems.
This repository exists to isolate and study generation as a controlled system layer, independent of retrieval quality.
This is not a prompt-engineering repo. It is a policy-first examination of when a system should speak at all.
Given retrieved evidence, when should a system answer, hedge, or refuse — and how can that decision be made observable and auditable?
This repository investigates only:
- how generation decisions are made
- how those decisions are logged
- how refusal and hedging emerge as correct outcomes
It does not attempt to prove that generation quality improves.
This system deliberately avoids:
- Claims of improved answer correctness
- Claims of hallucination elimination
- Question intent inference or semantic parsing
- Value-type detection (e.g. “this question requires a year”)
- Ambiguity resolution within a single source
- LLM-based self-grading or confidence scoring
If you are looking for a system that “answers better,” that is out of scope.
Base system
Built directly on agent-memory-systems.
Frozen components (unchanged)
- Corpus
- Chunking strategy
- Embeddings
- Retrieval logic
- Planner / Executor split
- Memory mechanisms
Only new surface area
➡️ Generation policy and arbitration
No other layer behavior is modified.
Every run produces exactly one generation decision:
- ANSWER — evidence judged sufficient
- HEDGE — evidence judged conflicting
- REFUSE — evidence judged insufficient
There are no mixed states and no implicit fallbacks.
Generation text (if any) is strictly downstream of this decision.
Retrieved evidence is evaluated using deliberately minimal heuristics:
- similarity thresholds
- coarse coverage checks
- cross-source conflict detection
The system does not:
- infer question intent
- reason about required answer types
- interpret semantic ambiguity
These omissions are intentional architectural limits, not oversights.
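A minimal sketch of what such heuristics could look like. All names, thresholds, and the conflict check below are illustrative assumptions, not the repository's actual implementation:

```python
from dataclasses import dataclass

# Hypothetical retrieved chunk: text plus similarity score and source metadata.
@dataclass
class Chunk:
    text: str
    source: str
    similarity: float

# Illustrative thresholds; the real values are a policy choice.
SIM_THRESHOLD = 0.75
MIN_COVERING_CHUNKS = 2

def assess_evidence(chunks: list[Chunk]) -> str:
    """Map retrieved chunks to one of three coarse assessments."""
    strong = [c for c in chunks if c.similarity >= SIM_THRESHOLD]
    if len(strong) < MIN_COVERING_CHUNKS:
        return "insufficient"          # coarse coverage check failed
    if len({c.source for c in strong}) > 1 and _sources_disagree(strong):
        return "conflicting"           # cross-source conflict detected
    return "sufficient"

def _sources_disagree(chunks: list[Chunk]) -> bool:
    # Placeholder conflict check: a real system would compare extracted
    # claims; here we only flag chunks whose word sets are pairwise disjoint.
    texts = [set(c.text.lower().split()) for c in chunks]
    return all(a.isdisjoint(b)
               for i, a in enumerate(texts)
               for b in texts[i + 1:])
```

Note that nothing here inspects the question itself — the assessment is a function of the retrieved evidence alone, which is exactly the architectural limit stated above.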
Generation policy is a pure mapping from evidence assessment to outcome:
| Evidence Assessment | Generation Outcome |
|---|---|
| sufficient | ANSWER |
| conflicting | HEDGE |
| insufficient | REFUSE |
The policy layer:
- does not access retrieval artifacts
- does not access memory
- does not call an LLM
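Because the mapping is total and pure, it can be expressed as a single lookup. A sketch (enum and function names are illustrative):

```python
from enum import Enum

class Assessment(Enum):
    SUFFICIENT = "sufficient"
    CONFLICTING = "conflicting"
    INSUFFICIENT = "insufficient"

class Outcome(Enum):
    ANSWER = "ANSWER"
    HEDGE = "HEDGE"
    REFUSE = "REFUSE"

# The entire policy: a total, pure mapping with no access to retrieval
# artifacts, memory, or an LLM. Every assessment has exactly one outcome.
POLICY = {
    Assessment.SUFFICIENT: Outcome.ANSWER,
    Assessment.CONFLICTING: Outcome.HEDGE,
    Assessment.INSUFFICIENT: Outcome.REFUSE,
}

def decide(assessment: Assessment) -> Outcome:
    return POLICY[assessment]
```

Keeping the policy this small is what makes it auditable: there is no branch that cannot be enumerated, and no input other than the assessment.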
All decisions are logged as first-class events.
This repository produces:
- evidence assessment logs
- generation policy decision logs
- episodic records of generation outcomes
These artifacts exist to explain why a decision occurred.
They are not used to influence future behavior.
Logs are never treated as memory.
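One way to make each decision a first-class, append-only event is a self-describing JSON record. The field names below are illustrative assumptions, not the repository's actual schema:

```python
import json
import time

def log_decision(run_id: str, assessment: str, outcome: str) -> str:
    """Serialize a generation decision as an append-only log event.

    The record exists to explain *why* the outcome occurred; it is
    written once and never read back to influence future behavior.
    """
    event = {
        "event_type": "generation_decision",
        "run_id": run_id,
        "timestamp": time.time(),
        "evidence_assessment": assessment,  # e.g. "conflicting"
        "outcome": outcome,                 # ANSWER | HEDGE | REFUSE
    }
    return json.dumps(event, sort_keys=True)
```

Emitting the decision as its own event, separate from any generated text, is what lets a later observability layer trace outcomes without parsing answers.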
This repository includes explicit failure-first tests designed to break naïve RAG systems.
Some tests are expected to fail by design.
The system is known to fail in cases where:
- Evidence is topically relevant but lacks a required value → parametric knowledge may leak into generation
- A binary question is dissolved by single-source ambiguity → evidence is sufficient but non-decisive
These failure cases are retained, not patched, and are documented as architectural boundaries.
They motivate future work without being silently corrected.
This repository enables:
- rag-failure-modes — systematic failure taxonomy using generation decisions as evidence
- llm-observability-logs — cross-layer causal tracing over time
It does not attempt synthesis or mitigation.
All claims in this README correspond to:
- logged evidence assessments
- logged generation decisions
- explicit failure-first test cases
No claim relies on subjective answer quality. No claim relies on LLM self-evaluation.
If a behavior cannot be pointed to in an artifact, it is not discussed.
If a system cannot explain why it spoke, it does not understand what it said.
This repository exists to make generation explainable before it is impressive.
This repository completes the generation control layer in the agentic systems arc.
At this point, the system has:
- explicit control over retrieval
- explicit separation of planning and execution
- explicit memory routing
- explicit generation policy
No further capability can be meaningfully evaluated without failure synthesis and observability.
That work begins next with rag-failure-modes.