Problem
Claude struggles with specific DWARF questions such as "What attributes can exist on a DW_TAG_subprogram DIE?" because:
- The DWARF standard PDFs are too large for context
- Claude typically runs 8-12 web searches, often falling back to reading LLVM or libdwarf source
- Even with web search, answers may be incomplete
Proposed Solution
Create a standalone tool that extracts structured DWARF specification data from PDFs into queryable JSON files. This would enable instant, authoritative answers to structural DWARF questions without web search.
Deliverables
- JSON Schema Files (`data/dwarf{3,4,5}.json`)
  - Tag encodings (`DW_TAG_*` → hex code)
  - Attribute encodings (`DW_AT_*` → hex code + classes)
  - Tag-attribute mappings (which attributes are valid for each tag, from Appendix A)
  - Operation encodings (`DW_OP_*` → hex code + operands)
- Extraction Script (`scripts/extract_dwarf_spec.py`)
  - Uses `pdftotext` to extract text
  - Regex parsing of structured tables in Chapter 7 and Appendix A
- Query Script (`scripts/query_dwarf.py`)
  - CLI interface for lookups
  - Example: `python query_dwarf.py DW_TAG_subprogram` returns the applicable attributes
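A minimal sketch of what `scripts/query_dwarf.py` could look like, assuming the JSON layout in the example schema below and data files stored in a `data/` directory. The `--dwarf-version` and `--data-dir` options, the `operations` key, and all helper names are hypothetical choices for illustration, not a settled interface.

```python
"""Hypothetical sketch of scripts/query_dwarf.py."""
import argparse
import json
import sys
from pathlib import Path

# Maps a name prefix to its top-level key in the example schema.
# "operations" is an assumed key; the example schema only shows tags and attributes.
SECTIONS = {"DW_TAG_": "tags", "DW_AT_": "attributes", "DW_OP_": "operations"}


def load_spec(data_dir, version):
    """Load one extracted file such as data/dwarf5.json."""
    path = Path(data_dir) / f"dwarf{version}.json"
    return json.loads(path.read_text(encoding="utf-8"))


def lookup(spec, name):
    """Return the JSON entry for a DW_TAG_/DW_AT_/DW_OP_ name, or None if absent."""
    for prefix, section in SECTIONS.items():
        if name.startswith(prefix):
            return spec.get(section, {}).get(name)
    raise ValueError(f"not a DW_TAG_/DW_AT_/DW_OP_ name: {name}")


def format_entry(name, entry):
    """Render 'NAME: code' followed by any list-valued fields the entry has."""
    lines = [f"{name}: {entry['code']}"]
    for key in ("applicable_attributes", "classes"):
        if key in entry:
            lines.append(f"  {key}: {', '.join(entry[key])}")
    return "\n".join(lines)


def main(argv=None):
    """Parse CLI arguments, look the name up, and return a process exit code."""
    parser = argparse.ArgumentParser(description="Look up extracted DWARF spec data.")
    parser.add_argument("name", help="e.g. DW_TAG_subprogram")
    parser.add_argument("--dwarf-version", default="5", choices=["3", "4", "5"])
    parser.add_argument("--data-dir", default="data", help="directory with dwarf{3,4,5}.json")
    args = parser.parse_args(argv)

    spec = load_spec(args.data_dir, args.dwarf_version)
    try:
        entry = lookup(spec, args.name)
    except ValueError as exc:
        print(exc, file=sys.stderr)
        return 2
    if entry is None:
        print(f"{args.name} not found in DWARF {args.dwarf_version}", file=sys.stderr)
        return 1
    print(format_entry(args.name, entry))
    return 0

# As a standalone script, finish with: if __name__ == "__main__": sys.exit(main())
```

Run from the repository root, `python scripts/query_dwarf.py DW_TAG_subprogram` would print the tag's code followed by its applicable attributes. Keeping lookup separate from formatting lets the skill (or tests) reuse `lookup` without going through the CLI.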
Example JSON Schema
```json
{
  "version": "5",
  "tags": {
    "DW_TAG_subprogram": {
      "code": "0x2e",
      "applicable_attributes": ["DW_AT_name", "DW_AT_type", "DW_AT_low_pc", ...]
    }
  },
  "attributes": {
    "DW_AT_name": {
      "code": "0x03",
      "classes": ["string"]
    }
  }
}
```
Complexity Assessment
| Data | Source | Difficulty |
|---|---|---|
| Tag codes | Ch 7 table | Easy - clean regex |
| Attr codes | Ch 7 table | Easy - clean regex |
| Op codes | Ch 7 table | Easy - clean regex |
| Tag→Attrs | Appendix A | Medium - multi-line tables |
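The "Easy" rows come down to one line-oriented regex over the Chapter 7 encoding tables, while the Appendix A mapping needs a small state machine because each tag's attributes continue onto the following lines. The sketch below assumes `pdftotext -layout` output, assumes the caller has already sliced out the Chapter 7 or Appendix A text, and assumes `DECL` in Appendix A stands for the three `DW_AT_decl_*` attributes. All function names are hypothetical, and real PDF output will likely need extra handling for page headers and wrapped cells.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Extract text with poppler's pdftotext; -layout keeps table columns on one line."""
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


# Assumed Chapter 7 row shapes: "DW_TAG_subprogram   0x2e",
# "DW_AT_location  0x02  exprloc, loclist", "DW_OP_addr  0x03  1  constant address".
# A footnote marker such as ‡ may sit between the name and the code.
ENCODING_ROW = re.compile(
    r"^\s*(?P<name>DW_(?:TAG|AT|OP)_\w+)(?:\s*[‡†])?\s+"
    r"(?P<code>0x[0-9a-fA-F]+)\b(?P<rest>.*)$"
)


def parse_encoding_rows(text):
    """Collect tag, attribute, and operation codes from Chapter 7 table text."""
    spec = {"tags": {}, "attributes": {}, "operations": {}}
    for line in text.splitlines():
        m = ENCODING_ROW.match(line)
        if m is None:
            continue  # prose, captions, page headers, wrapped cells
        name, code, rest = m["name"], m["code"].lower(), m["rest"].strip()
        if name.startswith("DW_TAG_"):
            spec["tags"].setdefault(name, {"code": code})
        elif name.startswith("DW_AT_"):
            classes = [c.strip() for c in rest.split(",") if c.strip()]
            spec["attributes"].setdefault(name, {"code": code, "classes": classes})
        else:
            # Operand descriptions are kept as raw text in this sketch.
            spec["operations"].setdefault(name, {"code": code, "operands": rest})
    return spec


# Appendix A lists a tag in the left column and its attributes on the same
# and following lines; DECL is assumed to mean the three DW_AT_decl_* attributes.
TOKEN = re.compile(r"\bDW_(?:TAG|AT)_\w+|\bDECL\b")
DECL_ATTRS = ["DW_AT_decl_column", "DW_AT_decl_file", "DW_AT_decl_line"]


def parse_tag_attributes(appendix_text):
    """Map each DW_TAG_* to the DW_AT_* names listed under it, in order."""
    mapping = {}
    current = None
    for token in TOKEN.findall(appendix_text):
        if token.startswith("DW_TAG_"):
            current = mapping.setdefault(token, [])  # repeated tag rows merge
        elif current is not None:
            for attr in (DECL_ATTRS if token == "DECL" else [token]):
                if attr not in current:
                    current.append(attr)
    return mapping
```

Tracking a "current tag" is what makes the multi-line Appendix A rows tractable: any attribute token belongs to the most recent tag seen, so line breaks and column padding stop mattering.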
Size Estimate
~1,500-2,000 lines total (scripts + JSON), compared with 20K+ lines for a full markdown extraction of the spec. This gives programmatic access to the most frequently queried structural data.
Success Criteria
- `python query_dwarf.py DW_TAG_subprogram` returns the correct attribute list
- JSON files are valid and parseable
- The skill can answer "What attributes can X have?" without web search
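The first two criteria could be checked automatically. This hypothetical checker assumes the example schema's layout (plus an `operations` key), requires lowercase hex codes, verifies that every applicable attribute is itself a known attribute, and spot-checks `DW_TAG_subprogram` against the attributes listed in the example schema. Passing it shows the files are well-formed, not that every attribute list is complete.

```python
import json
import re
import sys
from pathlib import Path

HEX_CODE = re.compile(r"0x[0-9a-f]+")
# Known-good pairs taken from the example schema; extend as needed.
SPOT_CHECKS = {"DW_TAG_subprogram": ["DW_AT_name", "DW_AT_type", "DW_AT_low_pc"]}


def check_spec(spec):
    """Return a list of problems in one extracted spec dict (empty means it passed)."""
    problems = []
    if spec.get("version") not in {"3", "4", "5"}:
        problems.append(f"unexpected version: {spec.get('version')!r}")
    # Every entry needs a lowercase hex code such as "0x2e".
    for section in ("tags", "attributes", "operations"):
        for name, entry in spec.get(section, {}).items():
            if not HEX_CODE.fullmatch(str(entry.get("code", ""))):
                problems.append(f"{name}: bad code {entry.get('code')!r}")
    # Every applicable attribute must itself be a known attribute.
    known_attrs = spec.get("attributes", {})
    for tag, entry in spec.get("tags", {}).items():
        for attr in entry.get("applicable_attributes", []):
            if attr not in known_attrs:
                problems.append(f"{tag}: unknown attribute {attr}")
    # Spot-check a few tag/attribute pairs.
    for tag, expected in SPOT_CHECKS.items():
        listed = spec.get("tags", {}).get(tag, {}).get("applicable_attributes", [])
        for attr in expected:
            if attr not in listed:
                problems.append(f"{tag}: missing {attr}")
    return problems


def main(data_dir="data"):
    """Check every dwarf*.json file in data_dir; return a process exit code."""
    paths = sorted(Path(data_dir).glob("dwarf*.json"))
    if not paths:
        print(f"no dwarf*.json files in {data_dir}", file=sys.stderr)
        return 1
    failed = False
    for path in paths:
        try:
            spec = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            print(f"{path}: invalid JSON: {exc}")
            failed = True
            continue
        for problem in check_spec(spec):
            print(f"{path}: {problem}")
            failed = True
    return 1 if failed else 0

# As a standalone script, finish with: if __name__ == "__main__": sys.exit(main())
```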
This enhancement complements the lean skill approach by adding structured, queryable spec data rather than prose.
🤖 Generated with Claude Code