Table-based storage for BMRB's NMR-STAR 3.x.
This code is tied to BMRB's NMR-STAR data model and dictionary and is probably of limited utility to users outside of BMRB. You have been warned.
The tables are relational. Both sqlite3 and PostgreSQL are supported (PostgreSQL via PyGreSQL, or via psycopg2 with a bit of editing).
The code is pure Python. The main components are:
- database loader (parser.py)
- pretty printer (unparser.py)
- NMR-STAR data access classes (entry.py and startable.py)
- NMR-STAR dictionary wrapper (stardict.py)
- a poor man's DB abstraction layer (db.py)
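Supporting several database drivers mostly means papering over small DB-API differences. One such difference is the parameter placeholder style: sqlite3 uses `?`, while psycopg2 uses `%s`. The sketch below is not db.py. It is a hypothetical illustration of the kind of shim such a layer needs, and the names `adapt_placeholders` and `DB` are made up for this example.

```python
import sqlite3


def adapt_placeholders(sql, paramstyle):
    """Rewrite qmark ('?') placeholders for the driver's DB-API paramstyle.

    Naive on purpose: a '?' inside a string literal would also be rewritten.
    """
    if paramstyle == "qmark":
        return sql
    if paramstyle in ("format", "pyformat"):
        # A literal '%' must be doubled once %s placeholders are in use.
        return sql.replace("%", "%%").replace("?", "%s")
    raise ValueError(f"unsupported paramstyle: {paramstyle}")


class DB:
    """Hypothetical thin wrapper: one SQL spelling in, driver-specific SQL out."""

    def __init__(self, conn, paramstyle):
        self.conn = conn
        self.paramstyle = paramstyle

    def query(self, sql, params=()):
        cur = self.conn.cursor()
        cur.execute(adapt_placeholders(sql, self.paramstyle), params)
        return cur.fetchall()


db = DB(sqlite3.connect(":memory:"), sqlite3.paramstyle)
print(db.query("select ? + 1", (41,)))  # [(42,)]
# For PostgreSQL one would pass e.g. psycopg2.connect(...) and psycopg2.paramstyle.
```

Each DB-API module exposes its placeholder style as the module-level `paramstyle` attribute, so the caller never has to hard-code it.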
NMR-STAR 3 (and PDB's mmCIF) tag names have the form _table.column: an underscore, the table name (aka tag category), a dot, and the column name (aka tag name). The mapping from NMR-STAR/mmCIF to relational
tables is straightforward, except for the following gotchas:
- Some of the names are SQL reserved words, so this library double-quotes all of them. As a side effect, the names become case-sensitive.
- NMR-STAR uses "saveframe" blocks and has several special tags and rules to maintain saveframe information in the relational tables:
  - Sf_framecode tags contain the name of the parent saveframe (saveframe names must be unique within the entry).
  - Sf_category tags contain the category, or type, of the enclosing saveframe.
  - "Local ID" tags, typically named ID, contain the number of the saveframe of a given type within the entry. The (Sf_category, ID) tuple must be unique within the entry.
  - Entry_ID tags contain the entry ID. (Entry_ID, Sf_category, ID) is the database-global unique key for the saveframe. Every data table in the saveframe has a corresponding foreign key tuple that links it to its saveframe.
  - Last but not least, there is a convenience key, Sf_ID: an autoincremented integer that is unique per saveframe across the entire database, even with multiple entries. It is regenerated on database reload; Sf_ID tags never appear in NMR-STAR files.
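The tag-to-table mapping, the quoting and the saveframe keys described above can be sketched with the standard library sqlite3 module. This is a minimal illustration, not the library's own code: the helper names (`split_tag`, `quote_ident`, `column_ref`), the entry ID and the values are hypothetical, and the two tables (a saveframe table and one of its loop tables) carry only a handful of columns for brevity.

```python
import sqlite3


def split_tag(tag):
    """Split an NMR-STAR/mmCIF tag '_table.column' into (table, column)."""
    if not tag.startswith("_") or "." not in tag:
        raise ValueError(f"not a _table.column tag: {tag!r}")
    table, column = tag[1:].split(".", 1)
    return table, column


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_ref(tag):
    """Turn _table.column into a fully quoted table.column SQL reference."""
    table, column = split_tag(tag)
    return quote_ident(table) + "." + quote_ident(column)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Saveframe table. Within one category table Sf_category is constant,
# so (Entry_ID, ID) is unique as well and can serve as the FK target.
conn.execute('''
    create table "Assigned_chem_shift_list" (
        "Sf_category" text, "Sf_framecode" text, "Entry_ID" text, "ID" integer,
        "Sf_ID" integer primary key,
        unique ("Entry_ID", "Sf_category", "ID"),
        unique ("Entry_ID", "ID"),
        unique ("Entry_ID", "Sf_framecode"))''')

# Loop (data) table linked to its saveframe by the (Entry_ID, parent ID) tuple.
conn.execute('''
    create table "Atom_chem_shift" (
        "ID" integer, "Val" real,
        "Entry_ID" text, "Assigned_chem_shift_list_ID" integer, "Sf_ID" integer,
        foreign key ("Entry_ID", "Assigned_chem_shift_list_ID")
            references "Assigned_chem_shift_list" ("Entry_ID", "ID"))''')

conn.execute('insert into "Assigned_chem_shift_list" values (?, ?, ?, ?, ?)',
             ("assigned_chemical_shifts", "assigned_chem_shift_list_1", "15000", 1, 1))
conn.executemany('insert into "Atom_chem_shift" values (?, ?, ?, ?, ?)',
                 [(1, 8.25, "15000", 1, 1), (2, 120.4, "15000", 1, 1)])

# All shift values of one saveframe, selected by its framecode.
sql = (
    f'select {column_ref("_Atom_chem_shift.Val")} '
    'from "Atom_chem_shift" join "Assigned_chem_shift_list" '
    f'on {column_ref("_Atom_chem_shift.Entry_ID")} = {column_ref("_Assigned_chem_shift_list.Entry_ID")} '
    f'and {column_ref("_Atom_chem_shift.Assigned_chem_shift_list_ID")} = {column_ref("_Assigned_chem_shift_list.ID")} '
    f'where {column_ref("_Assigned_chem_shift_list.Sf_framecode")} = ? '
    'order by 1'
)
vals = [row[0] for row in conn.execute(sql, ("assigned_chem_shift_list_1",))]
print(vals)  # [8.25, 120.4]
```

Note that the case-sensitivity side effect is a PostgreSQL behaviour: there a quoted identifier must match exactly, whereas SQLite compares identifiers case-insensitively even when they are quoted.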
This code creates one additional table (see parser.py):
entry_saveframes (category text, entryid text, sfid integer, name text, line integer)
It is needed to keep track of various housekeeping information, e.g. line numbers
for error reporting, auto-generated sfid primary keys, etc.
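A minimal sketch of how such a housekeeping table can serve both purposes, assuming the column list given above and the table name entry_saveframes. The helpers `register_saveframe` and `saveframe_line` are hypothetical, as are the entry IDs and line numbers; parser.py may assign sfid differently. Here sfid is simply max(sfid) + 1 over the whole table, which keeps it unique across all entries in the database.

```python
import sqlite3

# Column list as documented above.
DDL = ("create table entry_saveframes "
       "(category text, entryid text, sfid integer, name text, line integer)")


def register_saveframe(conn, entryid, category, name, line):
    """Record one saveframe seen by the loader and return its new sfid.

    sfid = max(sfid) + 1 over the whole table (hypothetical scheme), so it
    is unique across every entry loaded into the database.
    """
    (last,) = conn.execute(
        "select coalesce(max(sfid), 0) from entry_saveframes").fetchone()
    sfid = last + 1
    conn.execute(
        "insert into entry_saveframes (category, entryid, sfid, name, line) "
        "values (?, ?, ?, ?, ?)", (category, entryid, sfid, name, line))
    return sfid


def saveframe_line(conn, entryid, name):
    """Return the source line of a saveframe (for error messages), or None."""
    row = conn.execute(
        "select line from entry_saveframes where entryid = ? and name = ?",
        (entryid, name)).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
ids = [
    register_saveframe(conn, "15000", "entry_information", "entry_information", 5),
    register_saveframe(conn, "15000", "assigned_chemical_shifts",
                       "assigned_chem_shift_list_1", 812),
    register_saveframe(conn, "15001", "entry_information", "entry_information", 5),
]
print(ids)                                                          # [1, 2, 3]
print(saveframe_line(conn, "15000", "assigned_chem_shift_list_1"))  # 812
```

Because the ids are recomputed from scratch whenever the table is rebuilt, they behave like Sf_ID: stable within one load, regenerated on reload.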
See the test subdirectory for code examples.
Required:
- The BMRB SAS parser and an NMR-STAR dictionary. Both are on GitHub, but the sqlite3 database version of the dictionary is not; contact us for the latest and greatest.
- PyGreSQL (although it can be trivially changed to psycopg2, see db.py); v5 recommended.