ClinVar Card #20

@SSU02

Description of database

ClinVar is a public archive of reports of the relationships among human variations and phenotypes, with supporting evidence.

Access (API or download)

https://ftp.ncbi.nih.gov/pub/clinvar/vcf_GRCh38/archive_2.0/2018/clinvar_20180701.vcf.gz
https://www.ncbi.nlm.nih.gov/clinvar/docs/api_http/
Also from GCP: bigquery-public-data.human_variant_annotation.ncbi_clinvar_hg38_20180701
Also, as the authors of the Hugging Face ESM-1b space did, from the variant_summary file:
https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz
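A minimal sketch of how a downloaded copy of variant_summary.txt.gz could be summarised with the Python standard library. The column names Assembly and ClinicalSignificance are assumptions about the file header and should be checked against the actual download; count_significance is a hypothetical helper name.

```python
import csv
import gzip
from collections import Counter


def count_significance(path, assembly="GRCh38"):
    """Count clinical significance values for one assembly.

    Assumes a gzip-compressed, tab-delimited file whose header row has
    'Assembly' and 'ClinicalSignificance' columns (verify against the file).
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters inside variant names as plain text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row.get("Assembly") == assembly:
                counts[row.get("ClinicalSignificance") or ""] += 1
    return counts
```

Calling count_significance("variant_summary.txt.gz").most_common(10) would then list the most frequent labels for GRCh38 rows, which gives a quick view of the label distribution before any filtering.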

Data Type

Variant pathogenicity (annotation)

CLNDISDBINCL: MedGen, OMIM and Orphanet codes
CLNHGVS: HGVS expression for the variant
CLNVC: type of mutation (e.g. Del, Insertion, SNP)
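CLNSIGINCL: clinical interpretation (i.e. label); options are Pathogenic, Likely pathogenic, Benign, Likely benign, Unknown, Not Accessed. Note that the VCF header defines CLNSIGINCL as the clinical significance of a haplotype or genotype that includes this variant; the significance of the single variant itself is reported in CLNSIG.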
CLNSIGCONF: conflicting clinical significance
CLNVI: the variant's clinical sources reported as tag-value pairs of database and variant identifier (e.g. UNIPROT)
GENEINFO: Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)
MC: comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence (e.g. missense)
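The delimiters described above for GENEINFO and MC can be unpacked with a few lines of Python. This is a sketch only: the helper names are hypothetical and the example INFO string in the usage note is made up for illustration, not taken from the 20180701 file.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_geneinfo(value):
    """Split 'SYMBOL:ID|SYMBOL:ID' into a list of (symbol, gene_id) tuples."""
    return [tuple(pair.split(":", 1)) for pair in value.split("|") if pair]


def parse_mc(value):
    """Split 'SO_ID|consequence,SO_ID|consequence' into (so_id, consequence) tuples."""
    return [tuple(entry.split("|", 1)) for entry in value.split(",") if entry]
```

For an illustrative INFO string such as "GENEINFO=BRCA1:672|NBR2:10230;MC=SO:0001583|missense_variant", parse_info returns a dict whose GENEINFO and MC values can be passed to parse_geneinfo and parse_mc respectively. Splitting with a maximum of one split keeps the colon inside Sequence Ontology IDs (SO:0001583) intact.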

Target metric

CLNSIGINCL: label of pathogenicity

Tried extracting some data

1) BigQuery query:
SELECT *
FROM `bigquery-public-data.human_variant_annotation.ncbi_clinvar_hg38_20180701`
WHERE CONTAINS_SUBSTR(MC, 'missense_variant')
AND CONTAINS_SUBSTR(CLNSIGINCL, 'athogenic')
Retrieved 510 rows
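The same filter could be approximated locally on the downloaded VCF. The sketch below assumes a gzip-compressed VCF with the INFO field in the eighth tab-separated column; matching_records and its parameters are hypothetical names. Matching is case-insensitive to mirror CONTAINS_SUBSTR, and the row count is not guaranteed to equal the BigQuery result because the table may be structured differently from the raw file.

```python
import gzip


def matching_records(path, mc_term="missense_variant",
                     sig_term="athogenic", sig_key="CLNSIGINCL"):
    """Yield VCF data lines whose MC and sig_key INFO values contain the terms."""
    mc_term = mc_term.lower()
    sig_term = sig_term.lower()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            columns = line.rstrip("\n").split("\t")
            if len(columns) < 8:
                continue  # not a complete VCF data line
            # Build {key: value} from 'KEY=VALUE;KEY=VALUE' (flags get '').
            info = dict(item.partition("=")[::2]
                        for item in columns[7].split(";") if item)
            if (mc_term in info.get("MC", "").lower()
                    and sig_term in info.get(sig_key, "").lower()):
                yield line.rstrip("\n")
```

Passing sig_key="CLNSIG" switches the filter to the per-variant significance field of the ClinVar VCF, which makes it easy to compare how many rows each field yields.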

2) Went through the data in clinvar.csv from the Hugging Face ESM-1b project.
The data contains file_ID, variant, clinical significance and allele ID; its shape is (988571, 5), as shown in the attached screenshot.

Image
