Variant to disease/phenotype predicates #1545

EvanDietzMorris · 2025-01-10T18:04:19Z

The model is generally lacking good ways to represent sequence variant to disease relationships. Gene associated with condition exists, but as far as I can tell there is no way to provide more granularity or specifics. Variant to disease association exists, but I don't think there are predicates for those associations yet.

This is a known issue, but it has become timely, as we (ROBOKOP) are currently working with colleagues at ClinGen to bring these kinds of edges into ROBOKOP (and for Translator), so I wanted to get the ball rolling again.

I personally think it would be nice to have two ways to represent these edges:

- gene to disease/phenotype edges with qualifiers providing more granularity
- variant to disease/phenotype edges

In the short term though, we are specifically interested in predicates like "is pathogenic for" for variant to disease edges.

An example would be:
CAID:CA115937 is pathogenic for MONDO:0016419
https://erepo.clinicalgenome.org/evrepo/ui/classification/4b6c7f5f-b13d-435d-bba1-0d501ef69489
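For illustration only, here is a minimal sketch of how that example might look as a subject/predicate/object record. The `make_edge` helper and the `biolink:is_pathogenic_for` CURIE are hypothetical: the predicate is what this issue proposes, not something that exists in the model today, and the snake-casing rule is just an assumption for the sketch.

```python
def make_edge(subject: str, predicate_label: str, obj: str) -> dict:
    """Build a minimal subject/predicate/object record (illustrative only)."""
    # Assumption: a predicate CURIE is the snake-cased label under a "biolink:"
    # prefix. "is pathogenic for" is a proposed predicate, not an existing one.
    predicate = "biolink:" + predicate_label.strip().lower().replace(" ", "_")
    return {"subject": subject, "predicate": predicate, "object": obj}


# The ClinGen example from this comment, using the proposed predicate.
edge = make_edge("CAID:CA115937", "is pathogenic for", "MONDO:0016419")
print(edge)
```

The same helper would cover the other proposed labels ("is likely pathogenic for", "may be pathogenic for") without further changes.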

Predicates like "is likely pathogenic for" or "may be pathogenic for" would also be helpful, representing the various Clinical Validity Classifications in ClinGen. I'm not sure if we'd also want ones for Benign associations, or if negation would be better for those.

@bpow @Vibhorgupta31

bpow · 2025-01-14T15:14:20Z

For additional context, the documentation page for the VariantToDiseaseAssociation class gives "is pathogenic for" as an example predicate, so at least someone else at some point thought that it was a good idea.

For the specific nomenclature of variant-to-disease/phenotype, I think that "pathogenic" is good terminology to use. It's the long-standing terminology that multiple professional groups (American College of Medical Genetics and Genomics, Association for Molecular Pathology, ClinVar, ClinGen, etc.) use. We could potentially also have an expressly negated predicate is_benign_for to indicate that not only is there insufficient evidence for pathogenicity, but there is explicit evidence refuting pathogenicity. Alternatively, we could potentially address this with appropriate qualifiers. Current ACMG/AMP recommendations have a 5-valued set of categories (Benign, Likely Benign, Variant of Uncertain Significance, Likely Pathogenic, Pathogenic), but if that's too many additional predicates then we could probably address that sort of thing in qualifiers (though I'd be interested in suggested ways to map this specific terminology, which has a rather formal, domain-specific definition, to more general qualifiers).
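To make the two options concrete, here is a rough sketch of how the five ACMG/AMP categories could be carried either as one predicate per category or as a single predicate plus a qualifier. Every predicate name other than `biolink:related_to`, the `clinical_significance_qualifier` slot, and both helper functions are hypothetical placeholders for discussion, not existing parts of the model.

```python
from enum import Enum


class AcmgClass(Enum):
    """The five ACMG/AMP categories listed above."""
    BENIGN = "Benign"
    LIKELY_BENIGN = "Likely Benign"
    VUS = "Variant of Uncertain Significance"
    LIKELY_PATHOGENIC = "Likely Pathogenic"
    PATHOGENIC = "Pathogenic"


# Option A: one predicate per category (all names hypothetical).
PREDICATE_PER_CLASS = {
    AcmgClass.BENIGN: "biolink:is_benign_for",
    AcmgClass.LIKELY_BENIGN: "biolink:is_likely_benign_for",
    AcmgClass.VUS: "biolink:has_uncertain_significance_for",
    AcmgClass.LIKELY_PATHOGENIC: "biolink:is_likely_pathogenic_for",
    AcmgClass.PATHOGENIC: "biolink:is_pathogenic_for",
}


def edge_with_predicate(variant: str, disease: str, cls: AcmgClass) -> dict:
    """Option A: the category is encoded in the predicate itself."""
    return {"subject": variant, "predicate": PREDICATE_PER_CLASS[cls], "object": disease}


def edge_with_qualifier(variant: str, disease: str, cls: AcmgClass) -> dict:
    """Option B: a generic predicate plus a (hypothetical) qualifier slot."""
    return {
        "subject": variant,
        "predicate": "biolink:related_to",  # placeholder generic predicate
        "object": disease,
        "clinical_significance_qualifier": cls.value,  # hypothetical slot name
    }


a = edge_with_predicate("CAID:CA115937", "MONDO:0016419", AcmgClass.PATHOGENIC)
b = edge_with_qualifier("CAID:CA115937", "MONDO:0016419", AcmgClass.PATHOGENIC)
print(a)
print(b)
```

Option A keeps the formal terminology visible in the predicate at the cost of five new predicates; Option B adds none, but moves the domain-specific meaning into a qualifier value that consumers would have to know to inspect.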

For gene-to-disease / gene-to-phenotype edges, I'd advocate for being careful about how we represent the predicates. "Gene is associated with disease" is often a true statement, but I often hear people colloquially say that a "gene causes a disease" or a "gene causes a phenotype", where in general, it is rather a variant allele of a gene which, through inactivation, reduced/increased activity, or novel effect can be said to be the cause of a disease or phenotype.

EvanDietzMorris added the labels enhancement, working-group/predicates, and working-group/ROBOKOP (Not really a working group per se) · Jan 10, 2025
