Add a way to consistently document source versions #1542

Open
matentzn opened this issue Jan 10, 2025 · 0 comments

@matentzn

I would like to gradually promote the adoption of a strong provenance model for sources in Biolink. This issue is the first in a series.

We currently have

https://biolink.github.io/biolink-model/primary_knowledge_source/

whose value is an infores identifier.

I am inclined to keep it that way. It is very useful, and changing it into an object instead, like:

primary_knowledge_source: {id: infores:123, version: 2024-01-01}

would break a lot of good work out there. That said, I am not opposed to considering this; I care more that the KGX format does not break than about the model itself.
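To make the compatibility concern concrete, here is a minimal sketch, assuming a KGX-style edge is handled as a plain Python dict and that a downstream consumer expects primary_knowledge_source to be a single CURIE string. The `source_prefix` consumer and the `ex:` node IDs are hypothetical illustrations, not part of KGX or any Biolink tooling.

```python
from typing import Any


def source_prefix(edge: dict[str, Any]) -> str:
    """Hypothetical consumer: return the CURIE prefix of the primary source.

    Assumes the slot holds a plain CURIE string such as "infores:123".
    """
    value = edge["primary_knowledge_source"]
    if not isinstance(value, str):
        raise TypeError(f"expected a CURIE string, got {type(value).__name__}")
    return value.split(":", 1)[0]


# Current shape: the slot holds a bare infores CURIE.
edge_current = {
    "subject": "ex:drug1",
    "predicate": "biolink:treats",
    "object": "ex:disease1",
    "primary_knowledge_source": "infores:123",
}

# Object-valued shape floated in the issue.
edge_object = {
    "subject": "ex:drug1",
    "predicate": "biolink:treats",
    "object": "ex:disease1",
    "primary_knowledge_source": {"id": "infores:123", "version": "2024-01-01"},
}

print(source_prefix(edge_current))  # infores
try:
    source_prefix(edge_object)
except TypeError as err:
    print("breaks:", err)  # breaks: expected a CURIE string, got dict
```

Any consumer written against the string form fails in the same way once the value becomes an object, which is the breakage risk for existing KGX files and tools.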

An alternative is a second slot, such as primary_knowledge_source_version. It is conceptually less clean because it is disconnected from primary_knowledge_source, but it would do the trick.
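A minimal sketch of the second-slot alternative, again assuming dict-shaped edges. The slot name primary_knowledge_source_version comes from the proposal above; the `versioned_source` helper and the `ex:` IDs are hypothetical. Existing readers of primary_knowledge_source keep working because its value stays a string, but nothing in the data ties the two values together, so the pairing is only a naming convention.

```python
from typing import Optional


def versioned_source(edge: dict) -> tuple[str, Optional[str]]:
    """Hypothetical helper: pair the source CURIE with its optional version.

    The pairing relies purely on the two slot names; nothing in the data
    formally links primary_knowledge_source_version to primary_knowledge_source.
    """
    return (
        edge["primary_knowledge_source"],
        edge.get("primary_knowledge_source_version"),
    )


edge = {
    "subject": "ex:drug1",
    "predicate": "biolink:treats",
    "object": "ex:disease1",
    "primary_knowledge_source": "infores:123",         # unchanged, still a string
    "primary_knowledge_source_version": "2024-01-01",  # new, independent slot
}

# An older edge without the new slot is still valid input.
legacy_edge = {k: v for k, v in edge.items() if k != "primary_knowledge_source_version"}

print(versioned_source(edge))         # ('infores:123', '2024-01-01')
print(versioned_source(legacy_edge))  # ('infores:123', None)
```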

Any other ideas?

from @sierra-moxon:

we don't have a standard in place to do this consistently, but we can add one.
check out biolink:DatasetVersion https://biolink.github.io/biolink-model/DatasetVersion/ as a starting point. To do this "right", though, we'd probably need to link the *knowledge_source slots to this object instead of, or in addition to, denormalizing it onto the resulting edge itself.
Or we need to separate the use of primary_knowledge_source from other kinds of provenance metadata. For example, if we only ever use primary_knowledge_source as a shorthand, we don't need to worry about grouping it in an object with other related metadata; we could instead add a separate slot, or DatasetVersion itself, alongside primary_knowledge_source without a formal connection between the two.
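The "link up" option could look roughly like the following sketch: edges keep the plain infores CURIE, and version information lives once in a separate DatasetVersion-like record that is looked up by source, with an optional step that also copies the version onto the edge. The `DatasetVersionRecord` class and its fields are hypothetical stand-ins, not the actual slots of biolink:DatasetVersion; the one-version-per-source registry and the reuse of the primary_knowledge_source_version slot name are assumptions made for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetVersionRecord:
    """Hypothetical stand-in for a biolink:DatasetVersion instance."""
    id: str       # identifier of the version record itself
    source: str   # infores CURIE of the knowledge source
    version: str  # version string or release date


# Assumption: at most one version per source in a given graph, keyed by infores.
versions = {
    r.source: r
    for r in [DatasetVersionRecord(id="ex:dv1", source="infores:123", version="2024-01-01")]
}

edges = [
    {"subject": "ex:drug1", "predicate": "biolink:treats", "object": "ex:disease1",
     "primary_knowledge_source": "infores:123"},
    {"subject": "ex:drug2", "predicate": "biolink:treats", "object": "ex:disease2",
     "primary_knowledge_source": "infores:456"},
]


def resolve_version(edge: dict, registry: dict) -> Optional[str]:
    """Look up the version for the edge's source; None if no record exists."""
    record = registry.get(edge["primary_knowledge_source"])
    return record.version if record else None


def denormalize(edge: dict, registry: dict) -> dict:
    """Optionally copy the resolved version onto a new edge dict ("in addition")."""
    out = dict(edge)
    version = resolve_version(edge, registry)
    if version is not None:
        out["primary_knowledge_source_version"] = version
    return out


print([resolve_version(e, versions) for e in edges])  # ['2024-01-01', None]
```

Keeping the record separate avoids repeating version metadata on every edge, while `denormalize` shows that a flattened copy can still be produced for consumers that want it on the edge.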
