May 10, 2020
The Human Disease Mechanism KG benchmarks were originally built and stored using Google Cloud Platform (GCP) resources (for a complete description of this process, see here). As of late 2023, we have moved the KG builds to Zenodo. The original GCP resources contained all associated files (i.e., all data used and processed to create the KGs). Because of the upload size limits on each Zenodo archive, we have limited the archived data to the KG output, associated metadata, and log files. The resources used to build each KG, including their URLs and download dates, are listed in the associated logs. Details on how to access these files are provided below.
Resources
- Associated GitHub Release: https://github.com/callahantiff/PheKnowLator/releases/tag/v2.0.0
- DockerHub Build: https://hub.docker.com/repository/docker/callahantiff/pheknowlator
- Build Data Sources: https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources
KG Benchmark Builds can also be obtained from Zenodo:
- Class Builds
  - Standard Relations
  - Inverse Relations
- Instance Builds
  - Standard Relations
  - Inverse Relations
🗂 For additional information on the KG file types please see the following Wiki page, which is also available as a download (here).
Required Input Documents
See here for a detailed description of these resources.
resource_info.txt
edge_source_list.txt
ontology_source_list.txt
Required Curated Data
Curated data sources were created manually to support the build. See Data_Preparation.ipynb for a detailed description of these resources.
genomic_sequence_ontology_mappings.xlsx
zooma_tissue_cell_mapping_04JAN2020.xlsx
OWL_NETS_Property_Types.txt
Build Metadata
The metadata documentation provides details on each downloaded resource, including the URL, date of download, and file size. The resources listed in these documents should correspond to the similarly named files listed in the Required Input Documents section.
edge_source_metadata.txt
ontology_source_metadata.txt
Build Logs
The build logs provide detailed information on each step of the build process as well as statistics on the resulting KG builds.
*_Stats.txt
*_Stats_Terminal_Output.txt
Knowledge Graph Output
🚨 Scroll to the right 👉 to see all of the available data 🚨
For additional information on the KG file types please see the following Wiki page.
Instance-based Build

Standard Relations: OWL | Standard Relations: OWL-NETS | Inverse Relations: OWL | Inverse Relations: OWL-NETS |
---|---|---|---|
Master_Edge_List_Dict.json<br>PheKnowLator_Master_Node_Edge_List_Dict.json<br>subclass_map_missing_node_log.json<br>PheKnowLator_v2.0.0_full_relationsOnly_OWL.owl<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_OWL_NetworkxMultiDiGraph.gpickle<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_OWL_NodeLabels.txt<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_OWL_Triples_Identifiers.txt<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_OWL_Triples_Integer_Identifier_Map.json<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_OWL_Triples_Integers.txt | Master_Edge_List_Dict.json<br>PheKnowLator_Master_Node_Edge_List_Dict.json<br>subclass_map_missing_node_log.json<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_OWL.owl<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_noOWL_NodeLabels.txt<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_noOWL_OWLNETS.nt<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_noOWL_OWLNETS_NetworkxMultiDiGraph.gpickle<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_noOWL_Triples_Identifiers.txt<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_noOWL_Triples_Integer_Identifier_Map.json<br>PheKnowLator_v2.0.0_full_Instance_relationsOnly_noOWL_Triples_Integers.txt | Master_Edge_List_Dict.json<br>PheKnowLator_Master_Node_Edge_List_Dict.json<br>subclass_map_missing_node_log.json<br>PheKnowLator_v2.0.0_full_inverseRelations_OWL.owl<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_OWL_NetworkxMultiDiGraph.gpickle<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_OWL_NodeLabels.txt<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_OWL_Triples_Identifiers.txt<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_OWL_Triples_Integer_Identifier_Map.json<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_OWL_Triples_Integers.txt | Master_Edge_List_Dict.json<br>PheKnowLator_Master_Node_Edge_List_Dict.json<br>subclass_map_missing_node_log.json<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_OWL.owl<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_noOWL_NodeLabels.txt<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_noOWL_OWLNETS.nt<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_noOWL_OWLNETS_NetworkxMultiDiGraph.gpickle<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_noOWL_Triples_Identifiers.txt<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_noOWL_Triples_Integer_Identifier_Map.json<br>PheKnowLator_v2.0.0_full_Instance_inverseRelations_noOWL_Triples_Integers.txt |

Class-based Build

Standard Relations: OWL | Standard Relations: OWL-NETS | Inverse Relations: OWL | Inverse Relations: OWL-NETS |
---|---|---|---|
Master_Edge_List_Dict.json<br>PheKnowLator_Master_Node_Edge_List_Dict.json<br>subclass_map_missing_node_log.json<br>PheKnowLator_v2.0.0_full_relationsOnly_OWL.owl<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_NetworkxMultiDiGraph.gpickle<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_NodeLabels.txt<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_Triples_Identifiers.txt<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_Triples_Integer_Identifier_Map.json<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL_Triples_Integers.txt | Master_Edge_List_Dict.json<br>PheKnowLator_Master_Node_Edge_List_Dict.json<br>subclass_map_missing_node_log.json<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_OWL.owl<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_NodeLabels.txt<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_OWLNETS.nt<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_OWLNETS_NetworkxMultiDiGraph.gpickle<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_Triples_Identifiers.txt<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_Triples_Integer_Identifier_Map.json<br>PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_Triples_Integers.txt | Master_Edge_List_Dict.json<br>PheKnowLator_Master_Node_Edge_List_Dict.json<br>subclass_map_missing_node_log.json<br>PheKnowLator_v2.0.0_full_inverseRelations_OWL.owl<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_NetworkxMultiDiGraph.gpickle<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_NodeLabels.txt<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_Triples_Identifiers.txt<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_Triples_Integer_Identifier_Map.json<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL_Triples_Integers.txt | Master_Edge_List_Dict.json<br>PheKnowLator_Master_Node_Edge_List_Dict.json<br>subclass_map_missing_node_log.json<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_OWL.owl<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_NodeLabels.txt<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_OWLNETS.nt<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_OWLNETS_NetworkxMultiDiGraph.gpickle<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_Triples_Identifiers.txt<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_Triples_Integer_Identifier_Map.json<br>PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_Triples_Integers.txt |
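The file names listed above appear to share a common pattern (version, optional build type, relation type, and an `OWL` or `noOWL` token). The sketch below splits a file name into those parts. The pattern is inferred from the names on this page and is not an official naming specification; `describe_kg_file` is a hypothetical helper, not part of the `pkt` library. The build token is treated as optional because some of the listed `.owl` files omit it.

```python
import re

# Pattern inferred from the file names listed on this page (an assumption,
# not an official naming specification).
KG_FILE_PATTERN = re.compile(
    r'^PheKnowLator_(?P<version>v\d+\.\d+\.\d+)_full_'
    r'(?:(?P<build>Instance|subclass)_)?'          # build token is sometimes absent
    r'(?P<relations>relationsOnly|inverseRelations)_'
    r'(?P<semantics>OWL|noOWL)'                    # noOWL marks the OWL-NETS files
    r'(?P<rest>.*)$'
)


def describe_kg_file(filename):
    """Return the name parts as a dict, or None if the name does not match.

    Hypothetical helper for illustration only.
    """
    match = KG_FILE_PATTERN.match(filename)
    return match.groupdict() if match else None


info = describe_kg_file(
    'PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_OWLNETS.nt'
)
print(info)
```

Files that do not follow the pattern, such as `Master_Edge_List_Dict.json`, return `None`.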
We provide several different types of output, each of which is described briefly below. Please note that in order to create the logic (`XXXX_OWL_LogicOnly.nt`) and annotation (`XXXX_OWL_AnnotationsOnly.nt`) subsets of each graph and be able to combine them (`XXXX_OWL.nt`), we have added a namespace to all `BNode` or anonymous nodes. More specifically, there are two kinds of `pkt` namespaces you will find within these files:
- `https://github.com/callahantiff/PheKnowLator/pkt/`. This namespace is used for all `owl:Class` and `owl:NamedIndividual` objects that are defined from non-ontology data and added in order to integrate non-ontological entities (see here for more information).
- `https://github.com/callahantiff/PheKnowLator/pkt/bnode/`. This namespace is used for all existing `BNode` or anonymous nodes and is applied to these entities prior to subsetting an input graph.
To remove the second type of namespacing from `BNode` nodes that are part of the original ontologies used in each build, you can run the code shown below:

```python
from pkt.utils import removes_namespace_from_bnodes

# `org_graph` must already hold the loaded KG
# remove bnode namespaces
updated_graph = removes_namespace_from_bnodes(org_graph)
```
Please also note that for all builds prior to `v3.0.2`, there are 2,008 nodes in the `NodeLabels.txt` files that contain foreign characters. While there is now code in place to prevent this error from happening in the future, there is also a solution to account for the prior builds. The `bad_node_patch.json` file contains a dictionary where each outer key is an `entity_uri` and each outer value is another dictionary whose inner keys are `label` and `description/definition` and whose inner values are the updated strings without foreign characters. An example of this dictionary is shown below:
```python
key = '<http://purl.obolibrary.org/obo/UBERON_0000468>'
print(bad_node_patch[key])
# output:
# {'label': 'multicellular organism', 'description/definition': 'Anatomical structure that is an individual member of a species and consists of more than one cell.'}
```
The code to identify the nodes with erroneous foreign characters is shown below:
```python
import re
import pandas as pd

# path to the downloaded `NodeLabels.txt` file
input_file = 'NodeLabels.txt'

# load data as a Pandas DataFrame
nodedf = pd.read_csv(input_file, sep='\t', header=0)

# flag labels containing characters in the \u4e00-\u9fff range; storing a boolean
# (rather than the match object) lets drop_duplicates compare rows by value
pattern = re.compile("[\u4e00-\u9FFF]")
nodedf['bad'] = nodedf['label'].apply(lambda x: isinstance(x, str) and bool(pattern.search(x)))

# filter the DataFrame so it only contains the flagged rows
nodedf_bad_nodes = nodedf[nodedf['bad']].drop_duplicates()
```
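Once the affected nodes are known, the patch dictionary described earlier can be used to overwrite their labels and descriptions. The following is a minimal sketch using only the standard library and a small made-up table. The column names (`entity_uri`, `label`, `description/definition`), the sample rows, and the helper `apply_patch` are assumptions for illustration; in practice the patch would be loaded from `bad_node_patch.json` and the rows from a `NodeLabels.txt` file.

```python
import csv
import io

# Made-up stand-in for a NodeLabels.txt file; column names are assumed.
sample_tsv = (
    'entity_uri\tlabel\tdescription/definition\n'
    '<http://purl.obolibrary.org/obo/UBERON_0000468>\t'
    'multicellular organism\u751f\tplaceholder text\n'
    '<http://example.org/unaffected>\tsome label\tsome description\n'
)

# Patch entry copied from the example dictionary shown earlier on this page.
bad_node_patch = {
    '<http://purl.obolibrary.org/obo/UBERON_0000468>': {
        'label': 'multicellular organism',
        'description/definition': (
            'Anatomical structure that is an individual member of a species '
            'and consists of more than one cell.'
        ),
    }
}


def apply_patch(rows, patch):
    """Return copies of rows with patched fields overwritten (hypothetical helper)."""
    patched = []
    for row in rows:
        # Rows whose entity_uri is not in the patch are copied unchanged.
        fixes = patch.get(row['entity_uri'], {})
        patched.append({**row, **fixes})
    return patched


rows = list(csv.DictReader(io.StringIO(sample_tsv), delimiter='\t'))
patched_rows = apply_patch(rows, bad_node_patch)
print(patched_rows[0]['label'])
```

Building new dictionaries instead of editing rows in place keeps the original data available for comparison.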