This repository contains our implementation of concept length predictors in the ALC description logic.
- Clone this repository:

  ```sh
  git clone https://github.com/dice-group/LearnALCLengths.git
  ```
- Install Anaconda3, then install all required libraries by executing the following commands (Linux):

  ```sh
  conda create -n clip python==3.11.5 && conda activate clip
  pip install -r requirements.txt
  git clone https://github.com/dice-group/Ontolearn.git && cd Ontolearn && git checkout 0.5.4 && pip install -e .
  ```
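  To sanity-check the environment, one can try the following (a quick check we suggest here, assuming the pinned Ontolearn 0.5.4 installed cleanly):

  ```sh
  # Should print the message without an ImportError
  python -c "import ontolearn; print('Ontolearn imported successfully')"
  ```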
- Download DL-Learner-1.4.0 from GitHub and extract it into this repository (cloned above).
- Clone DL-Foil and DL-Focl, and extract the two repositories into `LearnALCLengths/`.
- Install Java (version 8+) and Apache Maven (only necessary for running DL-Learner and DL-Foil/DL-Focl).
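  For example, on Debian/Ubuntu (one possible route, added here for convenience; any JDK 8+ and Maven installation works):

  ```sh
  sudo apt-get update && sudo apt-get install -y default-jdk maven
  java -version && mvn -version  # verify both tools are on the PATH
  ```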
- Download the datasets, extract the zip file into `LearnALCLengths/`, and rename the extracted folder to `Datasets`.
Open a terminal and navigate into `reproduce_results/`:

```sh
cd LearnALCLengths/reproduce_results/
```
- Reproduce CLIP concept learning results on all knowledge bases:

  ```sh
  sh reproduce_celoe_clp_experiment_all_kbs.sh
  ```
- Reproduce the training of concept length predictors:

  ```sh
  sh reproduce_training_clp_on_all_kbs.sh
  ```
- Furthermore, one can train concept length predictors on a single knowledge base as follows:

  ```sh
  python reproduce_training_length_predictors_K_kb.py
  ```

  where `K` is one of carcinogenesis, mutagenesis, semantic_bible, or vicodi. Use `-h` to see more training options (for example, `python reproduce_training_length_predictors_carcinogenesis_kb.py -h`).
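  A concrete invocation with `K = carcinogenesis`, using the default training options:

  ```sh
  python reproduce_training_length_predictors_carcinogenesis_kb.py
  ```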
Open a terminal and navigate into the DL-Learner scripts folder:

```sh
cd LearnALCLengths/dllearner/scripts
```
- Reproduce concept learning results on knowledge base `K` for algorithm `Algo`:

  ```sh
  python reproduce_dllearner_experiment.py --learning_systems Algo --knowledge_bases K
  ```

- To reproduce the results for multiple algorithms on multiple knowledge bases, use the schema:

  ```sh
  python reproduce_dllearner_experiment.py --learning_systems Algo1 Algo2 ... --knowledge_bases K1 K2 ...
  ```
Note that `Algo` is one of celoe, ocel, or eltl, and `K` is one of carcinogenesis, mutagenesis, semantic_bible, or vicodi (all lowercase).
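For example, the following illustrative invocation of the schema above reproduces CELOE and ELTL results on two knowledge bases:

```sh
python reproduce_dllearner_experiment.py --learning_systems celoe eltl --knowledge_bases carcinogenesis semantic_bible
```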
For DL-Foil, open a terminal and navigate into `dl-foil/DLFoil2`:

```sh
cd LearnALCLengths/dl-foil/DLFoil2
```
- Run:

  ```sh
  mvn clean install
  ```
- Open a different terminal and run the following:

  ```sh
  python LearnALCLengths/generators/generate_dlfoil_config_all_kbs.py
  ```
- Now execute the following in the first terminal (in `LearnALCLengths/dl-foil/DLFoil2`):

  ```sh
  mvn -e exec:java -Dexec.mainClass=it.uniba.di.lacam.ml.DLFoilTest -Dexec.args=K_config.xml >> ../dlfoil_out_K.txt
  ```

  where `K` is one of carcinogenesis, mutagenesis, semantic_bible, or vicodi.
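  For instance, with `K = carcinogenesis` (a direct instantiation of the command above):

  ```sh
  mvn -e exec:java -Dexec.mainClass=it.uniba.di.lacam.ml.DLFoilTest -Dexec.args=carcinogenesis_config.xml >> ../dlfoil_out_carcinogenesis.txt
  ```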
  Note that DL-Foil fails to solve our learning problems, as it gets stuck on the refinement of certain partial descriptions.
We could not run DL-Focl, as the authors did not provide sufficient documentation to run their algorithm; the documentation is here.
Open a terminal and navigate into `reproduce_results/`:

```sh
cd LearnALCLengths/reproduce_results/
```
- Run the Wilcoxon statistical test on the concept learning results (all algorithms vs. CLIP):

  ```sh
  sh run_statistical_test_on_all_kbs.sh
  ```
- Add your data into `Datasets/`: it should be a folder containing a file formatted as RDF/XML or OWL/XML, and the file should have the same name as the folder.
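  For example, with a hypothetical knowledge base named `my_kb` (the `.owl` extension stands in for any RDF/XML or OWL/XML file):

  ```
  Datasets/
  └── my_kb/
      └── my_kb.owl
  ```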
- Navigate into `generators/` and run:

  ```sh
  python train_data/generate_training_data.py --kb your_folder_name
  ```

  Use `-h` to see more options. The generated file `Data.json` under `your_folder_name/Train_data/` should serve for training concept length predictors; see the example scripts in `reproduce_results/train_clp/`.
- Similarly, learning problems can be generated using one of the example files in `generators/learning_problems/` (replace the folder names with your folder name).
- Navigate into `Embeddings/Compute-Embeddings/` and run the following to embed your knowledge base:

  ```sh
  python run_script.py --path_dataset_folder your_folder_name
  ```
- Train concept length predictors by preparing and running your Python file `reproduce_training_length_predictors_K_kb.py`, following the examples in `reproduce_results/train_clp/`.
- Finally, prepare a script (see the examples in `reproduce_results/celoe_clp/`) and run CLIP on your data.
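A minimal end-to-end sketch for a custom knowledge base, chaining the commands above (assuming the hypothetical folder name `my_kb` from the earlier example):

```sh
# Generate training data for the custom KB
cd LearnALCLengths/generators
python train_data/generate_training_data.py --kb my_kb

# Embed the KB
cd ../Embeddings/Compute-Embeddings
python run_script.py --path_dataset_folder my_kb
```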
We based our implementation on the open-source implementation of Ontolearn. We would like to thank the Ontolearn team for the readable codebase.