Models

Dictionary

Several components in ClearNLP require dictionaries. The general dictionary contains general morphology information, and the global lexica contain knowledge-base and distributional-semantics information.

Without Maven

Download the following models and add them to your Java classpath.
General dictionary: clearnlp-dictionary-3.2.jar.
Global lexica: clearnlp-global-lexica-3.1.jar.

export CLASSPATH=clearnlp-dictionary-3.2.jar:\
clearnlp-global-lexica-3.1.jar:.
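
To confirm that both jars are visible to the JVM, a small check along the following lines may help. This is only a sketch: `ClasspathCheck` and its `missing` method are hypothetical names, not part of ClearNLP, and the check compares file names only, so it does not verify that the jars exist on disk or are intact.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ClearNLP.
public class ClasspathCheck {
    /** Returns the required jar names that do not appear as a file name on the class path. */
    static List<String> missing(String classpath, List<String> requiredJars) {
        String[] entries = classpath.split(Pattern.quote(File.pathSeparator));
        List<String> notFound = new ArrayList<>();
        for (String jar : requiredJars) {
            boolean found = false;
            for (String entry : entries) {
                // Compare file names only, so relative and absolute entries both match.
                if (new File(entry).getName().equals(jar)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                notFound.add(jar);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList(
                "clearnlp-dictionary-3.2.jar",
                "clearnlp-global-lexica-3.1.jar");
        // The JVM exposes its effective class path through this system property.
        List<String> notFound = missing(System.getProperty("java.class.path"), required);
        System.out.println(notFound.isEmpty()
                ? "All dictionary jars are on the class path."
                : "Missing from the class path: " + notFound);
    }
}
```

Compile it with `javac ClasspathCheck.java` and run `java ClasspathCheck` in the same shell where `CLASSPATH` was exported. The same method can be reused for the model jars in the sections below by changing the `required` list.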

With Maven

Add the following lines to your pom.xml.

<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-dictionary</artifactId>
     <version>3.2</version>
</dependency>
<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-global-lexica</artifactId>
     <version>3.1</version>
</dependency>
<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-general-en-ner-gazetteer</artifactId>
     <version>3.0</version>
</dependency>

General Domain

The general models are trained on OntoNotes 5.0, English Web Treebank, and QuestionBank.

OntoNotes 5.0	Sentence Counts	Token Counts
Broadcast conversations	10,822	171,101
Broadcast news	10,344	206,020
News magazines	6,672	163,627
Newswires	34,434	875,800
Religious texts	21,418	296,432
Telephone conversations	8,963	85,444
Web texts	12,447	284,951

English Web Treebank	Sentence Counts	Token Counts
Answers	2,699	43,916
Email	2,983	44,168
Newsgroup	1,995	37,714
Reviews	2,915	44,337
Weblog	1,753	38,770

QuestionBank	Sentence Counts	Token Counts
Questions	3,199	29,715
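
The per-genre counts in the three tables above can be totalled to get a sense of the overall training size. The sketch below simply sums the numbers as listed; `CorpusTotals` is a hypothetical class name, and the totals assume the tables are complete and that every listed section was used for training.

```java
// Hypothetical helper that sums the counts listed in the tables above.
public class CorpusTotals {
    /** Sums a list of per-genre counts. */
    static long sum(int... counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Counts copied from the OntoNotes 5.0, English Web Treebank, and QuestionBank tables.
        long ontoSentences = sum(10822, 10344, 6672, 34434, 21418, 8963, 12447);
        long ontoTokens = sum(171101, 206020, 163627, 875800, 296432, 85444, 284951);
        long ewtSentences = sum(2699, 2983, 1995, 2915, 1753);
        long ewtTokens = sum(43916, 44168, 37714, 44337, 38770);
        long qbSentences = sum(3199);
        long qbTokens = sum(29715);

        System.out.printf("OntoNotes 5.0: %d sentences, %d tokens%n", ontoSentences, ontoTokens);
        System.out.printf("English Web Treebank: %d sentences, %d tokens%n", ewtSentences, ewtTokens);
        System.out.printf("QuestionBank: %d sentences, %d tokens%n", qbSentences, qbTokens);
        System.out.printf("All corpora: %d sentences, %d tokens%n",
                ontoSentences + ewtSentences + qbSentences,
                ontoTokens + ewtTokens + qbTokens);
    }
}
```

Summed this way, the three corpora come to 120,644 sentences and 2,321,995 tokens, of which OntoNotes 5.0 contributes 105,100 sentences.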

Without Maven

Download the following models and add them to your Java classpath.
Part-of-speech tagging: clearnlp-general-en-pos-3.2.jar.
Dependency parsing: clearnlp-general-en-dep-3.2.jar.
Semantic role labeling: clearnlp-general-en-srl-3.0.jar.
Named entity recognition: clearnlp-general-en-ner-3.1.jar.
Named entity gazetteers: clearnlp-general-en-ner-gazetteer-3.0.jar.

export CLASSPATH=clearnlp-general-en-pos-3.2.jar:\
clearnlp-general-en-dep-3.2.jar:\
clearnlp-general-en-srl-3.0.jar:\
clearnlp-general-en-ner-3.1.jar:\
clearnlp-general-en-ner-gazetteer-3.0.jar:.
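
With five jars, a stray separator or a missing `.jar` suffix is easy to introduce when typing the list by hand. One option is to generate the value instead. The following is a minimal sketch, assuming all jars sit in the current directory; `ClasspathBuilder` is a hypothetical helper, not part of ClearNLP.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ClearNLP.
public class ClasspathBuilder {
    /** Joins the jar names and the current directory with the platform path separator. */
    static String build(List<String> jars) {
        List<String> entries = new ArrayList<>();
        for (String jar : jars) {
            // Reject entries that are not jar files, such as a name with the suffix left off.
            if (!jar.endsWith(".jar")) {
                throw new IllegalArgumentException("Not a jar file: " + jar);
            }
            entries.add(jar);
        }
        entries.add("."); // keep the current directory on the class path
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        String classpath = build(Arrays.asList(
                "clearnlp-general-en-pos-3.2.jar",
                "clearnlp-general-en-dep-3.2.jar",
                "clearnlp-general-en-srl-3.0.jar",
                "clearnlp-general-en-ner-3.1.jar",
                "clearnlp-general-en-ner-gazetteer-3.0.jar"));
        System.out.println("export CLASSPATH=" + classpath);
    }
}
```

Because the separator comes from `File.pathSeparator`, the same code produces `:` on Unix-like systems and `;` on Windows.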

With Maven

Add the following lines to your pom.xml.

<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-general-en-pos</artifactId>
     <version>3.2</version>
</dependency>
<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-general-en-dep</artifactId>
     <version>3.2</version>
</dependency>
<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-general-en-ner</artifactId>
     <version>3.1</version>
</dependency>
<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-general-en-ner-gazetteer</artifactId>
     <version>3.0</version>
</dependency>
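
The Maven coordinates above correspond to the jar names in the "Without Maven" list: Maven names an artifact `artifactId-version.jar` and stores it under a path derived from the group ID. The sketch below illustrates that standard repository layout; `MavenCoordinate` is a hypothetical class written for this example, not a Maven or ClearNLP API.

```java
// Hypothetical illustration of the standard Maven repository layout.
public class MavenCoordinate {
    final String groupId;
    final String artifactId;
    final String version;

    MavenCoordinate(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    /** File name of the jar artifact: artifactId-version.jar. */
    String jarName() {
        return artifactId + "-" + version + ".jar";
    }

    /** Path of the jar relative to a repository root such as ~/.m2/repository. */
    String repositoryPath() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/" + jarName();
    }

    public static void main(String[] args) {
        MavenCoordinate pos =
                new MavenCoordinate("edu.emory.clir", "clearnlp-general-en-pos", "3.2");
        System.out.println(pos.jarName());
        System.out.println(pos.repositoryPath());
    }
}
```

For the part-of-speech model this prints `clearnlp-general-en-pos-3.2.jar` and `edu/emory/clir/clearnlp-general-en-pos/3.2/clearnlp-general-en-pos-3.2.jar`, which is where the jar would typically land in the local repository after Maven resolves the dependency.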

Medical Domain

The medical models are trained on the MiPACQ, SHARP, and THYME corpora.

MiPACQ	Sentence Counts	Token Counts
Clinical questions	1,600	30,138
Medpedia articles	2,796	49,922
Clinical notes	8,383	113,164
Pathological notes	1,205	21,353

SHARP	Sentence Counts	Token Counts
Seattle group health notes	7,205	94,474
Clinical notes	6,807	93,914
Stratified	4,320	43,536
Stratified SGH	13,668	139,424

THYME	Sentence Counts	Token Counts
Clinical & pathological notes	26,734	388,371
Brain cancer	18,700	225,486

Without Maven

Download the following models and add them to your Java classpath.
Part-of-speech tagging: clearnlp-medical-en-pos-3.1.jar.
Dependency parsing: clearnlp-medical-en-dep-3.1.jar.

export CLASSPATH=clearnlp-medical-en-pos-3.1.jar:\
clearnlp-medical-en-dep-3.1.jar:.

With Maven

Add the following lines to your pom.xml.

<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-medical-en-pos</artifactId>
     <version>3.1</version>
</dependency>
<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-medical-en-dep</artifactId>
     <version>3.1</version>
</dependency>

Bioinformatics Domain

The bioinformatics models are trained on the CRAFT Treebank.

CRAFT	Sentence Counts	Token Counts
Training data	16,297	452,769

Without Maven

Download the following models and add them to your Java classpath.

Part-of-speech tagging: clearnlp-bioinformatics-en-pos-3.1.jar.
Dependency parsing: clearnlp-bioinformatics-en-dep-3.1.jar.

export CLASSPATH=clearnlp-bioinformatics-en-pos-3.1.jar:\
clearnlp-bioinformatics-en-dep-3.1.jar:.

With Maven

Add the following lines to your pom.xml.

<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-bioinformatics-en-pos</artifactId>
     <version>3.1</version>
</dependency>
<dependency>
     <groupId>edu.emory.clir</groupId>
     <artifactId>clearnlp-bioinformatics-en-dep</artifactId>
     <version>3.1</version>
</dependency>
