Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages.
OLiA Annotation models provide a machine-readable formalization of annotation schemes and tagsets in OWL2/DL (resp., RDF). The majority of these models are provided via this repository and published under resolvable URIs via http://purl.org/olia. Beyond that, additional OLiA annotation models that are hosted and/or provided externally include
Below, links to external resources are marked with (*). Unless marked otherwise, all ontologies provided via this repository are released under a Creative Commons Attribution licence CC-BY with reference to
Christian Chiarcos and Maria Sukhareva (2015). OLiA - Ontologies of Linguistic Annotation, SWJ (Semantic Web Journal) 6(4): 379-386.
Below, we provide annotation and linking models for cross-linguistic and language-specific annotation schemes for morphosyntax, morphology and syntax.
tagset / NLP tool | phenomenon | languages | OWL/DL models |
SFB632 annotation standard (Dipper et al. 2008) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | > 30 typologically different languages, including many African languages | Annotation Model, Linking Model |
EAGLES recommendations (Leech and Wilson 1996) | morphosyntax | 11 EU languages, incl. Romance, Germanic, Greek and Irish | Annotation Model, Linking Model |
Connexor dependency parser | morphosyntax, morphology, dependency syntax | 10 European languages, incl. Romance, Germanic and Uralic languages | Annotation Model, Linking Model |
MULTEXT-East | morphosyntax, morphology | 15 mostly Eastern European languages, incl. Slavic, Romance, Uralic languages and Persian | Annotation Model (common specifications)(*), Linking Model(*); Annotation Model (all languages)(*), see project page and below for individual languages |
IL-POSTS tagset Baskaran et al. (2008) | morphosyntax | languages of the Indian subcontinent | Annotation Model, Linking Model |
AnnCorra Bharati et al. (2006) | morphosyntax, chunks | languages of the Indian subcontinent | Annotation Model, Linking Model |
IIIT tagset IIIT (2007) | morphosyntax | languages of the Indian subcontinent | Annotation Model, Linking Model |
PROIEL | morphosyntax, dependency syntax | Older Indo-European languages (Greek, Latin, Gothic, Classical Armenian, Old Church Slavonic, others) | Annotation Model, Linking Model |
Universal Dependencies (POS) | parts of speech | various languages | (for language-specific Annotation Model ABoxes see below) Annotation Model TBox(*), Linking Model |
Universal Dependencies (features) | morphosyntax | various languages | (for language-specific Annotation Model ABoxes see below) Annotation Model TBox(*) |
Universal Dependencies (relations) | dependency syntax | various languages | (for language-specific Annotation Model ABoxes see below) Annotation Model TBox(*), Linking Model |
Annotation models for English include the following; annotation models for German and other Germanic languages are listed below.
tagset / NLP tool | phenomenon | OWL/DL models |
---|---|---|
Brown corpus | morphosyntax | Annotation Model, Linking Model |
Connexor | morphosyntax, morphology, dependencies | Annotation Model, Linking Model |
EAGLES (Leech and Wilson 1996) | morphosyntax | Annotation Model, Linking Model |
GENIA corpus | morphosyntax | Annotation Model, Linking Model |
MULTEXT-East | morphosyntax | Annotation Model(*), Linking Model(*) |
Penn Treebank | morphosyntax | Annotation Model, Linking Model |
Penn Treebank | syntax | Annotation Model, Linking Model |
QTag | morphosyntax | Annotation Model, Linking Model |
Stanford | dependencies | Annotation Model, Linking Model |
Susanne corpus | morphosyntax | Annotation Model, Linking Model |
English UD POS | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
English UD features | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
English UD dependencies | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
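The rows above follow a simple convention: cells are separated by `|`, the last cell lists the available OWL/DL models, and `(*)` marks a model that is hosted externally. As a minimal sketch of how such a row could be processed, the following Python snippet splits a row and flags external models. The names `ModelRef`, `parse_row` and `parse_models_cell` are hypothetical helpers introduced for illustration only; they are not part of OLiA or of this repository.

```python
from dataclasses import dataclass
from typing import List

EXTERNAL_MARK = "(*)"  # marker used in this README for externally hosted resources


@dataclass
class ModelRef:
    label: str      # e.g. "Annotation Model TBox"
    external: bool  # True if the entry carried the (*) marker


def parse_row(row: str) -> List[str]:
    """Split a pipe-delimited table row into stripped cells."""
    cells = [cell.strip() for cell in row.split("|")]
    # Rows end with a trailing "|", which yields one empty cell at the end.
    if cells and cells[-1] == "":
        cells.pop()
    return cells


def parse_models_cell(cell: str) -> List[ModelRef]:
    """Split the models cell on commas and detect the (*) marker per entry."""
    refs = []
    for part in cell.split(","):
        text = part.strip()
        if not text:
            continue
        external = EXTERNAL_MARK in text
        label = text.replace(EXTERNAL_MARK, "").strip()
        refs.append(ModelRef(label=label, external=external))
    return refs


if __name__ == "__main__":
    row = ("English UD POS | parts of speech | language-specific Annotation "
           "Model ABox(*), Annotation Model TBox(*), Linking Model |")
    cells = parse_row(row)
    for ref in parse_models_cell(cells[-1]):
        origin = "external" if ref.external else "this repository"
        print(f"{cells[0]}: {ref.label} ({origin})")
```

For the English UD POS row, this sketch treats the ABox and TBox as external and the Linking Model as provided via this repository. Cells that contain additional remarks (for example the MULTEXT-East row in the cross-linguistic table) would need extra handling.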
Annotation models for German include
tagset / NLP tool | phenomenon | OWL/DL models |
Connexor dependency parser | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model |
EAGLES recommendations (German) (Leech and Wilson 1996) | morphosyntax | Annotation Model, Linking Model |
Morphisto | morphology | Annotation Model, Linking Model |
STTS | morphosyntax | Annotation Model, Linking Model |
TIGER/NEGRA | morphology | Annotation Model, Linking Model |
TIGER/NEGRA | constituent syntax | Annotation Model, Linking Model |
TreeTagger Chunker | chunk labels | Annotation Model, Linking Model |
German UD POS | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
German UD features | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
German UD dependencies | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
RFTagger | morphosyntax, morphology | t.b.a |
Annotation models for other Germanic languages
tagset / NLP tool | language | phenomenon | OWL/DL models |
EAGLES recommendations (Leech and Wilson 1996) | Danish, Dutch, Swedish (and several non-Germanic languages) | morphosyntax; inflectional morphology | Annotation Model, Linking Model |
Danish UD POS | Danish | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Danish UD features | Danish | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Danish UD dependencies | Danish | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Alpino | Dutch | morphosyntax (POS) | Annotation Model, Linking Model |
Lassy | Dutch | morphosyntax (POS) | Annotation Model, Linking Model |
Dutch UD POS | Dutch | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Dutch UD features | Dutch | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Dutch UD dependencies | Dutch | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Norwegian UD POS | Norwegian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Norwegian UD features | Norwegian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Norwegian UD dependencies | Norwegian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Mamba lexical categories | Swedish | morphosyntax (POS) | Annotation Model, Linking Model |
Mamba dependencies | Swedish | dependency syntax | Annotation Model, Linking Model |
Stockholm-Umeå Corpus (SUC 2.0) | Swedish | morphosyntax | Annotation Model, Linking Model |
Swedish UD POS | Swedish | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Swedish UD features | Swedish | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Swedish UD dependencies | Swedish | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Connexor | Dutch, Swedish, Danish, Norwegian | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model |
SFB632 annotation standard (Dipper et al. 2008) | Dutch (among other languages) (SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
PPCME2 POS tags | Middle English | morphosyntax | Annotation Model, Linking Model |
YCOE POS tags | Old English | morphosyntax | Annotation Model, Linking Model |
MENOTA (incomplete) | Old Norse | morphosyntax | Annotation Model, Linking Model |
T-CODEX | Old High German | morphosyntax, syntax, information structure | Annotation Model, Linking Model |
PROIEL | Gothic (and others) | morphosyntax, dependency syntax | Annotation Model, Linking Model |
Gothic UD POS | Gothic | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Gothic UD features | Gothic | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Gothic UD dependencies | Gothic | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Annotation models for Russian include
tagset / NLP tool | phenomenon | OWL/DL models |
Uppsala corpus tagset | morphosyntax, morphology | Annotation Model, Linking Model |
Russian TreeTagger (Serge Sharoff) | morphosyntax | Annotation Model, Linking Model |
MULTEXT-East for Russian | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
Russian UD POS | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Russian UD features | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Russian UD dependencies | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Annotation models for other Slavic and Baltic languages include
tagset / NLP tool | language | phenomenon | OWL/DL models |
---|---|---|---|
MULTEXT-East | Bulgarian | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
Bulgarian UD POS | Bulgarian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Bulgarian UD features | Bulgarian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Bulgarian UD dependencies | Bulgarian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Croatian UD POS | Croatian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Croatian UD features | Croatian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Croatian UD dependencies | Croatian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
MULTEXT-East | Czech | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
Czech UD POS | Czech | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Czech UD features | Czech | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Czech UD dependencies | Czech | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Latvian UD POS | Latvian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Latvian UD features | Latvian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Latvian UD dependencies | Latvian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
MULTEXT-East | Macedonian | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
MULTEXT-East | Polish | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
Polish UD POS | Polish | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Polish UD features | Polish | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Polish UD dependencies | Polish | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
MULTEXT-East | Serbian | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
MULTEXT-East | Slovak | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
MULTEXT-East | Slovene | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
MULTEXT-East | Resian (Slovene spoken in Italy) | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
Slovenian UD POS | Slovene | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Slovenian UD features | Slovene | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Slovenian UD dependencies | Slovene | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
MULTEXT-East | Ukrainian | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
Ukrainian UD POS | Ukrainian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Ukrainian UD features | Ukrainian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Ukrainian UD dependencies | Ukrainian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
PROIEL | Old Church Slavonic (and others) | morphosyntax, dependency syntax | Annotation Model, Linking Model |
Old Church Slavonic UD POS | Old Church Slavonic | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Old Church Slavonic UD features | Old Church Slavonic | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Old Church Slavonic UD dependencies | Old Church Slavonic | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Annotation models for French include the following; annotation models for other Romance and Italic languages (Latin) are listed below.
tagset / NLP tool | phenomenon | OWL/DL models |
EAGLES recommendations (Leech and Wilson 1996) | morphosyntax | Annotation Model, Linking Model |
French TreeTagger (Achim Stein) | morphosyntax | Annotation Model |
Le Monde corpus (Abeillé et al. 2000) | morphosyntax | Annotation Model |
Connexor | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model |
SFB632 annotation standard (Dipper et al. 2008) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) for Canadian French (among other languages, SFB 632, project D2) | Annotation Model, Linking Model |
French UD POS | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
French UD features | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
French UD dependencies | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Annotation models for other Romance and Italic languages include
tagset | language | phenomenon | OWL/DL models |
---|---|---|---|
PROIEL | Latin (and others) | morphosyntax, dependency syntax | Annotation Model, Linking Model |
Latin UD POS | Latin | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Latin UD features | Latin | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Latin UD dependencies | Latin | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
EAGLES recommendations (Leech and Wilson 1996) | Catalan, Portuguese, Spanish | morphosyntax | Annotation Model, Linking Model |
Connexor | Spanish, Italian | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model |
PAROLE (http://nlp.lsi.upc.edu/freeling) | Spanish, Catalan | morphosyntax, inflectional morphology | Annotation Model |
Catalan UD POS | Catalan | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Catalan UD features | Catalan | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Catalan UD dependencies | Catalan | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Galician UD POS | Galician | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Galician UD features | Galician | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Galician UD dependencies | Galician | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Italian UD POS | Italian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Italian UD features | Italian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Italian UD dependencies | Italian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Portuguese UD POS | Portuguese, Brazilian Portuguese | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Portuguese UD features | Portuguese, Brazilian Portuguese | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Portuguese UD dependencies | Portuguese, Brazilian Portuguese | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Spanish UD POS | Spanish | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Spanish UD features | Spanish | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Spanish UD dependencies | Spanish | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
MULTEXT-East | Romanian | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
Romanian UD POS | Romanian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Romanian UD features | Romanian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Romanian UD dependencies | Romanian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
tagset | language | phenomenon | OWL/DL models |
MULTEXT-East | Estonian | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
Estonian UD POS | Estonian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Estonian UD features | Estonian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Estonian UD dependencies | Estonian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Connexor | Finnish | morphosyntax, morphology, dependency syntax | Annotation Model, Linking Model |
Finnish UD POS | Finnish | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Finnish UD features | Finnish | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Finnish UD dependencies | Finnish | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
MULTEXT-East | Hungarian | morphosyntax, morphology | Annotation Model(*), Linking Model(*) |
SFB632 annotation standard (Dipper et al. 2008) | Hungarian (among other languages) (SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
Hungarian UD POS | Hungarian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Hungarian UD features | Hungarian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Hungarian UD dependencies | Hungarian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Kazakh UD POS | Kazakh | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Kazakh UD features | Kazakh | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Kazakh UD dependencies | Kazakh | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Turkish POS tagset (Oflazer et al. 2003) | Turkish | morphosyntax | Annotation Model |
Turkish UD POS | Turkish | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Turkish UD features | Turkish | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Turkish UD dependencies | Turkish | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
This includes Indo-European and non-Indo-European languages of Europe that are not included in any other group.
tagset | language | phenomenon | OWL/DL models |
EAGLES recommendations (Leech and Wilson 1996) | Modern Greek, Irish (among other EU languages) | morphosyntax | Annotation Model, Linking Model |
SFB632 annotation standard (Dipper et al. 2008) | Georgian, Modern Greek (among other languages) (SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
PROIEL | Ancient Greek, Classical Armenian (and others) | morphosyntax, dependency syntax | Annotation Model, Linking Model |
Ancient Greek UD POS | Ancient Greek | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Ancient Greek UD features | Ancient Greek | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Ancient Greek UD dependencies | Ancient Greek | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
EUSTagger Ezeiza et al. (1998) | Basque | morphosyntax | Annotation Model |
Basque UD POS | Basque | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Basque UD features | Basque | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Basque UD dependencies | Basque | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Greek UD POS | Greek | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Greek UD features | Greek | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Greek UD dependencies | Greek | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Irish UD POS | Irish | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Irish UD features | Irish | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Irish UD dependencies | Irish | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
tagset | language | phenomenon | OWL/DL models |
Urdu EMILLE tagset Hardie (2003, 2004) | Urdu | morphosyntax, inflectional morphology | Annotation Model, Linking Model |
Urdu tagset Sajjad (2007) | Urdu | morphosyntax | Annotation Model, Linking Model |
IL-POSTS tagset Baskaran et al. (2008) | Bangla, Hindi, Marathi, Sanskrit | morphosyntax, inflectional morphology | Annotation Model, Linking Model |
AnnCorra Bharati et al. (2006) | Bangla, Hindi | morphosyntax, chunks | Annotation Model, Linking Model |
IIIT tagset IIIT (2007) | Hindi, Marathi | morphosyntax | Annotation Model, Linking Model |
Hindi UD POS | Hindi | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Hindi UD features | Hindi | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Hindi UD dependencies | Hindi | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
SFB632 annotation standard (Dipper et al. 2008) | Konkani (among other, unrelated languages) (SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
MULTEXT-East | Farsi (Persian) | morphosyntax | Annotation Model(*), Linking Model(*) |
Persian UD POS | Farsi (Persian) | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Persian UD features | Farsi (Persian) | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Persian UD dependencies | Farsi (Persian) | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
tagset | language | phenomenon | OWL/DL models |
IL-POSTS tagset Baskaran et al. (2008) | Kannada, Malayalam, Tamil, Telugu | morphosyntax | Annotation Model, Linking Model |
AnnCorra Bharati et al. (2006) | Telugu, Tamil | morphosyntax, chunks | Annotation Model, Linking Model |
IIIT tagset IIIT (2007) | Telugu | morphosyntax | Annotation Model, Linking Model |
Tamil UD POS | Tamil | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Tamil UD features | Tamil | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Tamil UD dependencies | Tamil | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
tagset | language | phenomenon | OWL/DL models |
Dzongkha tagset (Chungku et al. 2010) | Dzongkha | morphosyntax | Annotation Model, Linking Model |
SFB632 annotation standard (Dipper et al. 2008) | Prinmi (among other, unrelated languages) (SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
Tübingen Tibetan Corpora (Wagner & Zeisler 2004) | Tibetan (Old Tibetan, Classical Tibetan, Balti, Ladakh) | morphosyntax, morphology, syntax | Annotation Model |
annotation scheme / corpus | language | phenomenon | OWL/DL models |
Penn Chinese Treebank (Xia 2000) | Chinese | morphosyntax | Annotation Model |
Chinese UD POS | Chinese | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Chinese UD features | Chinese | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Chinese UD dependencies | Chinese | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
SFB632 annotation standard (Dipper et al. 2008) | Japanese (among other, unrelated languages) (SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
Japanese UD POS | Japanese | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Japanese UD features | Japanese | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Japanese UD dependencies | Japanese | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Sejong Treebank Annotation Model | Korean | morphosyntax (POS) | Annotation Model(*), Linking Model(*) |
Korean UD POS | Korean | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Korean UD features | Korean | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Korean UD dependencies | Korean | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Vietnamese UD POS | Vietnamese | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Vietnamese UD features | Vietnamese | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Vietnamese UD dependencies | Vietnamese | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
annotation scheme / corpus | language | phenomenon | OWL/DL models |
Amharic UD POS | Amharic | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Amharic UD features | Amharic | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Amharic UD dependencies | Amharic | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Arabic tagset (Khoja 2001) | Arabic | morphosyntax | Annotation Model |
Arabic UD POS | Arabic | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Arabic UD features | Arabic | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Arabic UD dependencies | Arabic | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
SFB632 annotation standard (Dipper et al. 2008) | Chadic languages (including Guruntum, Tangale, Hausa) (SFB 632, project B2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
Coptic UD POS | Coptic | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Coptic UD features | Coptic | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Coptic UD dependencies | Coptic | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Hausa Internet Corpus Chiarcos et al. (2011) | Hausa | morphosyntax | t.b.a |
Hebrew UD POS | Hebrew | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Hebrew UD features | Hebrew | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Hebrew UD dependencies | Hebrew | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Electronic Text Corpus of Sumerian Royal Inscriptions (ETCSRI) | Sumerian | morphology | Annotation Model |
annotation scheme / corpus | language | phenomenon | OWL/DL models |
SFB632 annotation standard (Dipper et al. 2008) | Gur and Kwa languages (including Aja, Dagbani, Buli, Byali, Ditammari, Fon, Foodo, Konni, Nateni, Waamma, Yom) (SFB 632, project B1) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
SFB632 annotation standard (Dipper et al. 2008) | Chadic languages (including Guruntum, Tangale, Hausa) (SFB 632, project B2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
Hausa Internet Corpus Chiarcos et al. (2011) | Hausa | morphosyntax | t.b.a |
Annotation Models for indigenous languages of the Americas, Australia and the Pacific
annotation scheme / corpus | language | phenomenon | OWL/DL models |
SFB632 annotation standard (Dipper et al. 2008) | Teribe, Yucatec Maya, Mawng, Niue (SFB 632, project D2) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | Annotation Model, Linking Model |
Indonesian UD POS | Indonesian | parts of speech | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |
Indonesian UD features | Indonesian | morphosyntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*) |
Indonesian UD dependencies | Indonesian | dependency syntax | language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model |