olia

Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages.

OLiA annotation models provide a machine-readable formalization of annotation schemes and tagsets in OWL2/DL (and hence in RDF). The majority of these models are provided via this repository and published under resolvable URIs via http://purl.org/olia. Beyond that, additional OLiA annotation models are hosted and/or provided externally.
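Because the models are ordinary OWL/RDF files, any RDF-aware tool can consume them. As a minimal, standard-library-only sketch, the following Python code downloads a model and lists the OWL classes it declares. It assumes that a model URI dereferences to RDF/XML and that classes are declared as owl:Class elements carrying an rdf:about attribute; the function names are hypothetical and not part of any OLiA tooling.

```python
"""Minimal sketch: fetch an OLiA model and list the OWL classes it declares.

Assumptions (not taken from the OLiA documentation): the model URI
dereferences to RDF/XML, and classes appear as owl:Class elements.
"""
import urllib.request
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def fetch_model(uri: str, timeout: float = 30.0) -> str:
    """Download a model, asking for RDF/XML (assumed serialization)."""
    req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def declared_classes(rdfxml: str) -> list[str]:
    """Return the rdf:about values of all owl:Class elements, sorted.

    Values are returned as written in the file, so they may be relative
    IRIs (e.g. "#Noun") if the model relies on xml:base.
    """
    root = ET.fromstring(rdfxml)
    about = f"{{{RDF}}}about"
    return sorted(
        el.get(about)
        for el in root.iter(f"{{{OWL}}}Class")
        if el.get(about) is not None
    )
```

For example, declared_classes(fetch_model("http://purl.org/olia/penn.owl")) would list the classes of a Penn Treebank model, provided that this assumed file name exists and serves RDF/XML. For real work a full RDF library is preferable: RDF/XML permits several equivalent encodings (such as rdf:Description elements typed via rdf:type) that this sketch ignores.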

Below, links to external resources are marked with (*). Unless marked otherwise, all ontologies provided via this repository are released under a Creative Commons Attribution (CC-BY) licence, with attribution to the following publication:

Christian Chiarcos and Maria Sukhareva (2015). OLiA - Ontologies of Linguistic Annotation. Semantic Web Journal (SWJ) 6(4): 379-386.

OLiA Annotation Models: Morphosyntax, Morphology, Syntax

Below, we provide annotation and linking models for cross-linguistically applicable and language-specific annotation schemes for morphosyntax, morphology and syntax.
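Conceptually, an annotation model describes the tags of one scheme (individuals carrying the tag string, typed by scheme-specific classes), and a linking model declares those scheme-specific classes to be subclasses of concepts in the OLiA Reference Model. The plain-Python sketch below shows how the two combine: a tag is mapped to reference concepts by following rdf:type and then the transitive closure of rdfs:subClassOf. All prefixes, class names and triples are illustrative placeholders, and the tag property name is only assumed to mirror OLiA's hasTag; this is not an OLiA API.

```python
"""Sketch of how an annotation model and a linking model combine.

All IRIs and triples below are illustrative placeholders; the real OLiA
models use their own namespaces and class names.
"""
from collections import defaultdict

HAS_TAG = "system:hasTag"  # assumed tag property, modeled on OLiA's system vocabulary
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Annotation model: tagset-specific individuals and classes (hypothetical).
annotation_model = [
    ("tagset:NN", HAS_TAG, "NN"),
    ("tagset:NN", RDF_TYPE, "tagset:CommonNoun"),
    ("tagset:CommonNoun", SUBCLASS, "tagset:Noun"),
]

# Linking model: tagset classes placed under reference concepts (hypothetical).
linking_model = [
    ("tagset:CommonNoun", SUBCLASS, "olia:CommonNoun"),
    ("olia:CommonNoun", SUBCLASS, "olia:Noun"),
]


def superclasses(cls, triples):
    """Transitive closure of rdfs:subClassOf starting at cls (cls included)."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            parents[s].add(o)
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def reference_concepts(tag, triples, prefix="olia:"):
    """Map a tag string to the reference concepts (by prefix) it entails."""
    individuals = {s for s, p, o in triples if p == HAS_TAG and o == tag}
    types = {o for s, p, o in triples if p == RDF_TYPE and s in individuals}
    result = set()
    for t in types:
        result |= {c for c in superclasses(t, triples) if c.startswith(prefix)}
    return sorted(result)


print(reference_concepts("NN", annotation_model + linking_model))
# ['olia:CommonNoun', 'olia:Noun']
```

Keeping the two models as separate triple sets mirrors the split in the tables below: an annotation model can be reused unchanged while its linking model is revised.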

Cross-linguistically Applicable

tagset / NLP tool phenomenon languages OWL/DL models
SFB632 annotation standard (Dipper et al. 2008) parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) > 30 typologically different languages, including many African languages Annotation Model, Linking Model
EAGLES recommendations
(Leech and Wilson 1996)
morphosyntax 11 EU languages, incl. Romance, Germanic, Greek and Irish Annotation Model, Linking Model
Connexor dependency parser morphosyntax, morphology, dependency syntax 10 European languages, incl. Romance, Germanic and Uralic languages Annotation Model, Linking Model
MULTEXT-East morphosyntax, morphology 15 mostly Eastern European languages, incl. Slavic, Romance, Uralic languages and Persian Annotation Model (common specifications)(*), Linking Model(*); Annotation Model (all languages)(*), see project page and below for individual languages
IL-POSTS tagset
Baskaran et al. (2008)
morphosyntax languages of the Indian subcontinent Annotation Model, Linking Model
AnnCorra
Bharati et al. (2006)
morphosyntax, chunks languages of the Indian subcontinent Annotation Model, Linking Model
IIIT tagset
IIIT (2007)
morphosyntax languages of the Indian subcontinent Annotation Model, Linking Model
PROIEL morphosyntax, dependency syntax Older Indo-European languages (Greek, Latin, Gothic, Classical Armenian, Old Church Slavonic, and others) Annotation Model, Linking Model
Universal Dependencies (POS) parts of speech various languages (for language-specific Annotation Model ABoxes see below) Annotation Model TBox(*), Linking Model
Universal Dependencies (features) morphosyntax various languages (for language-specific Annotation Model ABoxes see below) Annotation Model TBox(*)
Universal Dependencies (relations) dependency syntax various languages (for language-specific Annotation Model ABoxes see below) Annotation Model TBox(*), Linking Model
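Positional tagsets such as the MULTEXT-East morphosyntactic descriptions pack several features into one tag; for example, Ncmsn stands for a common noun that is masculine, singular and nominative. An annotation model therefore cannot always rely on exact string matches. The sketch below uses three kinds of tag rules named after properties assumed to exist in OLiA's system vocabulary (hasTag, hasTagStartingWith, hasTagMatching); the rule format, the msd: individuals and the patterns are hypothetical.

```python
"""Sketch: matching positional tags against tag rules.

The rule kinds are named after assumed OLiA system properties; the
rule format and the example MSD patterns are illustrative only.
"""
import re
from typing import NamedTuple


class TagRule(NamedTuple):
    kind: str        # "hasTag", "hasTagStartingWith" or "hasTagMatching"
    value: str       # literal tag, prefix, or regular expression
    individual: str  # hypothetical annotation-model individual


def matches(rule: TagRule, tag: str) -> bool:
    """Check a single rule against a tag string."""
    if rule.kind == "hasTag":
        return tag == rule.value
    if rule.kind == "hasTagStartingWith":
        return tag.startswith(rule.value)
    if rule.kind == "hasTagMatching":
        return re.fullmatch(rule.value, tag) is not None
    raise ValueError(f"unknown rule kind: {rule.kind}")


def individuals_for(tag: str, rules: list[TagRule]) -> list[str]:
    """All individuals whose rule accepts the tag; one tag may hit several."""
    return [r.individual for r in rules if matches(r, tag)]


# Assumed layout for nouns: character 1 = category (N), 2 = type (c = common),
# 3 = gender, 4 = number (s = singular), 5 = case.
RULES = [
    TagRule("hasTagStartingWith", "N", "msd:Noun"),
    TagRule("hasTagStartingWith", "Nc", "msd:CommonNoun"),
    TagRule("hasTagMatching", r"N..s.*", "msd:SingularNoun"),
]

print(individuals_for("Ncmsn", RULES))
# ['msd:Noun', 'msd:CommonNoun', 'msd:SingularNoun']
```

With these rules, Ncmsn is assigned to all three individuals, whereas the plural Ncfpn matches only the first two. Keeping the rules additive lets one composite tag contribute several independent feature concepts.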

Germanic

Annotation models for English include the following; annotation models for German and other Germanic languages are listed below.

tagset / NLP tool phenomenon OWL/DL models
Brown corpus morphosyntax Annotation Model, Linking Model
Connexor morphosyntax, morphology, dependencies Annotation Model, Linking Model
EAGLES (Leech and Wilson 1996) morphosyntax Annotation Model, Linking Model
GENIA corpus morphosyntax Annotation Model, Linking Model
MULTEXT-East morphosyntax Annotation Model(*), Linking Model(*)
Penn Treebank morphosyntax Annotation Model, Linking Model
Penn Treebank syntax Annotation Model, Linking Model
QTag morphosyntax Annotation Model, Linking Model
Stanford dependencies Annotation Model, Linking Model
Susanne corpus morphosyntax Annotation Model, Linking Model
English UD POS parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
English UD features morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
English UD dependencies dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Annotation models for German include

tagset / NLP tool phenomenon OWL/DL models
Connexor dependency parser morphosyntax, morphology, dependency syntax Annotation Model, Linking Model
EAGLES recommendations (German)
(Leech and Wilson 1996)
morphosyntax Annotation Model, Linking Model
Morphisto morphology Annotation Model, Linking Model
STTS morphosyntax Annotation Model, Linking Model
TIGER/NEGRA morphology Annotation Model, Linking Model
TIGER/NEGRA constituent syntax Annotation Model, Linking Model
TreeTagger Chunker chunk labels Annotation Model, Linking Model
German UD POS parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
German UD features morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
German UD dependencies dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
RFTagger morphosyntax, morphology t.b.a

Annotation models for other Germanic languages

tagset/NLP tool language phenomenon OWL/DL models
EAGLES recommendations
(Leech and Wilson 1996)
Danish, Dutch, Swedish (and several non-Germanic languages) morphosyntax; inflectional morphology Annotation Model, Linking Model
Danish UD POS Danish parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Danish UD features Danish morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Danish UD dependencies Danish dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Alpino Dutch morphosyntax (POS) Annotation Model, Linking Model
Lassy Dutch morphosyntax (POS) Annotation Model, Linking Model
Dutch UD POS Dutch parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Dutch UD features Dutch morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Dutch UD dependencies Dutch dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Norwegian UD POS Norwegian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Norwegian UD features Norwegian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Norwegian UD dependencies Norwegian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Mamba lexical categories Swedish morphosyntax (POS) Annotation Model, Linking Model
Mamba dependencies Swedish dependency syntax Annotation Model, Linking Model
Stockholm–Umeå Corpus (SUC 2.0) Swedish morphosyntax Annotation Model, Linking Model
Swedish UD POS Swedish parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Swedish UD features Swedish morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Swedish UD dependencies Swedish dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Connexor Dutch, Swedish, Danish, Norwegian morphosyntax, morphology, dependency syntax Annotation Model, Linking Model
SFB632 annotation standard
(Dipper et al. 2008)
Dutch (among other languages)
(SFB 632, project D2)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
PPCME2 POS tags Middle English morphosyntax Annotation Model, Linking Model
YCOE POS tags Old English morphosyntax Annotation Model, Linking Model
MENOTA (incomplete) Old Norse morphosyntax Annotation Model, Linking Model
T-CODEX Old High German morphosyntax, syntax, information structure Annotation Model, Linking Model
PROIEL Gothic (and others) morphosyntax, dependency syntax Annotation Model, Linking Model
Gothic UD POS Gothic parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Gothic UD features Gothic morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Gothic UD dependencies Gothic dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Slavic and Baltic

Annotation models for Russian include

tagset / NLP tool phenomenon OWL/DL models
Uppsala corpus tagset morphosyntax, morphology Annotation Model, Linking Model
Russian TreeTagger
(Serge Sharoff)
morphosyntax Annotation Model, Linking Model
MULTEXT-East for Russian morphosyntax, morphology Annotation Model(*), Linking Model(*)
Russian UD POS parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Russian UD features morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Russian UD dependencies dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Annotation models for other Slavic and Baltic languages include

tagset / NLP tool language phenomenon OWL/DL models
MULTEXT-East Bulgarian morphosyntax, morphology Annotation Model(*), Linking Model(*)
Bulgarian UD POS Bulgarian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Bulgarian UD features Bulgarian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Bulgarian UD dependencies Bulgarian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Croatian UD POS Croatian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Croatian UD features Croatian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Croatian UD dependencies Croatian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
MULTEXT-East Czech morphosyntax, morphology Annotation Model(*), Linking Model(*)
Czech UD POS Czech parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Czech UD features Czech morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Czech UD dependencies Czech dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Latvian UD POS Latvian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Latvian UD features Latvian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Latvian UD dependencies Latvian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
MULTEXT-East Macedonian morphosyntax, morphology Annotation Model(*), Linking Model(*)
MULTEXT-East Polish morphosyntax, morphology Annotation Model(*), Linking Model(*)
Polish UD POS Polish parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Polish UD features Polish morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Polish UD dependencies Polish dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
MULTEXT-East Serbian morphosyntax, morphology Annotation Model(*), Linking Model(*)
MULTEXT-East Slovak morphosyntax, morphology Annotation Model(*), Linking Model(*)
MULTEXT-East Slovene morphosyntax, morphology Annotation Model(*), Linking Model(*)
MULTEXT-East Resian (Slovene spoken in Italy) morphosyntax, morphology Annotation Model(*), Linking Model(*)
Slovenian UD POS Slovene parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Slovenian UD features Slovene morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Slovenian UD dependencies Slovene dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
MULTEXT-East Ukrainian morphosyntax, morphology Annotation Model(*), Linking Model(*)
Ukrainian UD POS Ukrainian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Ukrainian UD features Ukrainian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Ukrainian UD dependencies Ukrainian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
PROIEL Old Church Slavonic (and others) morphosyntax, dependency syntax Annotation Model, Linking Model
Old Church Slavonic UD POS Old Church Slavonic parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Old Church Slavonic UD features Old Church Slavonic morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Old Church Slavonic UD dependencies Old Church Slavonic dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Romance, Italic, Latin

Annotation models for French include the following; annotation models for other Romance and Italic languages (including Latin) are listed below.

tagset / NLP tool phenomenon OWL/DL models
EAGLES recommendations
(Leech and Wilson 1996)
morphosyntax Annotation Model, Linking Model
French TreeTagger
(Achim Stein)
morphosyntax Annotation Model
Le Monde corpus
(Abeillé et al. 2000)
morphosyntax Annotation Model
Connexor morphosyntax, morphology, dependency syntax Annotation Model, Linking Model
SFB632 annotation standard
(Dipper et al. 2008)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) for Canadian French (among other languages, SFB 632, project D2) Annotation Model, Linking Model
French UD POS parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
French UD features morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
French UD dependencies dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Annotation models for other Romance and Italic languages include

tagset language phenomenon OWL/DL models
PROIEL Latin (and others) morphosyntax, dependency syntax Annotation Model, Linking Model
Latin UD POS Latin parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Latin UD features Latin morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Latin UD dependencies Latin dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
EAGLES recommendations (Leech and Wilson 1996) Catalan, Portuguese, Spanish morphosyntax Annotation Model, Linking Model
Connexor Spanish, Italian morphosyntax, morphology, dependency syntax Annotation Model, Linking Model
PAROLE (http://nlp.lsi.upc.edu/freeling) Spanish, Catalan morphosyntax, inflectional morphology Annotation Model
Catalan UD POS Catalan parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Catalan UD features Catalan morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Catalan UD dependencies Catalan dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Galician UD POS Galician parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Galician UD features Galician morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Galician UD dependencies Galician dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Italian UD POS Italian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Italian UD features Italian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Italian UD dependencies Italian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Portuguese UD POS Portuguese, Brazilian Portuguese parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Portuguese UD features Portuguese, Brazilian Portuguese morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Portuguese UD dependencies Portuguese, Brazilian Portuguese dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Spanish UD POS Spanish parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Spanish UD features Spanish morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Spanish UD dependencies Spanish dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
MULTEXT-East Romanian morphosyntax, morphology Annotation Model(*), Linking Model(*)
Romanian UD POS Romanian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Romanian UD features Romanian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)  
Romanian UD dependencies Romanian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Uralic and Altaic

tagset language phenomenon OWL/DL models
MULTEXT-East Estonian morphosyntax, morphology Annotation Model(*), Linking Model(*)
Estonian UD POS Estonian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Estonian UD features Estonian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Estonian UD dependencies Estonian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Connexor Finnish morphosyntax, morphology, dependency syntax Annotation Model, Linking Model
Finnish UD POS Finnish parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Finnish UD features Finnish morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Finnish UD dependencies Finnish dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
MULTEXT-East Hungarian morphosyntax, morphology Annotation Model(*), Linking Model(*)
SFB632 annotation standard
(Dipper et al. 2008)
Hungarian (among other languages)
(SFB 632, project D2)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
Hungarian UD POS Hungarian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Hungarian UD features Hungarian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Hungarian UD dependencies Hungarian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Kazakh UD POS Kazakh parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Kazakh UD features Kazakh morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Kazakh UD dependencies Kazakh dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Turkish POS tagset
(Oflazer et al. 2003)
Turkish morphosyntax Annotation Model
Turkish UD POS Turkish parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Turkish UD features Turkish morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Turkish UD dependencies Turkish dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Other European languages

This group covers Indo-European and non-Indo-European languages of Europe that are not listed under any other heading.

tagset language phenomenon OWL/DL models
EAGLES recommendations
(Leech and Wilson 1996)
Modern Greek, Irish (among other EU languages) morphosyntax Annotation Model, Linking Model
SFB632 annotation standard
(Dipper et al. 2008)
Georgian, Modern Greek (among other languages)
(SFB 632, project D2)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
PROIEL Ancient Greek, Classical Armenian (and others) morphosyntax, dependency syntax Annotation Model, Linking Model
Ancient Greek UD POS Ancient Greek parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Ancient Greek UD features Ancient Greek morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Ancient Greek UD dependencies Ancient Greek dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
EUSTagger
Ezeiza et al. (1998)
Basque morphosyntax Annotation Model
Basque UD POS Basque parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Basque UD features Basque morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Basque UD dependencies Basque dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Greek UD POS Greek parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Greek UD features Greek morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Greek UD dependencies Greek dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Irish UD POS Irish parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Irish UD features Irish morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Irish UD dependencies Irish dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Indo-Iranian languages

tagset language phenomenon OWL/DL models
Urdu EMILLE tagset
Hardie (2003, 2004)
Urdu morphosyntax, inflectional morphology Annotation Model, Linking Model
Urdu tagset
Sajjad (2007)
Urdu morphosyntax Annotation Model, Linking Model
IL-POSTS tagset
Baskaran et al. (2008)
Bangla, Hindi, Marathi, Sanskrit morphosyntax, inflectional morphology Annotation Model, Linking Model
AnnCorra
Bharati et al. (2006)
Bangla, Hindi morphosyntax, chunks Annotation Model, Linking Model
IIIT tagset
IIIT (2007)
Hindi, Marathi morphosyntax Annotation Model, Linking Model
Hindi UD POS Hindi parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Hindi UD features Hindi morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Hindi UD dependencies Hindi dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
SFB632 annotation standard
(Dipper et al. 2008)
Konkani (among other, unrelated languages)
(SFB 632, project D2)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
MULTEXT-East Farsi (Persian) morphosyntax Annotation Model(*), Linking Model(*)
Persian UD POS Farsi (Persian) parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Persian UD features Farsi (Persian) morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Persian UD dependencies Farsi (Persian) dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Dravidian

tagset language phenomenon OWL/DL models
IL-POSTS tagset
Baskaran et al. (2008)
Kannada, Malayalam, Tamil, Telugu morphosyntax Annotation Model, Linking Model
AnnCorra
Bharati et al. (2006)
Telugu, Tamil morphosyntax, chunks Annotation Model, Linking Model
IIIT tagset
IIIT (2007)
Telugu morphosyntax Annotation Model, Linking Model
Tamil UD POS Tamil parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Tamil UD features Tamil morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Tamil UD dependencies Tamil dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Tibeto-Burman

tagset language phenomenon OWL/DL models
Dzongkha tagset
(Chungku et al. 2010)
Dzongkha morphosyntax Annotation Model, Linking Model
SFB632 annotation standard
(Dipper et al. 2008)
Prinmi (among other, unrelated languages)
(SFB 632, project D2)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
Tübingen Tibetan Corpora
(Wagner & Zeisler 2004)
Tibetan (Old Tibetan, Classical Tibetan, Balti, Ladakhi) morphosyntax, morphology, syntax Annotation Model

East Asian languages

annotation scheme / corpus language phenomenon OWL/DL models
Penn Chinese Treebank
(Xia 2000)
Chinese morphosyntax Annotation Model
Chinese UD POS Chinese parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Chinese UD features Chinese morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Chinese UD dependencies Chinese dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
SFB632 annotation standard
(Dipper et al. 2008)
Japanese (among other, unrelated languages)
(SFB 632, project D2)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
Japanese UD POS Japanese parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Japanese UD features Japanese morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Japanese UD dependencies Japanese dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Sejong Treebank Korean morphosyntax (POS) Annotation Model(*), Linking Model(*)
Korean UD POS Korean parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Korean UD features Korean morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Korean UD dependencies Korean dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Vietnamese UD POS Vietnamese parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Vietnamese UD features Vietnamese morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Vietnamese UD dependencies Vietnamese dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model

Afroasiatic and Ancient Near Eastern languages

annotation scheme / corpus language phenomenon OWL/DL models
Amharic UD POS Amharic parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Amharic UD features Amharic morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Amharic UD dependencies Amharic dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Arabic tagset
(Khoja 2001)
Arabic morphosyntax Annotation Model
Arabic UD POS Arabic parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Arabic UD features Arabic morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Arabic UD dependencies Arabic dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
SFB632 annotation standard
(Dipper et al. 2008)
Chadic languages (including Guruntum, Tangale, Hausa)
(SFB 632, project B2)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
Coptic UD POS Coptic parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Coptic UD features Coptic morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Coptic UD dependencies Coptic dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Hausa Internet Corpus
Chiarcos et al. (2011)
Hausa morphosyntax t.b.a
Hebrew UD POS Hebrew parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Hebrew UD features Hebrew morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Hebrew UD dependencies Hebrew dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Electronic Text Corpus of Sumerian Royal Inscriptions (ETSCRI) Sumerian morphology Annotation Model

Sub-Saharan Africa

annotation scheme / corpus language phenomenon OWL/DL models
SFB632 annotation standard
(Dipper et al. 2008)
Gur and Kwa languages (including Aja, Dagbani, Buli, Byali, Ditammari, Fon, Foodo, Konni, Nateni, Waamma, Yom)
(SFB 632, project B1)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
Chadic languages (including Guruntum, Tangale, Hausa)
(SFB 632, project B2)
Hausa Internet Corpus
Chiarcos et al. (2011)
Hausa morphosyntax t.b.a

The Americas, Australia and the Pacific

Annotation models for indigenous languages of the Americas, Australia and the Pacific include

annotation scheme / corpus language phenomenon OWL/DL models
SFB632 annotation standard
(Dipper et al. 2008)
Teribe, Yucatec Maya, Mawng, Niuean
(SFB 632, project D2)
parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) Annotation Model, Linking Model
Indonesian UD POS Indonesian parts of speech language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model
Indonesian UD features Indonesian morphosyntax language-specific Annotation Model ABox(*), Annotation Model TBox(*)
Indonesian UD dependencies Indonesian dependency syntax language-specific Annotation Model ABox(*), Annotation Model TBox(*), Linking Model