OLiA annotation model for the EAGLES recommendations for the annotation of morphosyntax. Unless specified otherwise, all comments are quotes from Leech & Wilson (1996).
Originally applied to the following languages: Catalan, Danish, Dutch, English, French, German, Greek, Irish, Portuguese, Spanish, Swedish (http://www.ilc.cnr.it/EAGLES96/morphsyn/node4.html#SECTION00022000000000000000)
G. Leech & A. Wilson (1996), EAGLES Recommendations for the Morphosyntactic Annotation of Corpora (EAG--TCWG--MAC/R, Version of Mar, 1996), http://www.ilc.cnr.it/EAGLES/annotate/annotate.html
(v) Inflection-type: 1. Weak-Flection 2. Strong-Flection 3. Mixed
Weak and Strong (attribute (v)) are values for adjectival inflection in the Germanic languages German, Dutch and Danish.
There is also an argument for subsuming Articles under Determiners. The present guidelines do not prevent such a realignment of categories, but do propose that articles (assuming they exist in a language) should always be recognised as a separate class, whether or not included within determiners. The requirement is that the descriptive scheme adopted should be automatically mappable into the present one via an Intermediate Tagset.
Attribute (xii) (Auxiliary) is applied to main verbs in French, German, Dutch, etc., and determines the selection of avoir or être, etc., as auxiliary for the Perfect.
Type=Coordinating
The attribute Coord-Type subclassifies coordinating conjunctions. It is easier to assign one tag to one orthographic word and it is therefore suggested that the four values are assigned as follows: Simple applies to the regular type of coordinator occurring between conjuncts: German und, for example. When the same word is also placed before the first conjunct, as in French ou...ou..., the former occurrence is given the Correlative value and the latter the Simple value. When two distinct words occur, as in German weder...noch..., then the first is given the Initial value and the second the Non-initial value.
When the same word is also placed before the first conjunct, as in French ou...ou..., the former occurrence is given the Correlative value and the latter the Simple value.
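The assignment rule for Coord-Type can be sketched in Python as follows. This is only an illustrative sketch: the function name `coord_types` and the list-based input are assumptions made here, not part of the EAGLES recommendations or of OLiA.

```python
def coord_types(coordinators):
    """Assign Coord-Type values to the coordinators of one coordination.

    `coordinators` holds the coordinator word forms in text order:
    one item for a plain coordinator, two for a paired construction.
    (Hypothetical helper, for illustration only.)
    """
    if len(coordinators) == 1:
        # Regular coordinator between conjuncts, e.g. German "und".
        return ["Simple"]
    if len(coordinators) == 2:
        first, second = coordinators
        if first.lower() == second.lower():
            # Same word also placed before the first conjunct: French "ou ... ou ...".
            return ["Correlative", "Simple"]
        # Two distinct words: German "weder ... noch ...".
        return ["Initial", "Non-initial"]
    raise ValueError("expected one or two coordinators")
```

Each orthographic word receives exactly one value, which is the point of the suggested assignment.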
An additional language-specific attribute is:
(vi) Definiteness: 1. Definite 2. Indefinite 3. Unmarked [Danish]
This is to handle the suffixed definite article in Danish: e.g. haven (`the garden'); havet (`the sea')
Attribute (i) Degree applies only to inflectional comparatives and superlatives. In some languages, e.g. Spanish, the number of such adjectives is very small.
this is the Eagles:PronounDeterminer with Category Determiner
note that the Eagles original class (i.e. motivated by the category "both") arises due to ambiguity of lexemes, which can within such an ontology be better described as "belonging to the join of two classes"
Person,PronType,SpecialPronType,Politeness are relevant for Pronouns only
Adposition:Type=FusedPrepArt
The additional value Fused prep-art is for the benefit of those who do not find it practical to split fused words such as French au (= à + le) into two textwords. This very common phenomenon of a fused preposition + article in West European languages should preferably, however, be handled by assigning two tags to the same orthographic word (one for the preposition and one for the article).
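The two options for fused forms can be contrasted in a short Python sketch. Only the French form au and the tag Adposition:Type=FusedPrepArt come from this model; the lookup table, the function name `tags_for_fused`, and the tag strings `Adposition:Type=Preposition` and `Article` are hypothetical labels chosen for illustration.

```python
# Hypothetical table of fused forms; only French "au" is taken from the text.
FUSED_FORMS = {"au"}


def tags_for_fused(word, two_tags=True):
    """Return the tag(s) attached to one fused orthographic word."""
    if word.lower() not in FUSED_FORMS:
        raise KeyError(f"{word!r} is not a known fused form")
    if two_tags:
        # Preferred option: two tags on the same orthographic word.
        # Both tag strings are assumed names, not defined by the source.
        return ["Adposition:Type=Preposition", "Article"]
    # Alternative for schemes that keep a single tag per word.
    return ["Adposition:Type=FusedPrepArt"]
```

In both cases the word itself is left unsplit; only the number of tags attached to it differs.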
An additional value to the non-finite category of verbs is arguably needed for English, because of the merger in that language of the gerund and participle functions. The -ing form does service for both and the two traditional categories are not easily distinguishable.
PronounOrDeterminer, (v) Case 6. Oblique
Under attribute (v) Case, the value Oblique applies to pronouns such as them and me in English, and equivalent pronouns such as dem and mig in Danish. These occur in object function, and also after prepositions.
(ii) Adverb-Type: 3. Particle
In some tagging schemes, especially for English, a particle such as out, off or up counts as a subclass of adverb. In other tagging schemes, the particle may be treated under Residual (Explanation) as a special word-class.
prontype=pers/refl
It is often difficult to distinguish these in automatic tagging, but they may be optionally distinguished at a more delicate level of granularity. So, under attribute (vi), Personal and Reflexive pronouns are brought together as a single value Pers./Refl.. They may be optionally separated at a more delicate level.
In some languages (e.g. French) it is possible to treat Polite and Familiar simply as pragmatic values encoded through other attributes -- especially person and number. In languages where there are special polite pronoun forms (e.g. Dutch u and Spanish usted), the additional Politeness attribute is required.
Type=Postposition
German entlang is a Postposition, and arguably, the 's which forms the genitive in English is no longer a case marking, but an enclitic postposition, as in the Secretary of State's decision, in a month or so's time.
this is the Eagles:PronounDeterminer with Category Pronoun
note that the Eagles original class (i.e. motivated by the category "both") arises due to ambiguity of lexemes, which can within such an ontology be better described as "belonging to the join of two classes"
ignored inflectional features such as case, gender, number, and possessive (i.e. inherent number of possessive pronouns)
DetType is not relevant for Pronouns
The parts of speech Pronoun, Determiner and Article heavily overlap in their formal and functional characteristics, and different analyses for different languages entail separating them out in different ways. For the present purpose, we have proposed placing Pronouns and Determiners in one `super-category', recognising that for some descriptions it may be thought best to treat them as totally different parts of speech.
There is also an argument for subsuming Articles under Determiners. The present guidelines do not prevent such a realignment of categories, but do propose that articles (assuming they exist in a language) should always be recognised as a separate class, whether or not included within determiners. The requirement is that the descriptive scheme adopted should be automatically mappable into the present one via an Intermediate Tagset.
PronounOrDeterminer:
(xii) Strength 1. Weak 2. Strong [French, Dutch, Greek]
Weak and Strong distinguish, for example, me from moi in French, and me from mij in Dutch.
Punctuation marks (PU) are (perhaps surprisingly) treated here as a part of morphosyntactic annotation, as it is very common for punctuation marks to be tagged and to be treated as equivalent to words for the purposes of automatic tag assignment.
Word-external punctuation marks, if treated as words for morphosyntactic tagging, are sometimes assigned a separate tag (in effect, an attribute value) for each main punctuation mark:
(i) 1. Period 2. Comma 3. Question mark ...etc. ...
An alternative is to group the punctuation marks into positional classes:
(i) 1. Sentence-final 2. Sentence-medial 3. Left-Parenthetical 4. Right-Parenthetical
Under 1 are grouped . ? !. Under 2 are grouped , ; : -- . Under 3 are placed punctuation marks which signal the initiation of a constituent, such as (, [ , and ¿ in Spanish). Under 4 are grouped punctuation marks which conclude a constituent the opening of which is marked by one of the devices in 3: e.g. ), ] and Spanish ? . We make no recommendation about choosing between these two sets of punctuation values.
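The positional grouping can be written down as a small lookup, sketched here in Python. The function name `punctuation_class` and the language codes "en" and "es" are assumptions; the special case reflects the statement that the Spanish question mark closes a constituent opened by the inverted question mark.

```python
# Positional classes and their members as listed in the text.
POSITIONAL_CLASSES = {
    "Sentence-final": {".", "?", "!"},
    "Sentence-medial": {",", ";", ":", "--"},
    "Left-Parenthetical": {"(", "[", "¿"},
    "Right-Parenthetical": {")", "]"},
}


def punctuation_class(mark, lang="en"):
    """Return the positional class of a punctuation mark, or None if unlisted."""
    if lang == "es" and mark == "?":
        # In Spanish, "?" concludes a constituent opened by the inverted mark.
        return "Right-Parenthetical"
    for name, marks in POSITIONAL_CLASSES.items():
        if mark in marks:
            return name
    return None
```

Marks that the text does not mention are deliberately left unclassified rather than guessed.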
Attribute (xii) is applied to main verbs in French, German, Dutch, etc., and determines the selection of avoir or être, etc., as auxiliary for the Perfect.
The residual value (R) is assigned to classes of textword which lie outside the traditionally accepted range of grammatical classes, although they occur quite commonly in many texts and very commonly in some. For example: foreign words, or mathematical formulae. It can be argued that these are on the fringes of the grammar or lexicon of the language in which the text is written. Nevertheless, they need to be tagged.
Although words in the Residual category are on the periphery of the lexicon, they may take some of the grammatical characteristics, e.g., of nouns. Acronyms such as IBM are similar to proper nouns; symbols such as alphabetic characters can vary for singular and plural (e.g. How many Ps are there in `psychopath'?), and are in this respect like common nouns. In some languages (e.g. Portuguese) such symbols also have gender. It is quite reasonable that in some tagging schemes some of these classes of word will be classified under other parts of speech.
verb/Status=Semi-auxiliary
In addition to main and auxiliary verbs, it may be useful (e.g. in English) to recognise an intermediate category of semi-auxiliary for such verbs as be going to, have got to, ought to.
Simple applies to the regular type of coordinator occurring between conjuncts: German und, for example. When the same word is also placed before the first conjunct, as in French ou...ou..., the former occurrence is given the Correlative value and the latter the Simple value.
Type=Subordinating
Subclassification follows Subord.-Type, an additional attribute, applying to subordinating conjunctions only:
(iii) Subord.-type: 1. With-finite 2. With-infin. 3. Comparative [German]
For example, in German, weil introduces a clause with a finite verb, whereas ohne (zu...) is followed by an infinitive, and als is followed by various kinds of comparative clause (including clauses without finite verbs).
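As a minimal sketch, the three German examples can be stored as a lookup from conjunction to Subord.-type value. The dictionary and the function name `subord_type` are hypothetical; a real lexicon would need many more entries and would have to deal with ambiguous forms.

```python
# Subord.-type values for the three German examples given in the text.
SUBORD_TYPE = {
    "weil": "With-finite",   # introduces a clause with a finite verb
    "ohne": "With-infin.",   # ohne (zu ...) is followed by an infinitive
    "als": "Comparative",    # followed by various kinds of comparative clause
}


def subord_type(conjunction):
    """Return the Subord.-type value for a conjunction, or None if unknown."""
    return SUBORD_TYPE.get(conjunction.lower())
```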
The unique value (U) is applied to categories with a unique or very small membership, such as negative particle, which are `unassigned' to any of the standard part-of-speech categories. The value unique cannot always be strictly applied, since (for example) Greek has three negative particles.
No subcategories are recommended, although it is expected that tagsets for individual languages will need to identify such one-member word-classes as Negative particle, Existential particle, Infinitive marker, etc.
Eagles:V
again, grammatical information is skipped, i.e. person, gender, number, tense, voice, aspect
skip optional information (separability, reflexivity, auxiliary, aux-function)
Attribute (vii) Voice refers to the morphologically-encoded passive, e.g. in Danish and in Greek. Where the passive is realised by more than one verb, this does not need to be represented in the tagset.
DetType=Int/Rel
Under attributes (vi) and (vii), the subcategories Interrogative and Relative are merged into a single value Int./Rel.. It is often difficult to distinguish these in automatic tagging, but they may be optionally distinguished at a more delicate level of granularity.
Prontype=IntRel
Under attributes (vi) and (vii), the subcategories Interrogative and Relative are merged into a single value Int./Rel.. It is often difficult to distinguish these in automatic tagging, but they may be optionally distinguished at a more delicate level of granularity.
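The relation between the merged values and the optional, more delicate distinctions can be sketched as a mapping in Python. The names `MERGED_VALUES` and `coarsen` are assumptions introduced for illustration; the value labels are those used in the comments of this model.

```python
# Delicate values and the merged value each one falls under.
MERGED_VALUES = {
    "Interrogative": "Int./Rel.",
    "Relative": "Int./Rel.",
    "Personal": "Pers./Refl.",
    "Reflexive": "Pers./Refl.",
}


def coarsen(value):
    """Map a delicate value to its merged value; other values pass through."""
    return MERGED_VALUES.get(value, value)
```

A tagger that cannot separate the delicate values reliably would emit only the merged ones, while a more detailed scheme stays mappable to them through `coarsen`.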
(iv) Possessive: 1. Singular 2. Plural
Attribute (iv) accounts for the fact that a possessive pronoun or possessive determiner may have two different numbers. This attribute handles the number which is inherent to the possessive form (e.g. Italian (la) mia, (la) nostra as first-person singular and first-person plural) as contrasted with the number it has by virtue of agreeing with a particular noun (e.g. Italian (la) mia, (le) mie).
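The two numbers carried by a possessive form can be kept apart as two separate fields, as in the following Python sketch. The class name `PossessiveForm` and its field names are hypothetical; the Italian forms are the ones cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PossessiveForm:
    form: str
    possessive: str  # attribute (iv): number inherent to the possessive form
    number: str      # number acquired by agreement with the noun


FORMS = [
    PossessiveForm("mia", possessive="Singular", number="Singular"),
    PossessiveForm("mie", possessive="Singular", number="Plural"),
    PossessiveForm("nostra", possessive="Plural", number="Singular"),
]

# Forms whose inherent number differs from their agreement number.
mismatched = [f.form for f in FORMS if f.possessive != f.number]
```

Here mia and mie share the inherent number but differ in agreement, while mia and nostra differ in the inherent number only.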