Ontology - emille

Abstract
OLiA Annotation Model for the morphosyntactic annotation of the Urdu section of the EMILLE corpus (Hardie 2003, 2004). Unless marked otherwise, all comments are quotes from Hardie (2004), Chapter 3.

The tagset discussed here was created in accordance with the EAGLES guidelines for morphosyntactic annotation of corpora. Although these guidelines were written to cover the languages of the European Union, they can be applied fairly easily to Urdu, which, coming as it does from another branch of the Indo-European family, is structurally quite similar. They can also be extended to deal with the idiosyncrasies presented by Urdu grammar. (Hardie 2003)

The first stage of the work was to develop a tagset for use in Urdu texts and corpora, an area which has not been researched extensively heretofore. The next stage, now underway, is to test the tagset’s usability in manual tagging, and build up a set of tagged texts to serve as training data for the final phase of this part of the project. This will be to automate the tagging and subsequently tag the whole of the EMILLE Urdu corpus. (Hardie 2003)

References
Hardie, A. (2003) Developing a tagset for automated part-of-speech tagging in Urdu. In: Corpus Linguistics 2003, 2003-03-01, Lancaster. http://eprints.lancs.ac.uk/103/
Hardie, A. (2004) The computational analysis of morphosyntactic categories in Urdu. Other thesis, Lancaster University. http://eprints.lancs.ac.uk/106/
Schmidt, R. L. (1999) Urdu, an essential grammar. Routledge, London.
Latest Version
http://purl.org/olia/emille.owl#

Imports

Classes - Overview

Class hierarchy. Each line gives a class, then its direct subclasses.

Feature (system): Aspect, Case, Finiteness, Gender, GenderMarking, Mood, Number, Person, Tense
Aspect: ImperfectiveAspect, PerfectiveAspect
Case: NominativeCase, ObliqueCase, VocativeCase
Finiteness: Finite, NonFinite
Gender: FeminineGender, MasculineGender
GenderMarking: MarkedForGender, UnmarkedForGender
Mood: ImperativeMood, IndicativeMood, InfinitiveMood, ParticipleMood, SubjunctiveMood
Number: PluralNumber, SingularNumber
Person: FirstPerson, HonorificSecondPerson, SecondPerson, ThirdPerson
Tense: FutureTense, PastTense, PresentTense

Unit Of Annotation (system): PartOfSpeech
PartOfSpeech: Adjective, Adposition, Adverb, Article, Conjunction, Interjection, Noun, Numeral, PronounOrDeterminer, Punctuation, Residual, Unique, Verb
Adjective: AttributiveOrPredicativeAdjective, PredicativeAdjective
Adposition: Postposition, Preposition
Adverb: GeneralAdverb, NonLexicalAdverb
GeneralAdverb: DeadjectivalAdverb
NonLexicalAdverb: DegreeAdverb, ModalAdverb, PronominalAdverb
ModalAdverb: NegativeModalAdverb
PronominalAdverb: DistalDemonstrativeAdverb, DistalDemonstrativeDeadjectivalAdverb, InterrogativeAdverb, InterrogativeDeadjectivalAdverb, ProximalDemonstrativeAdverb, ProximalDemonstrativeDeadjectivalAdverb, RelativeAdverb, RelativeDeadjectivalAdverb
Conjunction: CoordinatingConjunction, CorrelativeCoordinatingConjunction, SubordinatingConjunction
Noun: CommonNoun, ProperNoun
Numeral: CardinalNumber, Fraction, OrdinalNumber
PronounOrDeterminer: DemonstrativeOrInterrogativeOrRelativePronounOrDeterminer, OtherPronounOrDeterminer, PersonalPronoun, PossessiveAdjective, ReciprocalPronoun, ReflexivePronoun
DemonstrativeOrInterrogativeOrRelativePronounOrDeterminer: DistalDemonstrativeAdjective, DistalDemonstrativePronoun, InterrogativeAdjective, InterrogativePronoun, ProximalDemonstrativeAdjective, ProximalDemonstrativePronoun, RelativeAdjective, RelativePronoun
OtherPronounOrDeterminer: IndefiniteDeterminer, IndefinitePronoun
PersonalPronoun: SecondPersonHonorificPronoun
PossessiveAdjective: ReflexivePossessiveAdjective
Punctuation: CloseParenthesis, CloseQuotationMark, CloseSquareBracket, Colon, Comma, ExclamationMark, FullStop, NeutralQuotation, OpenParenthesis, OpenQuotationMark, OpenSquareBracket, QuestionMark, SemiColon
Residual: Abbreviation, Acronym, ForeignWord, Formula, Letter, NonPersoArabicString, OtherSymbol, OtherUnclassifiableNonUrduElement
Unique: AdjectivalOccupationalParticle, AdjectivalParticle, CliticExclusiveEmphaticParticle, CliticPostposition, CompoundFormingConjunction, ContrastiveEmphaticParticle, ExclusiveEmphaticParticle, InclusiveEmphaticParticle, Izafat, MultiplicativeMarker, NongrammaticalLexicalElement, PremultiplicativeCliticNumeral, QuestionMarker, SentenceTagWord
Verb: AuxiliaryVerb, GeneralAuxiliary, LexicalVerb
AuxiliaryVerb: CahieAuxiliary, GaAuxiliary, HonaAuxiliary, RahaAuxiliary
LexicalVerb: Imperative, Infinitive, Root, Subjunctive

Participle: ImperfectiveParticiple, PerfectiveParticiple
No superclass or subclass shown: ObliqueOrVocativeCase
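The subclass relations in this overview can be queried with a plain mapping, without an OWL toolkit. The following is a minimal sketch that copies only the adverb branch from the overview. The names `SUBCLASSES`, `ancestors` and `is_a` are hypothetical helpers for illustration and are not part of the ontology.

```python
# Partial copy of the subclass relations from the overview (adverb branch only).
# Keys are classes; values are their direct subclasses.
SUBCLASSES = {
    "PartOfSpeech": ["Adjective", "Adposition", "Adverb", "Noun", "Verb"],
    "Adverb": ["GeneralAdverb", "NonLexicalAdverb"],
    "GeneralAdverb": ["DeadjectivalAdverb"],
    "NonLexicalAdverb": ["DegreeAdverb", "ModalAdverb", "PronominalAdverb"],
    "ModalAdverb": ["NegativeModalAdverb"],
}


def parent_map(subclasses):
    """Invert the parent -> children mapping into child -> parent."""
    return {
        child: parent
        for parent, children in subclasses.items()
        for child in children
    }


def ancestors(cls, subclasses=SUBCLASSES):
    """Return the chain of superclasses of cls, nearest first."""
    parents = parent_map(subclasses)
    chain = []
    while cls in parents:
        cls = parents[cls]
        chain.append(cls)
    return chain


def is_a(cls, ancestor, subclasses=SUBCLASSES):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    return cls == ancestor or ancestor in ancestors(cls, subclasses)
```

For example, `ancestors("NegativeModalAdverb")` walks up through `ModalAdverb`, `NonLexicalAdverb` and `Adverb` to `PartOfSpeech`. The sketch assumes each class has a single direct superclass, which holds for the relations shown in the overview.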

Properties - Overview

Classes

Abbreviation
SubClass Of Residual
Acronym
SubClass Of Residual
AdjectivalOccupationalParticle
Abstract Adjectival / occupational particle (vālā). This element is the source of the English word / suffix ‘wallah’ (Kachru 1990: 70), which may help the reader to gain some grasp on its meaning.
SubClass Of Unique
AdjectivalParticle
Abstract Adjectival particle (sā)
SubClass Of Unique
Adjective
Abstract use ... refers to whether an adjective may be used in attributive or predicative positions only. The default value for this is naturally both. In the absence of a specification in the EAGLES guidelines, I represent this with 0. There are a number of common Perso-Arabic adjectives in Urdu that can only be used in predicative position (Schmidt 1999: 37), for which this attribute can take the value 2. This is the rationale for including this attribute, which is however a prime candidate to be underspecified in a practical subtagset. It is anticipated that it will be difficult for a POS tagger to detect predicate-only adjectives.
Since the predicate-only adjectives are Perso-Arabic, it ought to follow that they are all unmarked adjectives. However, this is a point on which Schmidt (1999) is silent. For this reason, tags have been included for predicate-only adjectives that are marked for gender/number/case. These may need to be removed if it turns out from the data that they do indeed describe nonexistent categories, as I suspect.
SubClass Of PartOfSpeech
Sub-Classes AttributiveOrPredicativeAdjective, PredicativeAdjective
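The two values of the use attribute named in the abstract (0 for both positions, 2 for predicative only) line up with the two subclasses of Adjective. The following is a minimal sketch of that correspondence. The function name `adjective_class` is hypothetical, and any value other than 0 or 2 is rejected because the abstract does not describe one.

```python
# Values of the "use" attribute as described in the Adjective abstract,
# mapped to the class names used in this ontology.
USE_TO_CLASS = {
    0: "AttributiveOrPredicativeAdjective",  # default: both positions
    2: "PredicativeAdjective",               # predicative position only
}


def adjective_class(use_value):
    """Return the ontology class name for a given use value (0 or 2)."""
    try:
        return USE_TO_CLASS[use_value]
    except KeyError:
        raise ValueError(f"use value not described in the abstract: {use_value!r}")
```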
Adposition
Abstract It should be noted at the outset that I treat as adpositions those elements of Urdu that some writers (e.g. Kellogg 1875, Butt 1995) describe as case suffixes or clitics. This is firstly because Schmidt (1999), the model of the language being used, does so. Secondly, however, treating nē (among other markers) as adpositions allows theoretical neutrality to be maintained on the question of whether Urdu displays ergativity (see also the discussion of the ergativity controversy in 1.1.5.4 and the discussion of noun cases and the etymology of postpositions in 3.1.3).
The EAGLES guidelines give only one attribute for adpositions, Type, which has a range of recommended and optional values: preposition, fused preposition-article, postposition, and circumposition. The second and fourth of these do not apply to Urdu, which lacks articles and circumpositions. The vast majority of Urdu adpositions are postpositions, but there are some prepositions borrowed from Persian and Arabic (Schmidt 1999: 68, 250, 267), so this attribute is relevant.
There are two other issues. The first is that of izāfat (Bhatia and Koul 2000: 339; Schmidt 1999: 246-247). The izāfat is a Persian enclitic (pronounced as a shorter form of ‘ē’) which in some circumstances can be considered a preposition: it links two nouns in a possessive relationship, although the phrase thus produced may often have a different meaning to a phrase produced with the native Urdu postposition kā. However, the izāfat may also join a noun to an adjective, in which case it is not so clearly accurate to describe it as a preposition parallel to the prepositions in European languages for which the EAGLES guidelines were compiled. A better way to treat izāfat is in the context of the Unique category of miscellaneous one-member wordclasses, discussed below.
The second issue is that in Urdu, the postposition kā can be marked for number/gender/case agreement (Schmidt 1999: 68-69). It does not agree with the noun it governs, but with the head noun of the noun phrase that contains its postposition phrase. This is not a phenomenon allowed for by the EAGLES guidelines as they now stand. kā takes the same inflectional endings as marked adjectives (having the forms kā, kē, and kī). Therefore, it is necessary for the same number/gender/case categories to be distinguished by the tagset for postpositions as for adjectives. This means that the intermediate tagset contains three more attributes than are suggested in the EAGLES guidelines.
SubClass Of PartOfSpeech
Sub-Classes Postposition, Preposition
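The agreement of the postposition with the head noun can be sketched as a small lookup. The abstract names the three forms but does not tabulate them, so the distribution used here (kā for masculine nominative singular, kē for all other masculine slots, kī for feminine) is an assumption taken from the usual marked-adjective pattern in Urdu grammar. The function name `ka_form` is hypothetical.

```python
def ka_form(gender, number, case):
    """Choose the form of the postposition by the features of the HEAD noun
    (not the noun the postposition governs).

    Assumed paradigm, following the marked-adjective endings:
      masculine nominative singular -> "kā"
      any other masculine slot      -> "kē"
      feminine, any slot            -> "kī"
    """
    if number not in ("singular", "plural"):
        raise ValueError(f"unknown number: {number!r}")
    if case not in ("nominative", "oblique", "vocative"):
        raise ValueError(f"unknown case: {case!r}")
    if gender == "feminine":
        return "kī"
    if gender == "masculine":
        if number == "singular" and case == "nominative":
            return "kā"
        return "kē"
    raise ValueError(f"unknown gender: {gender!r}")
```

Because the form varies with gender, number and case, a tagset that records those attributes for adjectives needs the same attributes for this postposition, which is the point the abstract makes.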
Adverb
Abstract As with verbs, there are lexical and non-lexical adverbs, which will be considered in turn. In the EAGLES guidelines, the recommended attribute for adverbs is degree, which is not relevant morphologically to Urdu (as discussed with reference to adjectives: see 3.3 above). However, the remaining three features are relevant, and have been included. These are adverb-type, which distinguishes general and degree adverbs, and polarity and wh-type, which distinguish interrogative and relative pronouns. The following summarises the features used in the intermediate tagset. There are a total of 13 adverb tags.
SubClass Of PartOfSpeech
Sub-Classes GeneralAdverb, NonLexicalAdverb
Article
Abstract Urdu lacks articles. However, some phrases borrowed from Arabic contain the clitic Arabic definite article, which receives the single tag AL (the spelling of the Arabic article). I have not included a C in this tag, as I have done for other clitics (see section 3.12), because this would make the tag less transparent.
The use of the AT intermediate tag could be queried here, because the use of the Arabic definite article in Urdu does not parallel that of, for example, ‘the’ in English or ‘le/la/les’ in French. For example, the Arabic definite article is only found with Arabic loanwords, whereas of course ‘the’ can appear with the vast majority of nouns in English. However, on balance it seems that this disadvantage is outweighed by the advantage of indicating that the Arabic definite article in Urdu does do pretty much what other languages’ articles do. Khoja et al.’s (2001) Arabic tagset does not have a separate tag for the article, but considers definiteness a feature of nouns: this would not be an appropriate approach for Urdu because non-Arabic nouns cannot be made definite by use of the Arabic definite form.
SubClass Of PartOfSpeech
Aspect
SubClass Of Feature (system)
Sub-Classes ImperfectiveAspect, PerfectiveAspect
AttributiveOrPredicativeAdjective
Abstract Adjective/Use=both
SubClass Of Adjective
AuxiliaryVerb
Abstract It should be noted that, whereas I have in this category treated all auxiliary elements as verbs, in the terms of the EAGLES guidelines for intermediate tagsets some could easily be characterised as unique or unassigned words (see below). The EAGLES guidelines treat the English infinitive marker ‘to’ in this manner, for example. However, treating them as verbs in the intermediate tagset is firstly in keeping with the structure of the Urdu tagset, and secondly allows verbal attributes such as gender and number to be used (the EAGLES unique intermediate tags include no such attributes).
SubClass Of Verb
Sub-Classes CahieAuxiliary, GaAuxiliary, HonaAuxiliary, RahaAuxiliary
CahieAuxiliary
Abstract The word cāhiē is used in combination with the infinitive of a lexical verb to express advisability. It is also used (as described by Bhatia and Koul 2000: 60) as a polite form of the verb cāhnā, ‘want’. It is derived from an old morphologically marked passive form (Schmidt 1999: 137) of cāhnā; however, cāhnā is a lexical verb and other than this use of cāhiē, it does not deviate from the pattern of other lexical verbs. Therefore the best approach would seem to be to give cāhiē its own tags (it requires two tags because it agrees with the number of the object of the preceding infinitive in certain circumstances). This is the approach taken in many English tagsets for modal auxiliary verbs, which are, like cāhiē, anomalous forms.
The intermediate tags given to cāhiē and its plural form cāhiē~ list them as being without person or gender, without finiteness (since it can be used with or without a following tense-bearing auxiliary), indicative, present tense and without aspect. In the descriptions, these words are defined as ‘cāhiē-type’, rather than attempt to find an English word to accurately summarise the range of meanings associated with desirability and/or advisability that these words can convey.
SubClass Of AuxiliaryVerb
CardinalNumber
Abstract Numeral/type=cardinal
Cardinal numbers function as grammatically unmarked determiner-like adjectives (Schmidt 1999: 228). However, they can appear in the oblique plural (with the same suffix as an unmarked noun) to express totality (Schmidt 1999: 10-11). There is therefore an additional tag for this (indicated only by O, since there is no oblique singular to make a contrast). In the intermediate tagset I have given their function as determiner, in line with the determiners that are in the pronoun category above. Numerals are to be tagged as below, even if written as figures rather than words (and whatever set of figures is used: Urdu uses both the Western European and the Arabic-Indic digits).
SubClass Of Numeral
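Since numerals written as figures are tagged like numerals written as words, a tagger first has to recognise figures in either digit set. The following is a minimal sketch using Python's Unicode-aware `str.isdecimal` and `int`. It assumes that both the Arabic-Indic block (U+0660 to U+0669) and the Extended Arabic-Indic block (U+06F0 to U+06F9) should count, which goes slightly beyond what the abstract states. The function name `figure_value` is hypothetical.

```python
def figure_value(token):
    """Return the integer value of a token written entirely in decimal
    digits (Western or Arabic-script), or None if it is not such a figure."""
    # str.isdecimal() is True only for non-empty strings made of Unicode
    # decimal digits; int() accepts the same characters.
    if token.isdecimal():
        return int(token)
    return None


western = "123"
arabic_indic = "\u0661\u0662\u0663"           # digits one, two, three (U+0661..U+0663)
extended_arabic_indic = "\u06f1\u06f2\u06f3"  # digits one, two, three (U+06F1..U+06F3)
```

All three strings above yield the same value, so a single check covers figures regardless of which digit set the text uses.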
Case
Abstract In the model of the language given by Schmidt, Urdu has three cases, nominative, oblique and vocative. McGregor (1972: 1-2) uses a different classification, treating the vocative as a special form of the oblique case. However, since the special form would still need to be tagged separately, it makes sense to treat it as a vocative case, a phenomenon for which the EAGLES guidelines already allow.
As Schmidt (1999: 7) points out, some grammarians have treated Urdu postpositions as being either suffixes or clitics indicating cases, in which case Urdu would possess many more than three cases. However, this is a minority view amongst writers of general grammars: Schmidt (1999), Barz (1977), Bhatia and Koul (2000), McGregor (1972), Bailey et al. (1956) all do not treat postpositions as marking cases. There is an etymological basis for this view. Kellogg (1875: 128-133) reports that the postpositions do not derive from Sanskrit case markers, but rather from independent words (e.g. kō, ‘to’, from Sanskrit k?kshe, ‘armpit, side’; mē~, ‘in’, from Sanskrit madhye, ‘middle’, both locative nouns; tak, ‘until’, from the Sanskrit past participle tarita, ‘passed to’, plus a dative affix ku). Furthermore, the suffix/clitic approach would require case to be determined across multi-token units, which would breach the design principle of including no multiword tags. It would also have implications for the principle of theoretical neutrality, since it would be necessary to take some standpoint on the subject of whether or not Urdu has ergative case marking, a theoretically controversial point (see 1.1.5.4).
Thus I use the nominative-oblique-vocative distinction as exemplified below (example from Schmidt 1999: 10-12):
laRkā, laRkē ‘boy(s)’ (nominative singular/plural)
laRkē, laRkō~ (oblique singular/plural)
laRkē, laRkō (vocative singular/plural)
There is something of an issue with the names of the cases. Vocative is straightforward enough, and is one of the values given for the case attribute in the EAGLES guidelines. Nominative, however, is usually given meaning by its contrast with accusative, a case that does not exist in Urdu. The nominative may in Urdu be used for either, neither or both of the subject and the direct object. Thus it is not certain whether the nominative in Urdu really corresponds with the nominative that is value 1 in the EAGLES guidelines. Certainly it does not correspond with the nominative as it exists in, for example, German or Latin. However, I have used value 1 in the intermediate tagset for the Urdu case, on the basis that no Urdu case resembles the nominative in the European languages for which the EAGLES guidelines were devised any more closely than the Urdu nominative.
There is no value in the EAGLES guidelines for oblique. Nor is there one for postpositional, locative or instrumental (alternative names used by Bailey et al. 1956 for this case). Rather than invent an extra value (undesirable for reasons given with regard to markedness above), I have used the value for dative to represent oblique, on the grounds that in some European languages (e.g. German) prepositions frequently govern the dative, and in Urdu postpositions govern the oblique.
SubClass Of Feature (system)
Sub-Classes NominativeCase, ObliqueCase, VocativeCase
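The correspondence between the three Urdu cases and the EAGLES case values described in the abstract can be written down as a lookup. The following is a minimal sketch that uses the EAGLES value names rather than numbers, since the abstract gives a number only for nominative (value 1). The names `EAGLES_CASE_FOR_URDU`, `to_eagles` and `from_eagles` are hypothetical.

```python
# Urdu case -> EAGLES case value name, as described in the Case abstract.
EAGLES_CASE_FOR_URDU = {
    "nominative": "nominative",  # EAGLES value 1
    "oblique": "dative",         # no oblique value exists; dative is reused
    "vocative": "vocative",
}


def to_eagles(urdu_case):
    """Map an Urdu case name to the EAGLES value name used for it."""
    return EAGLES_CASE_FOR_URDU[urdu_case]


def from_eagles(eagles_case):
    """Inverse lookup; works because the three targets are all distinct."""
    inverse = {v: k for k, v in EAGLES_CASE_FOR_URDU.items()}
    return inverse[eagles_case]
```

Because the three target values are distinct, reusing dative for oblique loses no information: an intermediate tag carrying the dative value can always be read back as oblique.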
CliticExclusiveEmphaticParticle
Abstract Clitic exclusive emphatic particle ((h)ī(~))
SubClass Of Unique
CliticPostposition
Abstract ē, ē~, hē~: a form of kō added to a pronoun.
SubClass Of Unique
CloseParenthesis
SubClass Of Punctuation
CloseQuotationMark
SubClass Of Punctuation
CloseSquareBracket
SubClass Of Punctuation
Colon
SubClass Of Punctuation
Comma
SubClass Of Punctuation
CommonNoun
Abstract Noun/Type=common
SubClass Of Noun
CompoundFormingConjunction
Abstract Persian compound-forming conjunction (ō)
SubClass Of Unique
Conjunction
Abstract The EAGLES guidelines suggest that conjunctions be classified firstly for whether they are coordinating or subordinating, and then secondly as one of four coordinating types or one of three subordinating types. I have disregarded the attribute for subordinate-type, since it was developed for German and does not seem relevant to Urdu subordinating conjunctions as described by Schmidt (1999: 223-227). Urdu correlative conjunctions (such as bhī…bhī, yā…yā) do not have initial and non-initial forms, so those features are also not needed. This gives three types of conjunctions: simple coordinating, correlative coordinating, and subordinate. Note that phrases involving the relative j-set of pronouns, adjectives and adverbs are often translated by conjunctions, but are not to be tagged as such.
SubClass Of PartOfSpeech
Sub-Classes CoordinatingConjunction, CorrelativeCoordinatingConjunction, SubordinatingConjunction
ContrastiveEmphaticParticle
Abstract Contrastive emphatic particle tō
SubClass Of Unique
CoordinatingConjunction
SubClass Of Conjunction
CorrelativeCoordinatingConjunction
Abstract The EAGLES guidelines (Leech and Wilson 1999: 68) specify that a conjunction is correlative when it is at the start of the first of a pair of correlated clauses. The conjunction at the start of the second half of the pair is then a simple coordinating conjunction (CC). This practice will be followed to ensure compliance with the EAGLES guidelines.
SubClass Of Conjunction
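The rule quoted above (first member of a correlated pair is correlative, second member is a simple coordinating conjunction, CC) can be illustrated on a token list. The following is a deliberately simplified sketch: it assumes the input is one sentence containing at most one correlated pair per word, and that the set of correlative words is supplied by the caller. The label `"correlative"` and the function name are hypothetical; only `CC` comes from the abstract.

```python
def label_correlatives(tokens, correlative_words):
    """Label the first occurrence of each correlative word "correlative"
    and every later occurrence "CC"; other tokens get None."""
    seen = set()
    labels = []
    for tok in tokens:
        if tok in correlative_words:
            labels.append("CC" if tok in seen else "correlative")
            seen.add(tok)
        else:
            labels.append(None)
    return labels


# Schematic sentence "yā X yā Y" ("either X or Y"); X and Y are placeholders.
example = label_correlatives(["yā", "X", "yā", "Y"], {"yā"})
```

A real tagger would also need to decide whether two occurrences really belong to one correlated pair, which this sketch does not attempt.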
DeadjectivalAdverb
Abstract In Urdu these are of two sorts: adverbs which are derived from adjectives by inflecting them to their masculine oblique form or adding a Persian or Arabic loaned derivational suffix (RRJ), and adverbs which are not (RR). While this unfortunately violates the principle of not including derivational information, this distinction has been included in the tagset for two reasons.
Firstly, it helps avoid ambiguity, since an adverb derived from an adjective has the same form as that adjective in its masculine singular oblique form (see Schmidt 1999: 57). If adjectival adverbs were marked RR, this would lead to a wide ambiguity between RR and JJM1O, which would make non-adjectival adverbs ambiguous as well! Using a separate tag, there is only an RRJ~JJM1O ambiguity, which significantly reduces the scope of the ambiguity. Although this is a pragmatic consideration which should probably be included at the subtagset level, it involves creating a distinction rather than collapsing one, and must thus exist in the top level tagset.
However, there is another motivation for the RRJ tag, which is that it is necessary to maintain theoretical neutrality. It is possible that some analyst might wish to treat the RRJ adverbs as if they were actually adjectives, that is, identify them with JJ? categories instead of RR. Indeed Bailey et al. (1956: 18) come close to saying this. The principle of theoretical neutrality must here override the principle of excluding derivational information.
Note on ‘degree’ as an adverb attribute: this use of ‘degree’ (i.e. inflected superlative or comparative) should be clearly distinguished from the use of ‘degree adverb’ (i.e. words with meanings such as ‘very’, ‘more’).
SubClass Of GeneralAdverb
DegreeAdverb
SubClass Of NonLexicalAdverb
DemonstrativeOrInterrogativeOrRelativePronounOrDeterminer G DemonstrativeOrInterrogativeOrRelativePronounOrDeterminer Demonstrative Or Interrogative Or Relative Pronoun Or Determiner
Abstract Third person pronouns/demonstratives, interrogative and relative pronouns and determiners This class of pronouns consists of all those pronouns that fall into the parallel classes of what Schmidt (1999: 39) calls ?symmetrical y-v-k-j word sets?. These classes contain a variety of pronouns and adjectives that are of similar form, the first letter indicating what set they belong to, thus: 161 ? y or a vowel indicates the set of proximal demonstratives (this, now, etc.) ? v or t35 indicates the set of distal demonstratives (that, then, etc.) ? k indicates the set of interrogatives (who, what, how, etc.) ? j indicates the set of relative words (who, where, whither, etc.) Thus, in Urdu there is 1) a significant distinction between proximal and distal words, for which there is no distinction in the EAGLES guidelines; 2) a significant distinction between interrogatives and relatives, which is only made by the EAGLES guidelines at the secondary optional level (the recommended features include only int./rel., presumably on the basis that these have similar forms in many European languages ? the so-called wh-words). This means that the intermediate tags for these pronouns are not as elegant as they might be, and the tags for the y-set and the v-set are the same36. However, I will make this distinction in the Urdu tags, which begin with P followed by the letter of the relevant y-v-k-j set. The proximal and distal demonstratives have not been distinguished for any other language that I am aware of. For example, no English tagset I know of distinguishes here/hither from there/thither. However, most distinguish where/whither from the non-interrogative/relative words. In Urdu, the ?near~far? phonological pattern is much more consistent ? there are no odd pairs such as English this~that ? and is formally of an equal degree to the ?demonstrative~interrogative? distinction. Furthermore, there is a difference of usage between the proximal and distal sets ? 
the latter are used in correlative clauses where the former are not37. For this reason I tag the four-way distinction, since it would be odd to arbitrarily merge two of what are on a language-internal basis clearly different categories. The pronouns in the y-v-k-j sets are used as demonstrative pronouns and third person personal pronouns (so yah and vah38 mean both ?this? and ?that? and ?he/she/it?). They can also act as determiners within a noun phrase. I have not tagged these uses differently, because this would fall under the heading of syntactic information, which this tagset does not include. See also section 3.4.1.1. I do not, as Schmidt (1999: 38-41) does, characterise the determiner-usage as adjectival, since these pronouns do not display gender agreement, as adjectives (including other members of the y-v-k-j sets) do. They are however marked for case and number39. They also have the peculiarity that their plurals have a third case-like form, which appear solely before the postposition n? (which indicates the subject of an ergative-type clause). This is tagged separately (and, like the proximal/distal distinction, not distinguished in the intermediate tagset, since it is difficult to see how this could be achieved). There are two interrogative pronouns, both beginning in k; one means ?what? and one means ?who?. They both receive the same tags, since tagging an animacy distinction would be odd when this is done nowhere else in the tagset. 37 There is one minor exception to this (Schmidt 1999: 206). 38 These two words are almost always transcribed as y? and v?, which is how they are pronounced. However, the spellings with h are closer to the Perso-Arabic (Bhatia and Koul 2000: 36). 39 However, in the nominative case the singular and plural forms are identical. 
In the intermediate tagset, following what is done for such pronouns in the example English tagset given in the EAGLES guidelines, I give person as zero, and for the k-set words the wh-type is ?2, since kyā may also be exclamatory. The category attribute is both, because these words are both pronouns and determiners. There are also in the y-v-k-j sets a number of words that are more like determiners than pronouns, i.e. they take adjectival inflection and cannot stand alone as pronouns. However they behave in some respects more like adjectives, e.g. they can be predicative rather than attributive. In terms of the EAGLES guidelines they are best characterised within the pronoun/determiner category. They correspond to English words like ‘such’, ‘this/that much/many’ and so on. In terms of the Urdu tagset, I have classified them as JD: determiner-like adjectives.
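The tag-naming convention described above (P followed by the letter of the relevant y-v-k-j set) can be sketched as a small lookup. This is only an illustration: the function and dictionary names are hypothetical, and the use of an upper-case set letter is an assumption based on the other mnemonic tags quoted in this document, not something the abstract states.

```python
# Illustrative sketch only. The "P" + set-letter convention comes from the
# abstract; the names below and the upper-casing are assumptions.
YVKJ_SETS = {
    "y": "proximal demonstrative",
    "v": "distal demonstrative",
    "k": "interrogative",
    "j": "relative",
}


def pronoun_tag_prefix(set_letter: str) -> str:
    """Return the assumed two-letter prefix for a y-v-k-j pronoun tag."""
    letter = set_letter.lower()
    if letter not in YVKJ_SETS:
        raise ValueError(f"not a y-v-k-j set letter: {set_letter!r}")
    return "P" + letter.upper()


def describe_prefix(prefix: str) -> str:
    """Map an assumed prefix such as 'PK' back to the set it names."""
    if len(prefix) != 2 or prefix[0] != "P":
        raise ValueError(f"not a pronoun prefix: {prefix!r}")
    return YVKJ_SETS[prefix[1].lower()]
```

Keeping the four sets apart in the prefix mirrors the decision above to tag the four-way distinction even though the intermediate tags collapse the y-set and the v-set.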
SubClass Of
Sub-Classes
DistalDemonstrativeAdjective G DistalDemonstrativeAdjective Distal Demonstrative Adjective
Abstract There are also in the y-v-k-j sets a number of words that are more like determiners than pronouns, i.e. they take adjectival inflection and cannot stand alone as pronouns. However they behave in some respects more like adjectives, e.g. they can be predicative rather than attributive. In terms of the EAGLES guidelines they are best characterised within the pronoun/determiner category. They correspond to English words like ‘such’, ‘this/that much/many’ and so on. In terms of the Urdu tagset, I have classified them as JD: determiner-like adjectives.
SubClass Of
DistalDemonstrativeAdverb G DistalDemonstrativeAdverb Distal Demonstrative Adverb
SubClass Of
DistalDemonstrativeDeadjectivalAdverb G DistalDemonstrativeDeadjectivalAdverb Distal Demonstrative Deadjectival Adverb
SubClass Of
DistalDemonstrativePronoun G DistalDemonstrativePronoun Distal Demonstrative Pronoun
SubClass Of
ExclamationMark G ExclamationMark Exclamation Mark
SubClass Of
ExclusiveEmphaticParticle G ExclusiveEmphaticParticle Exclusive Emphatic Particle
Abstract Exclusive emphatic particle (hī)
SubClass Of
FeminineGender G FeminineGender Feminine Gender
Abstract Gender=feminine
SubClass Of
Finite G Finite Finite
Abstract Finiteness=finite
SubClass Of
Finiteness G Finiteness Finiteness
Abstract The last two attributes, finiteness and mood, are problematic. Firstly, inherent in the EAGLES guidelines is the problem that the mood attribute contains values relevant to both finite and non-finite forms, so that the finiteness attribute becomes redundant. Secondly, the finite/non-finite distinction may be hard to draw in Urdu. The forms described below as participles would traditionally be considered non-finite in European languages. However, in Urdu they have certain features which make them seem more like finite forms. For example, they can occur as the only verb in a main clause, and can agree with a subject or object, which is not a property prototypically associated with non-finite forms. These properties are illustrated by the following example from Schmidt (1999: 126):
unhō~ nē an paRh kī bāt nah mānī
3-PLRL-OBL ERG un educated of-FEM speech not accept-PERF.PART-FEM-SING
‘They did not accept what the uneducated person said.’
The verb form mānī is a participle, but it is the only verb form in the sentence, and it is marked for agreement (with the object, since this clause is of the ergative type). It, like the postposition kī, agrees with the feminine singular noun bāt.
SubClass Of
Sub-Classes
FirstPerson G FirstPerson First Person
Abstract Person=first
SubClass Of
ForeignWord G ForeignWord Foreign Word
SubClass Of
Formula G Formula Formula
SubClass Of
Fraction G Fraction Fraction
Abstract Urdu has a fairly wide range of words for fractions (there are for example words for ‘plus one quarter’ (savā), ‘less one quarter’ (paun, paunē), ‘one half’ (ādh, ādhā), ‘one and a half’ (DēRh), ‘plus one half’ (sāRhē)), which can modify cardinal numerals as well as nouns. They are therefore tagged separately (although the intermediate tags are not all distinct). Most are unmarked, but two are marked. Two others can also function as nouns, in which case they should receive standard noun tagging.
SubClass Of
FullStop G FullStop Full Stop
SubClass Of
FutureTense G FutureTense Future Tense
Abstract Tense=future
SubClass Of
GaAuxiliary G GaAuxiliary Ga Auxiliary
Abstract The form gā indicates future tense when it follows a verb in the subjunctive form. It may also follow the polite imperative as a marker of additional politeness (Bhatia and Koul 2000: 332). It is considered by Schmidt (1999) to be a suffix, although one that is written as a separate word; Bhatia and Koul (2000) go so far as to write the inflected verb and the gā as a single word. However, given that the orthography must lead gā to be treated by the tagging system as a separate token (see 2.2.6.1), and given that the form of the future is otherwise identical to the subjunctive, it makes sense to tag gā separately from the lexical verb. Since gā is marked for gender and number and the subjunctive is marked for person and number, the future would, if treated as a simple rather than a compound tense, be marked for all three of these features, which is not true of any other simple tense in Urdu. Furthermore, as Schmidt (1999: 94) explains, gā derives from a contraction of the perfective participle of the verb jānā, ‘go’. Therefore, gā is tagged independently. In the intermediate tagset it is considered to be finite, indicative, future, and with zero aspect.
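The feature argument in this abstract can be made concrete with a minimal sketch. The feature sets are taken from the text (the subjunctive is marked for person and number, the auxiliary for gender and number); the variable and function names are hypothetical.

```python
# Illustrative sketch: feature sets as stated in the abstract, names assumed.
SUBJUNCTIVE_FEATURES = frozenset({"person", "number"})
GA_FEATURES = frozenset({"gender", "number"})


def combined_features(*parts: frozenset) -> frozenset:
    """Union of the features marked on each token of a compound form."""
    result = set()
    for part in parts:
        result |= part
    return frozenset(result)


# Treated as one simple tense, the future would carry all three features.
FUTURE_AS_SIMPLE_TENSE = combined_features(SUBJUNCTIVE_FEATURES, GA_FEATURES)
```

Tagging the two tokens separately avoids a single tag that has to carry person, number and gender at once.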
SubClass Of
Gender G Gender Gender
Abstract Urdu has two genders, masculine and feminine. Some nouns are marked for gender, whereas others are not. This means that there is in effect a four-way distinction among nouns: masculine marked, masculine unmarked, feminine marked and feminine unmarked. For example: rūpayah ‘money’ (marked masculine), ghar ‘house’ (unmarked masculine), baccī ‘female child’ (marked feminine), kitāb ‘book’ (unmarked feminine) (examples from Schmidt 1999: 1-2). Note that since some unmarked nouns coincidentally display the suffixes typical of marked nouns, the diagnostic feature of a marked noun is that its plural inflection follows that of the marked nouns (e.g. masculine -ā changing to -ē, feminine -ī to -iyā~, and so on). This four-way split could be encoded into a tagset in two ways: by creating two new values for the gender attribute (the EAGLES guidelines have only masculine, feminine, neuter, and common) or by creating a new markedness attribute with two values, 1 = marked for gender and 2 = not marked for gender. The latter approach has been followed since it will almost certainly be easier for software processing the intermediate tagset to ignore an entire attribute than to work out what to do about values it does not recognise in existing attributes. This is especially the case if the extra attribute is added at the end of the tag, as I have done.
SubClass Of
Sub-Classes
GenderMarking G GenderMarking Gender Marking
Abstract Urdu has two genders, masculine and feminine. Some nouns are marked for gender, whereas others are not. This means that there is in effect a four-way distinction among nouns: masculine marked, masculine unmarked, feminine marked and feminine unmarked. For example: rūpayah ‘money’ (marked masculine), ghar ‘house’ (unmarked masculine), baccī ‘female child’ (marked feminine), kitāb ‘book’ (unmarked feminine) (examples from Schmidt 1999: 1-2). Note that since some unmarked nouns coincidentally display the suffixes typical of marked nouns, the diagnostic feature of a marked noun is that its plural inflection follows that of the marked nouns (e.g. masculine -ā changing to -ē, feminine -ī to -iyā~, and so on). This four-way split could be encoded into a tagset in two ways: by creating two new values for the gender attribute (the EAGLES guidelines have only masculine, feminine, neuter, and common) or by creating a new markedness attribute with two values, 1 = marked for gender and 2 = not marked for gender. The latter approach has been followed since it will almost certainly be easier for software processing the intermediate tagset to ignore an entire attribute than to work out what to do about values it does not recognise in existing attributes. This is especially the case if the extra attribute is added at the end of the tag, as I have done.
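A minimal sketch of the second encoding option (a separate markedness attribute with values 1 and 2) follows. The two markedness values are those given in the abstract; the class, field and function names are hypothetical and not part of the tagset.

```python
from dataclasses import dataclass
from itertools import product

# Values 1 and 2 are taken from the abstract; everything else is assumed.
GENDERS = ("masculine", "feminine")
MARKEDNESS = {1: "marked for gender", 2: "not marked for gender"}


@dataclass(frozen=True)
class NounGender:
    gender: str      # one of GENDERS
    markedness: int  # 1 or 2, kept as a separate trailing attribute


def four_way_classes() -> list:
    """Enumerate the four-way distinction among nouns."""
    return [NounGender(g, m) for g, m in product(GENDERS, sorted(MARKEDNESS))]


def gender_only(noun: NounGender) -> str:
    """What software unaware of markedness would read: gender alone."""
    return noun.gender


# Classifications as given in the abstract (after Schmidt 1999: 1-2).
ghar = NounGender("masculine", 2)   # 'house', unmarked masculine
kitab = NounGender("feminine", 2)   # 'book', unmarked feminine
```

Because markedness lives in its own field, a consumer can drop it without having to interpret unfamiliar gender values, which is the motivation stated above.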
SubClass Of
Sub-Classes
GeneralAdverb G GeneralAdverb General Adverb
Abstract Adverb/Adverb-Type=general "lexical adverb". Lexical adverbs: In Urdu these are of two sorts: adverbs which are derived from adjectives by inflecting them to their masculine oblique form or adding a Persian or Arabic loaned derivational suffix (RRJ), and adverbs which are not (RR). While this unfortunately violates the principle of not including derivational information, this distinction has been included in the tagset for two reasons. Firstly, it helps avoid ambiguity, since an adverb derived from an adjective has the same form as that adjective in its masculine singular oblique form (see Schmidt 1999: 57). If adjectival adverbs were marked RR, this would lead to a wide ambiguity between RR and JJM1O, which would make non-adjectival adverbs ambiguous as well! Using a separate tag, there is only an RRJ~JJM1O ambiguity, which significantly reduces the scope of the ambiguity. Although this is a pragmatic consideration which should probably be included at the subtagset level, it involves creating a distinction rather than collapsing one, and must thus exist in the top level tagset. However, there is another motivation for the RRJ tag, which is that it is necessary to maintain theoretical neutrality. It is possible that some analyst might wish to treat the RRJ adverbs as if they were actually adjectives, that is, identify them with JJ... categories instead of RR. Indeed Bailey et al. (1956: 18) come close to saying this. The principle of theoretical neutrality must here override the principle of excluding derivational information. The EAGLES intermediate tags for RR and RRJ are the same.
SubClass Of
Sub-Classes
GeneralAuxiliary G GeneralAuxiliary General Auxiliary
SubClass Of
HonaAuxiliary G HonaAuxiliary Hona Auxiliary
Abstract The verb hōnā, ‘be’, is the auxiliary with the greatest range of application: the Urdu compound tenses are formed with it, and it has other uses, such as the copula. It can also be the sole verb of a main clause, but as explained above (section 3.2) it will be tagged the same whether it is a main verb or an auxiliary. The following examples from Schmidt (1999: 94, 120, 126) demonstrate the range of hōnā:
āj mai~ daftar mē~ nahī~ hū~
today 1-SING-NOM office in not be-PRES-1-SING
‘Today I am not in the office’ (hōnā as copula with postpositional phrase)
kal mausam acchā thā
yesterday weather good-MASC-SING-NOM be-PAST-MASC-SING
‘Yesterday the weather was fine’ (hōnā as copula with adjective)
ham farš par sōtē hai~
1-PLRL-NOM floor on sleep-IMPERF.PART-MASC-PLRL be-PRES-1-PLRL
‘We sleep on the floor’ (hōnā as auxiliary marking the habitual present with imperfective participle)
bāriš hūī hai
rain be-PERF.PART-FEM-SING be-PRES-3-SING
‘It has rained’ (hōnā as auxiliary marking immediate past with perfective participle of hōnā as main verb; a more literal translation would be ‘There has been rain’)
Some of the parts of hōnā are equivalent to the parts of lexical verbs; this being so, their tags are the same as those of lexical verbs, except that they commence in VH... instead of VV.... In the intermediate tagset, this difference is expressed by the verbs being marked as auxiliary instead of main. Unfortunately, Schmidt (1999) does not give a full listing of all the forms of hōnā, and I was forced to use other methods as outlined in 2.3. The first recourse was to refer to other works, in this case Bailey et al. (1956). However, there were still gaps in the listing of forms of hōnā. When initially composing the tagset, I was forced by the underspecification in the literature to infer the existence and shape of some forms of the infinitive and imperative.
In the case of an irregular verb like hōnā, inferring its forms on the basis of regular verbal inflections involves making unwarranted assumptions. Therefore, these forms were treated as highly provisional in nature until the stage of manual tagging was undertaken (as described in the next chapter). At this point, it was possible to find examples in tagged texts for most of the forms. The polite imperative was a very notable exception to this. It did not occur in any of the manually tagged texts, and of two native speaker informants consulted on the issue, one concluded that the form hōiyē was not possible. However, the other informant suggested that it was possible. This being the case, the VHIA tag stands, since there can be no harm in maintaining the parallelism with other verbs even if this form is rare to vanishing point. The past participle of hōnā, as with that of other verbs, can be used alone as a simple past tense. The participial tags above would be used in this case. However, there is also an irregular inflected simple past tense, which, as might be expected, differs slightly in its meaning (Bailey et al. 1956: 109; Barz 1977: 48-49 considers this to be an instance of two separate verbs with the same infinitive). There is, in addition, an irregular inflected simple present tense (the only one in the whole language). These inflected forms are the basis of the compound tense system and both require separate tags, as follows. Like the regular inflected subjunctive mood, the present indicative of hōnā is marked for person and number but not gender. The intermediate tags for the present tense are the same as those of the subjunctive except that the mood is indicative. In the mnemonic tags I use H to indicate the present tense, since this tense is entirely characteristic of hōnā. The irregular past tense is marked for gender and number in the same way as a perfective participle, but it is a finite form.
The intermediate tags are the same as those for the present tense, except that 1) gender is not zero, 2) person is zero, and 3) tense is past rather than present.
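The three stated differences between the present-tense and past-tense intermediate tags can be sketched as a transformation on a feature record. The dictionary representation, the attribute names and the value spellings (including "0" for zero) are assumptions made for illustration; the actual intermediate tags are positional strings whose layout is not given here.

```python
# Illustrative sketch only: attribute names and value spellings are assumed.
def past_from_present(present: dict, gender: str) -> dict:
    """Derive past-tense features from present-tense ones as described:
    1) gender is not zero, 2) person is zero, 3) tense is past."""
    if gender == "0":
        raise ValueError("the past tense is marked for gender")
    past = dict(present)      # copy, so the present-tense record is untouched
    past["gender"] = gender
    past["person"] = "0"
    past["tense"] = "past"
    return past


# Hypothetical present-tense record: marked for person and number, not gender.
present_1sg = {
    "person": "1",
    "number": "singular",
    "gender": "0",
    "mood": "indicative",
    "tense": "present",
}
past_masc_sg = past_from_present(present_1sg, "masculine")
```

Number and mood carry over unchanged, which matches the statement that the tags are otherwise the same.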
SubClass Of
HonorificSecondPerson G HonorificSecondPerson Honorific Second Person
Abstract The existence of a second person honorific form does not undermine the general principle, stated above, that the āp pronoun takes a third person verb form since, in the imperative, there is no third person, and the subject is not expressed anyway. For the purposes of the intermediate tagset the tense is considered to be present, and the number of the honorific form is considered to be ( 1 | 2 ), since both singular and plural ‘subjects’ are possible. This also serves to distinguish the VVIA tag in the intermediate tagset. The mnemonic ‘A’ is the same as that used for the āp pronoun, and thus refers to politeness.
SubClass Of
Imperative G Imperative Imperative
Abstract There are three simple imperative forms: second person singular (which is identical to the ‘root’ form), second person plural (which is identical to the second person plural subjunctive form) and second person honorific. Each of these receives a separate tag. The existence of a second person honorific form does not undermine the general principle, stated above, that the āp pronoun takes a third person verb form since, in the imperative, there is no third person, and the subject is not expressed anyway. For the purposes of the intermediate tagset the tense is considered to be present, and the number of the honorific form is considered to be ( 1 | 2 ), since both singular and plural ‘subjects’ are possible. This also serves to distinguish the VVIA tag in the intermediate tagset. The mnemonic ‘A’ is the same as that used for the āp pronoun, and thus refers to politeness.
SubClass Of
ImperativeMood G ImperativeMood Imperative Mood
Abstract Mood=imperative
SubClass Of
ImperfectiveAspect G ImperfectiveAspect Imperfective Aspect
Abstract Aspect=imperfective
SubClass Of
ImperfectiveParticiple G ImperfectiveParticiple Imperfective Participle
Abstract Urdu has two participles, the imperfective and the perfective. However, unlike participles in many European languages, they can be used as the sole verb of a main clause. This creates the tenses referred to as the irrealis and the simple past respectively. However, the presence or absence of an auxiliary makes no difference to the form of the participle. It would therefore be misleading to use two tags for a single form of the verb. These tags are thus used for both finite and non-finite, and the notions of irrealis and simple past are not referred to in the precise definitions of the tags. The dual finite and non-finite nature of the tags is indicated in the intermediate tagset using the OR operator, | . There is a value in the EAGLES tagset for past tense, but there is not one for irrealis. The closest approximation to an irrealis in the EAGLES guidelines is subjunctive past (see the discussion of this point in 3.2 above). This is not a perfect solution, but without adding extra values to the intermediate tagset it is the best that can be managed. Thus, the imperfective is finite subjunctive past with zero aspect or non-finite participle imperfective with zero tense. The perfective is finite indicative past with zero aspect or non-finite participle perfective with zero tense. [Footnote: It is hard to justify this use of ‘indicative’, since Urdu lexical verbs do not possess any indicative form as such. Therefore the notion of the indicative is not used in the definitions of the tags themselves, but only in the intermediate tagset (where something is needed to distinguish the finite use of the perfective participle from the finite use of the imperfective participle).] The participles are not marked for person, but are marked for gender and number. Their inflection is the same as that of adjectives, except that in some circumstances a distinction is made between feminine singular and plural which is not made by adjectives.
Participles can also function as adjectives (see discussion of adjectives in 3.3 below), in which case this extra feminine singular/feminine plural distinction is not made (though this does not affect the tagging). That is to say, an adjective which agrees with a feminine plural noun or pronoun will always receive an F2 tag, regardless of whether it has the plural ending -ī~ or the more general feminine ending -ī. When participles are used as adjectives, it would in theory be possible to tag them as if they were adjectives. However, this has not been done, since even when being used attributively, participles appear in structures that normal adjectives do not. For example, they frequently occur in participial phrases with the perfective participle of the auxiliary verb hōnā (see below). When used adjectivally rather than verbally, participles may be marked for case as well as number and gender. This feature is also included in the tagset. Of course, the feature case only applies to the non-finite usage of the participle; this is reflected in the intermediate tagset by the use of ( 0 | 1 ) for the nominative or finite form. As with adjectives (see below), the ‘oblique’ case is ( 3 | 5 ) in the intermediate tagset. The characters Y and T have been used for the perfective and imperfective participles respectively, since these are the consonants that indicate the suffixes for these forms.
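Software reading intermediate tags has to cope with OR-values such as ( 0 | 1 ) and ( 3 | 5 ). The sketch below shows one possible way to parse such a value into a set of alternatives and to test whether two values are compatible. The surface notation is taken from the text; the function names and the set-intersection notion of compatibility are assumptions, not part of the EAGLES guidelines or of the tagset.

```python
# Illustrative sketch: notation "( a | b )" from the text, the rest assumed.
def parse_value(text: str) -> frozenset:
    """Parse '3', '( 3 | 5 )' or '(0|1)' into a set of alternative values."""
    stripped = text.strip()
    if stripped.startswith("(") and stripped.endswith(")"):
        stripped = stripped[1:-1]
    parts = [part.strip() for part in stripped.split("|")]
    return frozenset(part for part in parts if part)


def compatible(left: str, right: str) -> bool:
    """Two attribute values are compatible if they share an alternative."""
    return bool(parse_value(left) & parse_value(right))
```

For example, under this reading `compatible("( 0 | 1 )", "1")` holds, whereas `compatible("( 3 | 5 )", "1")` does not.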
SubClass Of
InclusiveEmphaticParticle G InclusiveEmphaticParticle Inclusive Emphatic Particle
Abstract Inclusive emphatic particle (bhī)
SubClass Of
IndefiniteDeterminer G IndefiniteDeterminer Indefinite Determiner
Abstract There is also a tag for indefinite determiners. Two words in this class are zyādah ‘more’ and kāfī ‘enough’. Following Schmidt (1999) these are classed broadly as adjectives for two reasons: to keep them in line with the possessive adjectives, which are determiners; and because they can also function as adverbs (see section 3.6 below), which is characteristic of adjectives. These are not marked for gender, number or case.
SubClass Of
IndefinitePronoun G IndefinitePronoun Indefinite Pronoun
Abstract In this miscellaneous group of pronouns are included two indefinite pronouns, kōī and kuch, which may function as pronouns or determiners (just as yah and vah do). Also included in the PN* category is sab, ‘all’, which has an inflected oblique plural (like numerals; see section 3.9) which is tagged as PNO.
SubClass Of
IndicativeMood G IndicativeMood Indicative Mood
Abstract Mood=indicative
SubClass Of
Infinitive G Infinitive Infinitive
Abstract The infinitive of the verb is regularly formed. Mostly it is used as a verbal noun or as part of a complex verb phrase. It is also used as a neutral request form, in which case it is the main verb of its clause; however, I do not think that this usage is sufficient to justify separate tagging; this is better treated as an example of a secondary usage of the same word, rather than a separate word (which giving it a separate tag would imply). The ‘default’ ending of the infinitive is -nā, which is a masculine singular ending. When used as a noun it may occur in the oblique case; when it occurs in a verb phrase it may display gender and number agreement (in a similar way to an adjective). However these conditions cannot both occur; therefore there is no feminine oblique or plural oblique, which reduces the number of tags necessary. There is a problem in creating the intermediate tagset, inasmuch as there is no attribute for ‘case’ in the EAGLES guidelines for verbs (presumably non-finite verb forms in European languages do not display case inflection). An attribute, case, has therefore been added to the end of the intermediate tags. Otherwise this set of intermediate tags is fairly unproblematic. The ‘N’ in the mnemonic tags is derived from the -nā suffix that indicates the infinitive.
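The restriction stated in this abstract (no feminine oblique and no plural oblique infinitive) can be sketched as a filter over feature combinations. The value names and the function name are hypothetical, and the result enumerates feature combinations only; it makes no claim about how many distinct tags the tagset actually defines for the infinitive.

```python
from itertools import product

# Illustrative sketch: value names are assumed, the constraint is from the text.
GENDERS = ("masculine", "feminine")
NUMBERS = ("singular", "plural")
CASES = ("nominative", "oblique")


def is_possible(gender: str, number: str, case: str) -> bool:
    """Oblique case and gender/number agreement cannot both occur, so an
    oblique infinitive keeps the default masculine singular ending."""
    if case == "oblique":
        return gender == "masculine" and number == "singular"
    return True


INFINITIVE_COMBINATIONS = [
    combo for combo in product(GENDERS, NUMBERS, CASES) if is_possible(*combo)
]
```

Of the eight conceivable combinations, the three oblique ones that would also show feminine or plural agreement are excluded.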
SubClass Of
InfinitiveMood G InfinitiveMood Infinitive Mood
Abstract Mood=infinitive
SubClass Of
Interjection G Interjection Interjection
Abstract The EAGLES guidelines do not recommend any additional attributes for the class of interjections. Nor have I introduced any of my own. There is thus one tag. The mnemonic tag represents the spelling of ? (Schmidt 1999: 217), which has been selected as a representative interjection.
SubClass Of
InterrogativeAdjective G InterrogativeAdjective Interrogative Adjective
Abstract There are also in the y-v-k-j sets a number of words that are more like determiners than pronouns, i.e. they take adjectival inflection and cannot stand alone as pronouns. However they behave in some respects more like adjectives, e.g. they can be predicative rather than attributive. In terms of the EAGLES guidelines they are best characterised within the pronoun/determiner category. They correspond to English words like ‘such’, ‘this/that much/many’ and so on. In terms of the Urdu tagset, I have classified them as JD: determiner-like adjectives.
SubClass Of
InterrogativeAdverb G InterrogativeAdverb Interrogative Adverb
SubClass Of
InterrogativeDeadjectivalAdverb G InterrogativeDeadjectivalAdverb Interrogative Deadjectival Adverb
SubClass Of
InterrogativePronoun G InterrogativePronoun Interrogative Pronoun
SubClass Of
Izafat G Izafat Izafat
Abstract The izāfat is a Persian enclitic (pronounced as a shorter form of ‘ē’) which in some circumstances can be considered a preposition: it links two nouns in a possessive relationship, although the phrase thus produced may often have a different meaning to a phrase produced with the native Urdu postposition kā. However, the izāfat may also join a noun to an adjective, in which case it is not so clearly accurate to describe it as a preposition parallel to the prepositions in European languages for which the EAGLES guidelines were compiled. A better way to treat izāfat is in the context of the Unique category of miscellaneous one-member wordclasses, discussed below.
SubClass Of
Letter G Letter Letter
SubClass Of
LexicalVerb G LexicalVerb Lexical Verb
Abstract The EAGLES guidelines do not consider lexical and auxiliary verbs to be separate major parts of speech, although this is a view that some have held (e.g. the ICE tagset; Greenbaum and Yibin 1996). However, in Urdu this distinction is very significant, since auxiliary forms pattern differently to the forms of lexical verbs. Therefore, this tagset will employ a high-level (but not top-level) distinction between lexical verbal elements (whose tags will commence with VV) and non-lexical or auxiliary verbal elements (whose tags will commence with V and one other letter: either one indicating what word it is, for auxiliary verbs whose inflectional behaviour is anomalous, or X for a general auxiliary). Thus both the EAGLES guidelines and the demands of Urdu morphology are complied with. There exist in Urdu two widely applicable derivational suffixes which attach to the root of a lexical verb and increase its valence, making it transitive or causative in sense. This has been highlighted as a significant feature of the language (e.g. by Kachru 1990: 63) and is described in some detail by Schmidt (1999: 87, 157-175). It might be possible to distinguish such derived verbs from non-derived verbs in the tagset, but I do not, because of the design principle that no derivational information should be included. Furthermore, such a distinction would be difficult to automate, and also probably difficult for humans to annotate. Lexical verbs occur in a number of inflected forms. The names of these forms are perhaps not very useful, since each of them has a variety of uses hard to capture by one of the traditional grammatical category names. However, rather than resort to letters or numbers which would be unlinkable to any previous writing on the Urdu verb, I use the same names for the forms as Schmidt (1999), as I have been doing thus far in this thesis.
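The prefix convention described in this abstract (VV for lexical verbs, V plus X for a general auxiliary, V plus another letter for a word-specific auxiliary) lends itself to a small classifier. This is a minimal sketch with hypothetical names; it inspects only the first two characters of a mnemonic tag and does not validate the rest.

```python
# Illustrative sketch: prefix convention from the text, names assumed.
def classify_verb_tag(tag: str) -> str:
    """Classify a verbal mnemonic tag by its two-letter prefix."""
    if len(tag) < 2 or tag[0] != "V":
        raise ValueError(f"not a verbal tag: {tag!r}")
    second = tag[1]
    if second == "V":
        return "lexical verb"
    if second == "X":
        return "general auxiliary"
    # Any other letter names a particular auxiliary with anomalous inflection.
    return "word-specific auxiliary"
```

With the tags mentioned elsewhere in this document, VVIA would be classified as a lexical verb and VHIA as a word-specific auxiliary.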
SubClass Of
Sub-Classes
MarkedForGender G MarkedForGender Marked For Gender
Abstract Markedness=1
SubClass Of
MasculineGender G MasculineGender Masculine Gender
Abstract Gender=masculine
SubClass Of
ModalAdverb G ModalAdverb Modal Adverb
SubClass Of
Sub-Classes
Mood G Mood Mood
Abstract The last two attributes, finiteness and mood, are problematic. Firstly, inherent in the EAGLES guidelines is the problem that the mood attribute contains values relevant to both finite and non-finite forms, so that the finiteness attribute becomes redundant. Secondly, the finite/non-finite distinction may be hard to draw in Urdu. The forms described below as participles would traditionally be considered non-finite in European languages. However, in Urdu they have certain features which make them seem more like finite forms. For example, they can occur as the only verb in a main clause, and can agree with a subject or object, which is not a property prototypically associated with non-finite forms. These properties are illustrated by the following example from Schmidt (1999: 126):
unhō~ nē an paRh kī bāt nah mānī
3-PLRL-OBL ERG un educated of-FEM speech not accept-PERF.PART-FEM-SING
‘They did not accept what the uneducated person said.’
[Footnote: Schmidt does not give word-by-word glosses, only whole-sentence translations. I have added the glosses using Schmidt (1999) and Haq (2001) as guides. See also Appendix 2.]
The verb form mānī is a participle, but it is the only verb form in the sentence, and it is marked for agreement (with the object, since this clause is of the ergative type). It, like the postposition kī, agrees with the feminine singular noun bāt. A third problem with the mood distinctions made in the EAGLES guidelines is that they are not necessarily those made by Urdu. For example, Urdu has forms which may be described as subjunctive and imperative moods, but it would seem to lack an indicative (except for the auxiliary hōnā). Because of these difficulties, the concepts of finiteness and mood will not be used to structure the tagset itself, although they are of course inevitable as attributes in the intermediate tagset.
This means that in some cases, the intermediate tagset values used to characterise some Urdu verb forms are somewhat arbitrary, since I have had to simply pick the values that seem closest to describing Urdu. For example, considering the ‘irrealis tense’ (the term used by Schmidt 1999 for the finite use of the imperfective participle) to be a past tense subjunctive is not warranted by the Urdu verbal system. It was picked as the ‘least bad’ way to characterise it simply because the Urdu irrealis has a usage similar to that of the past subjunctive in languages included in EAGLES such as German and (vestigially) English (e.g. ich wäre, I were). For example, Schmidt (1999) translates a sentence from the poet Ghalib as follows:
agar aur jītē rahtē
if and alive-MASC-PLRL stay-IMPERF.PART-MASC-PLRL
yahī intizār hōtā
this-very waiting be-IMPERF.PART-MASC-SING
‘If I were to live longer it would only be to wait like this’
The presence in the translation of the past tense subjunctive (‘I were’) in the first, but not the second, of two clauses containing the finite imperfective participle demonstrates the partial parallelism between an Urdu irrealis and an English past subjunctive.
SubClass Of
Sub-Classes
MultiplicativeMarker G MultiplicativeMarker Multiplicative Marker
Abstract Multiplicative marker (gunā)
SubClass Of
NegativeModalAdverb G NegativeModalAdverb Negative Modal Adverb
Abstract particles that mark tense, aspect and negation, cf. Schmidt (1999, p.69f.)
SubClass Of
NeutralQuotation G NeutralQuotation Neutral Quotation
SubClass Of
NominativeCase G NominativeCase Nominative Case
Abstract Case=nominative. Nominative is usually given meaning by its contrast with accusative, a case that does not exist in Urdu. The nominative may in Urdu be used for either, neither or both of the subject and the direct object. Thus it is not certain whether the nominative in Urdu really corresponds with the nominative that is value 1 in the EAGLES guidelines. (Barz (1977) and McGregor (1972) actually call the nominative case the direct case.) Certainly it does not correspond with the nominative as it exists in, for example, German or Latin. However, I have used value 1 in the intermediate tagset for the Urdu case, on the basis that no Urdu case resembles the nominative in the European languages for which the EAGLES guidelines were devised any more closely than the Urdu nominative.
SubClass Of
NonFinite G NonFinite Non Finite
Abstract Finiteness=non-finite
SubClass Of
NongrammaticalLexicalElement G NongrammaticalLexicalElement Nongrammatical Lexical Element
Abstract Nongrammatical lexical element. Words that contain an orthographic space which does not actually represent a word break, principally Persian loans such as zimmah dār, ‘responsible’, xūb tarīn, ‘best’, and ham zāt, ‘of the same caste’ (all examples from Schmidt 1999: 248-256), cause a problem for tokenisation as described in 2.2.6.1. This was solved by the decision to treat every orthographic space as a word break, so that zimmah dār, etc., are treated as two tokens. However, this leads to another problem, greater if anything, concerned with tagging. How are the two elements to be tagged? (This is referred to below as the zimmah dār problem, because it was first encountered during an attempt to manually tag a sentence from Schmidt (1999) containing the word zimmah dār using an early trial version of the tagset.) As it happens, zimmah, xūb and zāt are independent words (‘duty’, ‘good’ and ‘caste’ respectively) and could be given the appropriate tags, nominal and adjectival. The problem then becomes, what to do with dār, tarīn and ham? The former two could be given some tag to indicate that they were adjective forming clitics or affixes, and the prefix ham could be marked up as an adverb (according to Haq 2001 the part of speech of ham when it occurs independently). However, this has two drawbacks. Firstly, it breaks with the design principle that no derivational information will be included in the tagset by analysing the component morphemes of complex words, for zimmah dār etc. are words, not phrases. The word zimmah dārī, ‘responsibility’, is clear evidence of this: it has been created by a morphological process (suffixation of -ī) and morphological processes apply to words, not to syntactic phrases. Also, the single word zimmah dār has been given two tags in this approach, a contravention of the ‘one word, one tag’ principle. Secondly, it introduces inconsistency into the tagging.
The derivational information would be present for some words formed with the relevant Persian derivational morphemes, but not for all, because not all words formed with them contain the superfluous orthographic token break. Examples of single-token derived words include samajhd?r, ?sensible?, kamtar?n, ?least?, and hamdard?, ?sympathy?. If zimmah and d?r are to be tagged separately, then for consistency samajh would also have to be tagged separately ? opening up whole vistas of morphological analysis that are utterly irrelevant to part-of-speech tagging. Indeed, going down this road subverts the entire enterprise: we would find ourselves engaged in derivational analysis instead of morphosyntactic analysis. To take the opposite approach to tagging zimmah d?r, we might mark a single tag for the whole word (JJU in this case) ? however this also breaks the ?one word, one tag? principle as there is now an untagged token and multiword tag. The best solution to the problem (although far from ideal) would seem to be to use some kind of special tag on the first part of the two-token word to indicate that this is a case of the zimmah d?r problem, and put the tag we would like to give to the whole thing on the second token66. This tag will be LL, the ?nongrammatical lexical element? listed in the previous section, and it will be applied thus67: zimmah_LL d?r_JJU samajhd?r_JJU x?b_LL tar?n_JJU kamtar?n_JJU ham_LL z?t_JJU hamdard?_NNUF1N The first element is described as a nongrammatical lexical element because while it does not contribute to the morphosyntax of the two-token word, it does contribute to its meaning. Therefore it is entirely lexical in nature. It is to be hoped 66 Since d?r, tar?n and other affixes involved in the zimmah d?r problem are derivational suffixes, it is they that determine the part of speech; thus it makes sense for them to carry the actual tag. 
67 I use an underscore format to link the words and their tags for clarity in the examples given here; in practice an XML/SGML markup would be used. 195 that the usage of the LL tag can be restricted to one context: alongside a relatively small number of affixes such as d?r.
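The LL convention can be illustrated with a short sketch in Python. It is not part of Hardie's tooling: the function names `parse_tagged` and `merge_ll` are hypothetical, the input is assumed to be in the underscore format used in the examples (one tag per token), and the transliterated sample words assume the long vowels of the source.

```python
def parse_tagged(text):
    """Split 'word_TAG word_TAG ...' into (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # Split on the last underscore, so a word containing '_' keeps it.
        word, _, tag = item.rpartition("_")
        pairs.append((word, tag))
    return pairs


def merge_ll(pairs):
    """Attach each LL token to the token that follows it.

    The merged word carries the tag of the second token, which is the tag
    intended for the whole two-token word.
    """
    merged = []
    pending = []  # LL tokens still waiting for their second element
    for word, tag in pairs:
        if tag == "LL":
            pending.append(word)
        else:
            merged.append((" ".join(pending + [word]), tag))
            pending = []
    # An LL token with nothing after it is kept unchanged.
    merged.extend((word, "LL") for word in pending)
    return merged


sample = "zimmah_LL dār_JJU samajhdār_JJU ham_LL zāt_JJU"
print(merge_ll(parse_tagged(sample)))
```

Because the second token already carries the tag meant for the whole word, a corpus query for JJU finds both the two-token and the single-token adjectives without any special handling of LL.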
SubClass Of
NonLexicalAdverb G NonLexicalAdverb Non Lexical Adverb
SubClass Of
Sub-Classes
NonPersoArabicString G NonPersoArabicString Non Perso Arabic String
SubClass Of
Noun G Noun Noun
Abstract The EAGLES guidelines give four recommended attributes for nouns: type, gender, number and case. There are also two optional attributes, countability and definiteness. Type refers to whether a noun is common (denotes one or more members of a class of things) or proper (is the name of one or more particular things). This attribute is an example of one which is marginal to morphosyntax, but should be included since the distinction between common and proper might well prove useful to some future linguistic investigation of the text. It has been included in the tagset for now, but with the reservation that it might have to be collapsed in any subtagset for automatic tagging. This is because there may well not be any way for the tagger to make this distinction. Unlike the Roman, Greek and Cyrillic alphabets, the Urdu alphabet has no uppercase letters. In the European languages for which the EAGLES guidelines were designed, which use one of the former alphabets, uppercase letters are often used to identify proper nouns. It is clear that no such simple rule could be employed in Urdu. Furthermore there are no articles in Urdu (Bhatia and Koul 2000: 318), the absence and presence of an article being typical of proper and common nouns respectively in English and similar languages.
SubClass Of
Sub-Classes
Number G Number Number
Abstract Urdu has two numbers, singular and plural. This is well agreed on (Schmidt 1999: 1; Bhatia and Koul 2000: 314; Barz 1977: 36; Bailey et al. 1956: 1, 5). The EAGLES guidelines on noun number allow for exactly this possibility, and thus have been implemented unproblematically.
SubClass Of
Sub-Classes
Numeral G Numeral Numeral
Abstract The EAGLES guidelines give numerals as a separate major part-of-speech, but say that ‘In some languages (e.g. Portuguese) this category is not normally considered to be a separate part of speech, because it can be subsumed under others ... We recognise that in some tagsets Numeral may therefore occur as subcategory within other parts of speech’ (Leech and Wilson 1999: 65). This approach seems sensible for Urdu, where numerals display very much the behaviour of adjectives. However, for purposes of the intermediate tagset, the numeral class has been used, since it contains the very useful attribute type. In fact, all the EAGLES attributes have been used (though of course, not all of their values). For case, the oblique / vocative value ( 3 | 5 ) is used, as with adjectives.
SubClass Of
Sub-Classes
ObliqueCase G ObliqueCase Oblique Case
Abstract Case=dative. There is no value in the EAGLES guidelines for oblique. Nor is there one for postpositional, locative or instrumental (alternative names used by Bailey et al. 1956 for this case). Rather than invent an extra value (undesirable for reasons given with regard to markedness above), I have used the value for dative to represent oblique, on the grounds that in some European languages (e.g. German) prepositions frequently govern the dative, and in Urdu postpositions govern the oblique.
SubClass Of
ObliqueOrVocativeCase G ObliqueOrVocativeCase Oblique Or Vocative Case
Abstract As far as marked adjectives are concerned, there is again the problem of tag-to-meaning many-to-one and one-to-many mapping, but with adjectives it is, if anything, even greater a problem than it was with nouns. There is no oblique-vocative distinction at all (Schmidt 1999: 36 goes so far as to say that ‘An adjective modifying a vocative noun is in the oblique case’) ... Thus the tagset does not distinguish vocative adjectives from oblique adjectives (or participle forms of verbs: see above). In the intermediate tagset, this is represented using the OR and bracket operators, as described in the EAGLES guidelines (Leech and Wilson 1999: 71), as ( 3 | 5 ).
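The OR and bracket notation can be read as a set of alternative values for one attribute. The following Python sketch shows one way to expand such a value; the helper names `expand_value` and `value_matches` are hypothetical, and the case labels are only those that can be read off the surrounding text (3 is the EAGLES dative value reused for oblique, 5 is vocative, 1 is nominative, 0 is not applicable), not a full rendering of the EAGLES value list.

```python
# Case values as described in the surrounding text (assumed, not exhaustive).
CASE_VALUES = {
    "0": "not applicable",
    "1": "nominative",
    "3": "oblique (EAGLES dative)",
    "5": "vocative",
}


def expand_value(value):
    """Return the alternatives encoded by a value such as '( 3 | 5 )' or '1'."""
    inner = value.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]  # drop the bracket operators
    return {part.strip() for part in inner.split("|") if part.strip()}


def value_matches(value, wanted):
    """True if the single value `wanted` is one of the alternatives."""
    return wanted in expand_value(value)


for code in sorted(expand_value("( 3 | 5 )")):
    print(code, CASE_VALUES[code])
```

Under this reading, a search for vocative adjectives and a search for oblique adjectives both match an adjective whose case is ( 3 | 5 ), which reflects the fact that the tagset does not distinguish the two.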
OpenParenthesis G OpenParenthesis Open Parenthesis
SubClass Of
OpenQuotationMark G OpenQuotationMark Open Quotation Mark
SubClass Of
OpenSquareBracket G OpenSquareBracket Open Square Bracket
SubClass Of
OrdinalNumber G OrdinalNumber Ordinal Number
Abstract Numeral/Type=ordinal
SubClass Of
OtherPronounOrDeterminer G OtherPronounOrDeterminer Other Pronoun Or Determiner
Abstract Other pronouns and determiners. In this miscellaneous group of pronouns are included two indefinite pronouns, kōī and kuch, which may function as pronouns or determiners (just as yah and vah do). Also included in the PN* category is sab, ‘all’, which has an inflected oblique plural (like numerals; see section 3.9) which is tagged as PNO. There is also a tag for indefinite determiners. Two words in this class are zyādah ‘more’ and kāfī ‘enough’. Following Schmidt (1999) these are classed broadly as adjectives for two reasons: to keep them in line with the possessive adjectives, which are determiners; and because they can also function as adverbs (see section 3.6 below), which is characteristic of adjectives. These are not marked for gender, number or case.
SubClass Of
Sub-Classes
OtherSymbol G OtherSymbol Other Symbol
SubClass Of
OtherUnclassifiableNonUrduElement G OtherUnclassifiableNonUrduElement Other Unclassifiable Non Urdu Element
SubClass Of
Participle G Participle Participle
Abstract Urdu has two participles, the imperfective and the perfective. However, unlike participles in many European languages, they can be used as the sole verb of a main clause. This creates the tenses referred to as the irrealis and the simple past respectively. However, the presence or absence of an auxiliary makes no difference to the form of the participle. It would therefore be misleading to use two tags for a single form of the verb. These tags are thus used for both finite and non-finite, and the notions of irrealis and simple past are not referred to in the precise definitions of the tags. The dual finite and non-finite nature of the tags is indicated in the intermediate tagset using the OR operator, | . There is a value in the EAGLES tagset for past tense, but there is not one for irrealis. The closest approximation to an irrealis in the EAGLES guidelines is subjunctive past (see the discussion of this point in 3.2 above). This is not a perfect solution, but without adding extra values to the intermediate tagset it is the best that can be managed. Thus, the imperfective is finite subjunctive past with zero aspect or non-finite participle imperfective with zero tense. The perfective is finite indicative past with zero aspect or non-finite participle perfective with zero tense. [Footnote 18: It is hard to justify this use of ‘indicative’, since Urdu lexical verbs do not possess any indicative form as such. Therefore the notion of the indicative is not used in the definitions of the tags themselves, but only in the intermediate tagset (where something is needed to distinguish the finite use of the perfective participle from the finite use of the imperfective participle).] The participles are not marked for person, but are marked for gender and number. Their inflection is the same as that of adjectives, except that in some circumstances a distinction is made between feminine singular and plural which is not made by adjectives.
Participles can also function as adjectives (see discussion of adjectives in 3.3 below), in which case this extra feminine singular/feminine plural distinction is not made (though this does not affect the tagging). That is to say, an adjective which agrees with a feminine plural noun or pronoun will always receive an F2 tag, regardless of whether it has the plural ending -ī~ or the more general feminine ending -ī. When participles are used as adjectives, it would in theory be possible to tag them as if they were adjectives. However, this has not been done, since even when being used attributively, participles appear in structures that normal adjectives do not. For example, they frequently occur in participial phrases with the perfective participle of the auxiliary verb hōnā (see below). When used adjectivally rather than verbally, participles may be marked for case as well as number and gender. This feature is also included in the tagset. Of course, the feature case only applies to the non-finite usage of the participle; this is reflected in the intermediate tagset by the use of ( 0 | 1 ) for the nominative or finite form. As with adjectives (see below), the ‘oblique’ case is ( 3 | 5 ) in the intermediate tagset. The characters Y and T have been used for the perfective and imperfective participles respectively, since these are the consonants that indicate the suffixes for these forms.
Sub-Classes
ParticipleMood G ParticipleMood Participle Mood
Abstract Mood=participle
SubClass Of
PartOfSpeech G PartOfSpeech Part Of Speech
SubClass Of
Sub-Classes
PastTense G PastTense Past Tense
Abstract Tense=past
SubClass Of
PerfectiveAspect G PerfectiveAspect Perfective Aspect
Abstract Aspect=perfective
SubClass Of
PerfectiveParticiple G PerfectiveParticiple Perfective Participle
Abstract Urdu has two participles, the imperfective and the perfective. However, unlike participles in many European languages, they can be used as the sole verb of a main clause. This creates the tenses referred to as the irrealis and the simple past respectively. However, the presence or absence of an auxiliary makes no difference to the form of the participle. It would therefore be misleading to use two tags for a single form of the verb. These tags are thus used for both finite and non-finite, and the notions of irrealis and simple past are not referred to in the precise definitions of the tags. The dual finite and non-finite nature of the tags is indicated in the intermediate tagset using the OR operator, | . There is a value in the EAGLES tagset for past tense, but there is not one for irrealis. The closest approximation to an irrealis in the EAGLES guidelines is subjunctive past (see the discussion of this point in 3.2 above). This is not a perfect solution, but without adding extra values to the intermediate tagset it is the best that can be managed. Thus, the imperfective is finite subjunctive past with zero aspect or non-finite participle imperfective with zero tense. The perfective is finite indicative past with zero aspect or non-finite participle perfective with zero tense. [Footnote 18: It is hard to justify this use of ‘indicative’, since Urdu lexical verbs do not possess any indicative form as such. Therefore the notion of the indicative is not used in the definitions of the tags themselves, but only in the intermediate tagset (where something is needed to distinguish the finite use of the perfective participle from the finite use of the imperfective participle).] The participles are not marked for person, but are marked for gender and number. Their inflection is the same as that of adjectives, except that in some circumstances a distinction is made between feminine singular and plural which is not made by adjectives.
Participles can also function as adjectives (see discussion of adjectives in 3.3 below), in which case this extra feminine singular/feminine plural distinction is not made (though this does not affect the tagging). That is to say, an adjective which agrees with a feminine plural noun or pronoun will always receive an F2 tag, regardless of whether it has the plural ending -ī~ or the more general feminine ending -ī. When participles are used as adjectives, it would in theory be possible to tag them as if they were adjectives. However, this has not been done, since even when being used attributively, participles appear in structures that normal adjectives do not. For example, they frequently occur in participial phrases with the perfective participle of the auxiliary verb hōnā (see below). When used adjectivally rather than verbally, participles may be marked for case as well as number and gender. This feature is also included in the tagset. Of course, the feature case only applies to the non-finite usage of the participle; this is reflected in the intermediate tagset by the use of ( 0 | 1 ) for the nominative or finite form. As with adjectives (see below), the ‘oblique’ case is ( 3 | 5 ) in the intermediate tagset. The characters Y and T have been used for the perfective and imperfective participles respectively, since these are the consonants that indicate the suffixes for these forms.
SubClass Of
Person G Person Person
Abstract Urdu has the three normal persons given in the EAGLES guidelines, each in singular and plural forms. Schmidt (1999: 97) suggests that Urdu verbs also have an additional polite or honorific form, which although second person in meaning (it agrees with a pronoun āp that refers to one or more interlocutors) is identical to the third person plural form of the verb. In this case I have deviated from the model described by Schmidt, for reasons discussed in my treatment of the āp pronoun in section 3.4.1.2. There will be no tags for honorific verbal forms, and verb forms which agree with āp will be tagged as third person forms. The exception to this is the imperative, discussed in the next section.
SubClass Of
Sub-Classes
PersonalPronoun G PersonalPronoun Personal Pronoun
Abstract The issue of what exactly constitutes a personal pronoun is not an easy one in the context of the grammar of Urdu as presented by Schmidt (1999). Therefore, in this section, before discussing the tags of the personal pronouns I elaborate on how I drew the boundary of this category, justifying the minor claim that the pronouns vah and yah (and their various inflected forms) are not personal pronouns, as stated by Schmidt (1999). I first consider these third person pronouns (3.4.1.1), and subsequently the problematic honorific pronoun āp (3.4.1.2). In 3.4.1.3 I deal with the tagging of mai~ and tū, the remaining words in the category of personal pronouns.
3.4.1.1 The non-existence of third person personal pronouns
Urdu has no third person personal pronouns. The demonstrative pronouns/determiners are used in their place. This is claimed contrary to Schmidt, who states (1999: 15) that ‘The demonstrative pronouns ye and vo are identical in form to the personal pronouns ye and vo (meaning “he”, “she”, “it”)’. However the differences in behaviour between these pronouns and the first and second person pronouns that I list below, also drawn from Schmidt, make it clear that the statement that began this section is justified.
- There are absolutely no differences in case / number inflection between the third person pronouns and the demonstratives (Schmidt 1999: 16)
- In a perfective transitive sentence (the type that some, such as Dixon 1994, would class as ‘ergative’), a third person pronoun subject appears in the oblique case (like a noun); but a first or second person subject pronoun is in the nominative case at all times (Schmidt 1999: 22)
- The third person pronouns take special plural oblique forms before the postposition nē (Schmidt 1999: 22), whereas the first and second do not
- There are no possessive adjectives corresponding to the third person pronouns, whereas there are such adjectives corresponding to the first and second person pronouns (Schmidt 1999: 24)
On these grounds, I exclude the third person pronouns from consideration as personal pronouns, and deal with them as demonstratives/determiners, etc. (see section 3.4.2). Thus, the subcategory of first and second person personal pronouns contains only the pronouns mai~ and tū, and inflectionally related forms such as their plurals and possessive forms. All tags in this subcategory begin with PP (or PG for possessives). Personal pronouns are not marked for gender: as with verbs, that which is marked for person is not marked for gender. (The ‘M’ in the tags below signifies ‘first person’, not ‘masculine’.) They are marked for number and case. As noted in the preceding section, the intermediate tagset for pronouns contains an attribute of politeness. All pronouns in this section are given as familiar, to distinguish their intermediate tags from that for āp. In practice, the singular/plural distinction is often also used to indicate formality in the second person pronouns (Bhatia and Koul 2000: 35-36); tum may apply to one or more than one person. However, the EAGLES guidelines suggest that such a pragmatic usage of the number distinction may still be encoded as a number distinction. This is what I have done, tagging tum as plural, on the basis that for purposes of inflection it is the number of the pronoun, not the number of its referent, that counts.
There are possessive adjectives corresponding to the personal pronouns above. While the intermediate tagset must treat these as pronouns, within the Urdu tagset they could have been treated as adjectives (as has been done with some other determiner-like pronouns; see below). However, this has not been done, since the possessive adjectives have person. This is not true for any adjectival form, and thus the possessive adjectives are better classed as personal pronouns. As they are adjectival, they may be marked for gender, number and case. The case and gender attributes indicate the features that are in agreement with the head noun rather than inherent features of the pronoun. The number attribute is also for agreement; the inherent number of the possessive adjective itself is shown by the attribute possessive.
SubClass Of
Sub-Classes
PluralNumber G PluralNumber Plural Number
Abstract Number=plural
SubClass Of
PossessiveAdjective G PossessiveAdjective Possessive Adjective
Abstract There are possessive adjectives corresponding to the personal pronouns above. While the intermediate tagset must treat these as pronouns, within the Urdu tagset they could have been treated as adjectives (as has been done with some other determiner-like pronouns; see below). However, this has not been done, since the possessive adjectives have person. This is not true for any adjectival form, and thus the possessive adjectives are better classed as personal pronouns.
SubClass Of
Sub-Classes
Postposition G Postposition Postposition
SubClass Of
PredicativeAdjective G PredicativeAdjective Predicative Adjective
Abstract Adjective/Use=predicative
SubClass Of
PremultiplicativeCliticNumeral G PremultiplicativeCliticNumeral Premultiplicative Clitic Numeral
Abstract Pre-multiplicative clitic cardinal number du-, ti-, cau-
SubClass Of
Preposition G Preposition Preposition
SubClass Of
PresentTense G PresentTense Present Tense
Abstract Tense=present
SubClass Of
PronominalAdverb G PronominalAdverb Pronominal Adverb
SubClass Of
Sub-Classes
PronounOrDeterminer G PronounOrDeterminer Pronoun Or Determiner
Abstract The EAGLES guidelines treat pronouns and determiners together as a single category, although one of the recommended attributes, category, distinguishes between them. Since in Urdu the distinction is not clear (particularly in the area of third person pronouns), I also treat this category as being single at the most fundamental level. The difference between what is considered a determiner and what is considered a pronoun is not made in the EAGLES guidelines, which say ‘different analyses for different languages entail separating [these parts of speech] out in different ways’ (Leech and Wilson 1999: 63). For Urdu, I have mostly followed Schmidt (who does not have a separate ‘determiner’ category) in the divisions I make. However, I have classed together all third person pronouns/demonstratives, interrogative and relative pronouns/determiners, because these form sets of words displaying morphological symmetry (see 3.4.2). Schmidt counts pronouns such as yah, vah, as both personal pronouns and determiners. However, for the purposes of the tagset, the division should be sharp; therefore I have limited the ‘personal pronouns’ category to the first and second persons. The justification for this is given in section 3.4.1.1. I have also diverged from Schmidt in classing together a number of her minor categories of pronoun under the covering title ‘other’ for the purposes of this tagset definition. This gives the following groups of pronoun/determiner-like words:
- first and second person personal pronouns
- third person pronouns/demonstratives, interrogative and relative pronouns and determiners
- reflexive pronouns
- other pronouns and determiners
There is one pronoun, āp (a kind of honorific personal pronoun), which does not fit unproblematically into any of these categories. Discussion is devoted to this pronoun in section 3.4.1.2 below.
SubClass Of
Sub-Classes
ProperNoun G ProperNoun Proper Noun
Abstract Noun/Type=proper
SubClass Of
ProximalDemonstrativeAdjective G ProximalDemonstrativeAdjective Proximal Demonstrative Adjective
Abstract There are also in the y-v-k-j sets a number of words that are more like determiners than pronouns, i.e. they take adjectival inflection and cannot stand alone as pronouns. However they behave in some respects more like adjectives, e.g. they can be predicative rather than attributive. In terms of the EAGLES guidelines they are best characterised within the pronoun/determiner category. They correspond to English words like ‘such’, ‘this/that much/many’ and so on. In terms of the Urdu tagset, I have classified them as JD: determiner-like adjectives.
SubClass Of
ProximalDemonstrativeAdverb G ProximalDemonstrativeAdverb Proximal Demonstrative Adverb
SubClass Of
ProximalDemonstrativeDeadjectivalAdverb G ProximalDemonstrativeDeadjectivalAdverb Proximal Demonstrative Deadjectival Adverb
SubClass Of
ProximalDemonstrativePronoun G ProximalDemonstrativePronoun Proximal Demonstrative Pronoun
SubClass Of
Punctuation G Punctuation Punctuation
Abstract The EAGLES guidelines allow three options for the markup of word-external punctuation: firstly, to use a single tag for all punctuation marks (the obligatory-attribute-only approach); secondly, to give each punctuation mark its own separate tag; and thirdly, to group punctuation marks into a smaller number of tags according to where they may be positioned in a sentence. The first approach I rejected on the grounds that it needlessly excluded potentially useful information. The third approach, likewise, tags different punctuation marks in the same way. Since punctuation marks can be tagged utterly unambiguously (a comma is always a comma), this is needless. The decision was therefore taken to give each punctuation mark a unique tag. This tag is, in fact, the same as the punctuation mark itself (a practice also adhered to in, for example, the C7 tagset: see 2.1.2.1). However, since the tagset is designed to operate in Unicode texts, more forms of punctuation can be distinguished (for example, opening and closing quotation marks). Some of these distinctions may be finer than is necessary (e.g. that between square and normal brackets is useless if one simply wishes to search for brackets in general) but it would be trivial to design search software that could treat the two tags as alike, or to map to a subtagset that collapsed these to a single ‘bracket’ category. There are 13 tags in this section. The EAGLES guidelines underspecify the value of the one attribute, stating values only for the full stop, comma, and question mark, so I have inferred the remaining values (using letters when the available digits ran out). For all punctuation marks, the Unicode value of the Perso-Arabic tag is the same as that of the punctuation mark being tagged. The Roman tags for full stop, comma, question mark, and semi-colon consist of a different Unicode character to the punctuation mark being tagged, but the other Roman tags likewise use the same Unicode value as the mark they tag.
With regard to paired punctuation (the quotation marks and brackets) there is a point to be made as regards directionality. The Unicode Standard specifies (Unicode 1996: 6-4) that in bi-directional text the same character, i.e. the same Unicode value, should represent the opening member of the pair whatever its appearance, and the same with the closing member of the pair. That is, the code U+0028 (OPENING PARENTHESIS) ought always to be the first of the pair, and be rendered as ‘(’ in left-to-right text, such as English, and as ‘)’ in right-to-left text, such as Urdu. Other paired punctuation marks should function similarly. Therefore for each of these marks, the Roman and Perso-Arabic tags are mirror images of one another, though they are encoded by the same numeric value. This could potentially create confusion when an analyst tags text by hand, inasmuch as the (Roman) tag will have the opposite appearance to the (Perso-Arabic) symbol in the actual text. However, this will not be problematic when tagging is automated, ‘right’ and ‘left’ meaning nothing to a computerised tagger. There remain some problematic points, for example, the ellipsis (…), angle bracket speech marks, and braces. These have not been given tags for now, on the basis that no Urdu text I have yet seen contains these symbols. However, nor does any work on Urdu rule out their use, so extra punctuation tags may prove necessary.
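The point about directionality can be checked against the Unicode character database. The sketch below uses Python's standard `unicodedata` module, whose `mirrored()` function reports whether a character is flagged as mirrored in bidirectional text. The helper name `is_mirrored` and the choice of sample characters are illustrative assumptions; note also that current versions of the database name U+0028 LEFT PARENTHESIS rather than OPENING PARENTHESIS.

```python
import unicodedata

# Bracket characters discussed above, plus two unpaired marks for contrast.
SAMPLES = ["(", ")", "[", "]", ",", "?"]


def is_mirrored(ch):
    """True if the character is displayed as its mirror image in right-to-left text."""
    return unicodedata.mirrored(ch) == 1


for ch in SAMPLES:
    # The code point stays the same in either direction; only the glyph flips.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} mirrored={is_mirrored(ch)}")
```

A tagger that compares code points rather than glyph shapes is unaffected by the flip, which is why the mirror-image appearance only matters to a human annotator.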
SubClass Of
Sub-Classes
QuestionMark G QuestionMark Question Mark
SubClass Of
QuestionMarker G QuestionMarker Question Marker
Abstract Question marker kyā
SubClass Of
RahaAuxiliary G RahaAuxiliary Raha Auxiliary
Abstract rahā. This auxiliary element is used in the formation of tenses in the durative aspect. It is itself the perfective participle of the lexical verb rahnā, ‘remain’, but as Schmidt (1999: 111) reports, this form ‘has been delexicalised’. It is marked for gender and number. It may seem that treating rahā as auxiliary and rahnā as lexical goes against the principle laid down in 3.2 that the distinction between lexical and auxiliary should be inherent to the verb and not dependent on context, and conflicts, for example, with the treatment of hōnā (see 3.2.2.4 below). However, this is not the case. The verb hōnā may be main but it is never lexical; rahnā is lexical when it is main, and cannot act as an auxiliary at all except for the one, very particular, delexicalised form rahā. There is a problem in the intermediate tagset, in that the EAGLES guidelines contain no value for durative aspect. Therefore, the aspect attribute is given the value zero, since the aspect is neither perfective nor imperfective. This is not a very good solution but it is preferable to adding a value, and there is no satisfactory way to mark durative in the intermediate tagset by adding an attribute. This solution also ensures that each form of auxiliary rahā has a unique value in the intermediate tagset, since every other participial element is either imperfective or perfective. Otherwise in the intermediate tagset, rahā is considered to be a non-finite participle with zero tense. When used lexically, rahā receives the tag VVYM1N, rahī receives VVYF1N or VVYF2N, and so on.
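The lexical participle tags quoted above (VVYM1N, VVYF1N, VVYF2N) appear to be positional. The following Python sketch decodes them under stated assumptions: Y and T for perfective and imperfective, and F2 for feminine plural, follow the participle description in this ontology; reading 1 as singular, N as nominative and O as oblique is an inference from the examples rather than something the source states, and the function name `decode_participle_tag` is hypothetical.

```python
PARTICIPLE = {"Y": "perfective", "T": "imperfective"}
GENDER = {"M": "masculine", "F": "feminine"}
NUMBER = {"1": "singular", "2": "plural"}
CASE = {"N": "nominative", "O": "oblique"}  # assumed letter values


def decode_participle_tag(tag):
    """Decode a six-character lexical participle tag such as 'VVYF2N'."""
    if len(tag) != 6 or not tag.startswith("VV"):
        raise ValueError(f"not a six-character VV tag: {tag!r}")
    participle, gender, number, case = tag[2:]
    try:
        return {
            "participle": PARTICIPLE[participle],
            "gender": GENDER[gender],
            "number": NUMBER[number],
            "case": CASE[case],
        }
    except KeyError as exc:
        raise ValueError(f"unknown attribute letter in {tag!r}") from exc


for tag in ("VVYM1N", "VVYF1N", "VVYF2N"):
    print(tag, decode_participle_tag(tag))
```

Tags for the auxiliary rahā are deliberately not handled here, since the source does not show their form.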
SubClass Of
ReciprocalPronoun G ReciprocalPronoun Reciprocal Pronoun
SubClass Of
ReflexivePossessiveAdjective G ReflexivePossessiveAdjective Reflexive Possessive Adjective
SubClass Of
ReflexivePronoun G ReflexivePronoun Reflexive Pronoun
Abstract Unlike many European languages, Urdu reflexive pronouns are not personal. That is, they have the same form regardless of the person of the pronoun they refer back to. There are two reflexive pronouns, both tagged the same, a reciprocal pronoun (which only appears within a postpositional phrase) and a reflexive possessive adjective. The reflexive possessive adjective is classed with the other possessive adjectives in the hierarchy given in 3.14. See also the discussion of the honorific usage of āp in section 3.4.1.2 above.
SubClass Of
RelativeAdjective G RelativeAdjective Relative Adjective
Abstract There are also in the y-v-k-j sets a number of words that are more like determiners than pronouns, i.e. they take adjectival inflection and cannot stand alone as pronouns. However they behave in some respects more like adjectives, e.g. they can be predicative rather than attributive. In terms of the EAGLES guidelines they are best characterised within the pronoun/determiner category. They correspond to English words like ‘such’, ‘this/that much/many’ and so on. In terms of the Urdu tagset, I have classified them as JD: determiner-like adjectives.
SubClass Of
RelativeAdverb G RelativeAdverb Relative Adverb
Abstract A relative adverb locates an event or an object in one place or time. (Schmidt 1999, p. 218)
SubClass Of
RelativeDeadjectivalAdverb G RelativeDeadjectivalAdverb Relative Deadjectival Adverb
SubClass Of
RelativePronoun G RelativePronoun Relative Pronoun
SubClass Of
Residual G Residual Residual
Abstract The remaining categories (called ‘residual’ in the EAGLES guidelines) cover, quite simply, everything else. This comprises various semi-linguistic and non-Urdu elements. There are 8 such tags. Although the EAGLES guidelines allow for these elements having number and gender, I have not included this: if such an element is inflected as a verb, noun or adjective, then it may be considered sufficiently a part of that category to be tagged as such. This particularly applies to acronyms and abbreviations. Thus, the second and third EAGLES attributes, number and gender, are zero in the intermediate tags below. Every value from the first EAGLES attribute, type, has been used; with the exception of FX and FS, each tag bears the name of the value in the intermediate tagset it is mapped onto. The tag for ‘foreign words’ is meant to cover words from other languages written in the Urdu alphabet. It is not meant to cover the large number of Persian, Arabic and English loanwords that exist in Urdu, although it remains to be seen how sharply this distinction can be made in actual tagging. The tag for ‘non-Perso-Arabic string’ is for foreign words in other alphabets, or for other non-Perso-Arabic incursions into the text. FU is a catch-all ‘Unclassified’ category, although it is to be hoped that the vast majority of tokens will be catered for by at least one of the other tags outlined in this chapter.
SubClass Of
Sub-Classes
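As a rough illustration of the eight residual tags described in the Residual abstract, the following Python sketch pairs each tag with the class name listed for it under Individuals in this document. The names `RESIDUAL_TAGS` and `residual_class` are hypothetical helpers introduced only for this example; they are not part of the ontology or of any EMILLE tooling.

```python
from typing import Optional

# Residual tags and the class each one is listed under in the
# Individuals section of this document (eight tags in total).
RESIDUAL_TAGS = {
    "FA": "Acronym",
    "FB": "Abbreviation",
    "FF": "ForeignWord",
    "FO": "Formula",
    "FS": "OtherSymbol",
    "FU": "OtherUnclassifiableNonUrduElement",
    "FX": "NonPersoArabicString",
    "FZ": "Letter",
}


def residual_class(tag: str) -> Optional[str]:
    """Return the residual class for a tag, or None if the tag is not residual."""
    return RESIDUAL_TAGS.get(tag)
```

A lookup such as `residual_class("FX")` returns the class for non-Perso-Arabic strings, while a non-residual tag such as `NN` yields `None`.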
Root
Abstract The root consists, as its name suggests, of the root of the verb unadorned by affixation. It is not marked for person, number or gender and cannot occur as the sole verb of a main clause; it is, therefore, non-finite (untensed and also neither imperfective nor perfective in aspect). The exception to this is when it is used as an imperative form (discussed below). However, it does not fit neatly into any of the non-finite values for mood (the choices being infinitive, participle, gerund and supine). Therefore, in the intermediate tagset it is given a 0 for mood. Since this only has one form, there is only one tag. It should be noted that in the intermediate tags for this and all the following forms of lexical verb, all the tags give the status attribute the value main, since by definition a lexical verb is not an auxiliary (see the discussion of the status attribute in 3.2 above).
SubClass Of
SecondPerson
Abstract Person=second
SubClass Of
SecondPersonHonorificPronoun
Abstract The problematic honorific pronoun āp. The case of āp, the second person honorific pronoun, is by no means as clear as that of the third person pronouns. While the fact of its identical appearance with the reflexive pronoun (also āp: see 3.4.3) suggests that, like the third person pronouns, it may be best classified elsewhere, there are two very good reasons for regarding āp as a personal pronoun like mai~ and tū. (Kellogg (1875: 180-181) gives the common etymology of (what he sees as) these two pronouns in a single Sanskrit word.) The first is semantic. Semantically and pragmatically, āp has a very similar meaning to tū and its plural form tum - they both mean ‘you’. The second reason is syntactic. From the examples of āp given by Schmidt (1999), it would appear that āp has a very similar distribution to mai~ and tū. It is used, for example, as the subject of a sentence; the reflexive pronoun āp, by contrast, can never be the subject of a sentence for obvious reasons. There are, on the other hand, a number of reasons to regard āp as unlike mai~ and tū and either identical or at least more akin to the cognate reflexive pronoun (also āp). All are morphological. Firstly, āp (both the honorific and reflexive pronoun) does not have separate nominative and oblique cases, whereas mai~ and tū do. Secondly, as noted above, mai~ and tū have associated possessive adjectives. āp also has such a possessive adjective, apnā, but this is only used reflexively (see 3.4.3). When the usage is honorific, possession is expressed phrasally with the postposition kā, ‘of’. Thirdly, while mai~ and tū agree with verbal forms distinct from those used with nouns or third person pronouns, āp does not, always taking identical verbal inflections to the third person. This is what we would expect if it were simply a special usage of a reflexive pronoun. So then, is āp a second person personal pronoun or is it a special usage of the reflexive pronoun? Either position is tenable.
The syntax and semantics of the case supports the former approach while the morphology backs up the latter approach. The EAGLES guidelines cannot help in choosing between them, since this problem is an idiosyncrasy of Urdu: we would therefore not expect it to be covered by a standard drawn up for a set of languages which do not include Urdu. Ultimately, this is a case where an arbitrary decision must be taken: the decision I took was not to treat āp as a personal pronoun along with mai~ and tū. However, although arbitrary, this decision is consistent: āp will always be treated separately in this way. In fact the non-reflexive āp will be given the tag PA, so that in terms of the hierarchy of the tagset, it is categorised neither with the personal nor the reflexive pronouns, but in a separate subdivision of the pronoun category. This is, to an extent, another arbitrary decision: PPA could have been an equally reasonable tag, emphasising the similarity of syntactic function with mai~ and tū, or PRA, emphasising the similarity of its case inflections to those of the reflexive pronouns, which likewise show no difference between the nominative and oblique cases. However, to impose either of these interpretations might prove theoretically controversial, in breach of a stated design principle. Note however that in terms of the intermediate tagset, āp is still treated as a personal pronoun, because the things that it will map onto in other languages will be personal pronouns. Its number is ( 1 | 2 ), on the grounds that it may refer to one person or to more than one. Note that the intermediate tagset for pronouns contains a value, politeness; āp has been listed as polite, whereas the intermediate tags for tū as given in the next section contain the value for familiar.
SubClass Of
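The tagging decision for the honorific pronoun āp can be summarised in a small Python sketch. It records only what the abstract states: the non-reflexive pronoun receives the tag PA, is treated as a personal pronoun in the intermediate tagset, has number ( 1 | 2 ), and is marked polite. The class `IntermediatePronoun`, its field names, and the function `tag_for_ap` are hypothetical. The use of PRF for the reflexive reading is an assumption based on the PRF individual (ReflexivePronoun) listed in this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IntermediatePronoun:
    """Hypothetical container for the intermediate-tagset values named in the text."""
    tag: str
    pronoun_type: str
    number: Tuple[str, ...]
    politeness: str


# Honorific (non-reflexive) pronoun: tag PA, personal in the intermediate
# tagset, number ( 1 | 2 ), politeness "polite".
AP_HONORIFIC = IntermediatePronoun(
    tag="PA",
    pronoun_type="personal",
    number=("1", "2"),
    politeness="polite",
)


def tag_for_ap(reflexive: bool) -> str:
    """Pick a tag for the pronoun; PRF for the reflexive use is an assumption."""
    return "PRF" if reflexive else "PA"
```

Keeping the honorific reading under its own tag mirrors the stated design choice of not grouping it with either the personal or the reflexive pronouns.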
SemiColon
SubClass Of
SentenceTagWord
Abstract Sentence tagword (e.g. s?h?). This category is rather more open than the other ‘unique’ categories, and may in certain circumstances be ambiguous with adverbs.
SubClass Of
SingularNumber
Abstract Number=singular
SubClass Of
Subjunctive
Abstract The subjunctive is the only form that is marked for person in Urdu lexical verbs. It is not, however, marked for gender. Therefore the intermediate tagset forms give gender as zero, mood as subjunctive and tense as present.
SubClass Of
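The Subjunctive abstract says that subjunctive forms carry person but not gender, with mood subjunctive and tense present in the intermediate tagset. A minimal Python sketch of that description follows, using the VVSM, VVST and VVSV individuals listed in this document. The function name, the dictionary keys, and the choice to ignore any characters after the fourth are assumptions made for illustration.

```python
from typing import Dict, Optional

# Person values for the subjunctive tags listed under Individuals.
SUBJUNCTIVE_PERSON = {
    "VVSM": "FirstPerson",
    "VVST": "SecondPerson",
    "VVSV": "ThirdPerson",
}


def describe_subjunctive(tag: str) -> Optional[Dict[str, str]]:
    """Describe a subjunctive tag; returns None for any other tag.

    Only the first four characters are inspected (an assumption);
    anything after them is ignored here.
    """
    person = SUBJUNCTIVE_PERSON.get(tag[:4])
    if person is None:
        return None
    # Gender is zero, mood is subjunctive and tense is present,
    # as stated in the Subjunctive abstract.
    return {
        "person": person,
        "gender": "0",
        "mood": "subjunctive",
        "tense": "present",
    }
```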
SubjunctiveMood
Abstract Mood=subjunctive
SubClass Of
SubordinatingConjunction
SubClass Of
system:Feature
Namespace http://purl.org/olia/system.owl#
Sub-Classes
system:UnitOfAnnotation
Namespace http://purl.org/olia/system.owl#
Sub-Classes
Tense
SubClass Of
Sub-Classes
ThirdPerson
Abstract Person=third
SubClass Of
Unique
Abstract Unique/unassigned (including particles, clitics and tags). The Unique category in the EAGLES guidelines is meant to contain words that are members of a one-word category; for example, the infinitive marker ‘to’ or the existential ‘there’ in English. I will first outline the general nature of the tags defined in this part of the tagset (3.12.1), before going into some depth on the problem that motivated the creation of one particular unique category, that of nongrammatical lexical element: the zimmah dār problem (3.12.2).
SubClass Of
Sub-Classes
UnmarkedForGender
Abstract Markedness=2
SubClass Of
Verb
Abstract There are a considerable number of factors to be taken into account in a description and categorisation of the Urdu verbal system. There are a number of inflected forms, and with the use of one or more auxiliary elements, 15 compound tenses are built up. Furthermore, any part of the compound verb-phrase may be marked for number, person or gender agreement. There are two conceivable approaches to the markup of such a compound verb-phrase. Firstly, each word could be tagged separately, regardless of its context. So for example the form that Schmidt (1999) refers to as the ‘perfective participle’ would be tagged the same regardless of what compound tense it was being used in. Secondly, compound verbs could be treated as multi-word units, each such unit receiving a single tag. The latter approach was not followed, for three reasons. In the first place, it goes against the principle that every word should have its own tag, using no multiword tags. Secondly, it goes against the suggestion made by the EAGLES guidelines that ‘In general, compound tenses are not dealt with at the morphosyntactic level, since they involve the combination of more than one verb in a larger construction’ (Leech and Wilson 1999: 63). Thirdly, it would result in the tagset being much more complicated than need be. For example, each of the 15 compound tenses would need to be distinguished. By contrast the other approach would require a relatively smaller number of distinctions to be made, between the elements of which the compound tenses are built. The over-complicated tagset design that multi-word tagging of compound verbs would necessitate would also have the drawback of going far beyond the EAGLES guidelines on verbal tags. By treating each word of the compound verb as separate, it is possible to stick fairly closely to the guidelines. The agreement attributes number, gender, and person are clearly relevant to the Urdu verbal system.
Some writers consider that Urdu displays what has been described as split ergativity (as described in section 1.1.5.4). That is, the verb agrees sometimes with the subject, and sometimes with the direct object. It may also under some circumstances agree with neither (Schmidt 1999: 125). As explained in 1.1.5.4, however, some writers (e.g. Butt 1995) disagree with this analysis. However, for the purposes of defining verbal tags the matter of ergativity is more or less irrelevant. The agreement suffixes which occur on verbs - and, therefore, the morphosyntactic categories displayed by verbs - are exactly the same regardless of which argument of the verb is being agreed with. A single morphosyntactic phenomenon receives a single tag; so for example when I give a verb a tag VVYF1N (see 3.2.1.3), it is not specified whether the feminine agreement is with a subject or object. (Footnote 13 in the source: Except for one marginal case; see discussion of cāhiē in section 3.2.2.3 below.) Thus, the principle of theoretical neutrality is upheld: this analysis is as compatible with a theory in which Urdu displays split ergativity as with a theory in which it does not. Status (i.e. whether a verb is main or auxiliary) is relevant throughout. However, the way in which it has been used is a little different to that given in the EAGLES recommendations. The EAGLES guidelines suggest a main/auxiliary distinction which is context dependent. This can be seen by Leech and Wilson’s example tagset for English (1999: 72-74), in which it is made clear that the verb be can be either a main verb or an auxiliary verb. However, the distinction I have used is between lexical verbs and non-lexical auxiliary verbs. This is not context-dependent; English be would be considered an auxiliary regardless of context. The motivation for this is the decidedly irregular morphology of Urdu auxiliary verbs, most particularly hōnā, ‘be’ (see also 3.2.2.4).
This goes far beyond the inflectional oddities found in English non-lexical verbs: hōnā possesses two tenses that no other verb has, and it possesses them regardless of whether it is a main verb or not. To mark up hōnā as a main verb, there would have to be a tag, for example, for a present-tense main verb. But to include such a tag would be to vastly misrepresent the majority of Urdu verbs, which have no inflected present tense. There are similar problems with such non-lexical verbal forms as cāhiē and gā. Thus it makes sense to use the status attribute to distinguish (mostly regular) lexical verbs and (irregular) auxiliary verbs, so that the unique marking on the latter can be tagged exclusively on the latter. The optional third value of the status attribute, semi-auxiliary, has been used as described below.
SubClass Of
Sub-Classes
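The word-by-word approach described in the Verb abstract means that a participle tag such as VVYF1N records form and agreement features, but not which argument the verb agrees with. The Python sketch below decodes such a tag under an assumed layout of form prefix, gender letter, number digit and case letter. Only the feminine reading of F in VVYF1N is stated in the abstract. The readings of 1/2 and N/O follow the values used for other individuals in this document (for example V1, V2, V_N, V_O) and are assumptions here, as are the function and variable names.

```python
import re
from typing import Dict, Optional

# Participle prefixes as listed under Individuals in this document.
FORM = {"VVT": "ImperfectiveParticiple", "VVY": "PerfectiveParticiple"}
GENDER = {"M": "MasculineGender", "F": "FeminineGender"}
NUMBER = {"1": "SingularNumber", "2": "PluralNumber"}
CASE = {"N": "NominativeCase", "O": "ObliqueCase"}

# Assumed layout: form prefix, gender letter, number digit, case letter.
PARTICIPLE_TAG = re.compile(r"^(VVT|VVY)([MF])([12])([NO])$")


def decode_participle_tag(tag: str) -> Optional[Dict[str, str]]:
    """Split a participle tag into features, or return None if it does not fit."""
    match = PARTICIPLE_TAG.match(tag)
    if match is None:
        return None
    form, gender, number, case = match.groups()
    # Nothing in the result says whether agreement is with the subject
    # or the object: the tag is neutral on that point.
    return {
        "form": FORM[form],
        "gender": GENDER[gender],
        "number": NUMBER[number],
        "case": CASE[case],
    }
```

Because the decoded features say nothing about the agreeing argument, the same output is compatible with either analysis of ergativity discussed in the abstract.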
VocativeCase
Abstract Case=vocative
SubClass Of

Object Properties

hasAspect
Range
hasCase
Range
hasFiniteness
Range
hasGender
Range
hasGenderMarking
Range
hasInherentNumber
Range
hasMood
Range
hasNumber
Range
hasPerson
Range
hasTense
Range
system:hasFeature
Namespace http://purl.org/olia/system.owl#
Sub-Properties
Domain

Individuals

AL
Class Article
AU
Class Interjection
CC
Class CoordinatingConjunction
CCC
Class CorrelativeCoordinatingConjunction
CS
Class SubordinatingConjunction
FA
Class Acronym
FB
Class Abbreviation
FF
Class ForeignWord
FO
Class Formula
FS
Class OtherSymbol
FU
Class OtherUnclassifiableNonUrduElement
FX
Class NonPersoArabicString
FZ
Class Letter
IB
Class Postposition
II
Class Preposition
II1
Class SingularNumber
hasNumber II1
II2
Class PluralNumber
hasNumber II2
IIC
Class CliticPostposition
IIF
Class FeminineGender
hasGenderMarking IIF
IIM
Class MasculineGender
hasGenderMarking IIM
II_
Class UnmarkedForGender
hasGenderMarking II_
II_N
Class NominativeCase
hasCase II_N
II_O
Class ObliqueOrVocativeCase
hasCase II_O
II_gendermarked
Class MarkedForGender
hasGenderMarking II_gendermarked
J1
Class SingularNumber
hasNumber J1
J2
Class PluralNumber
hasNumber J2
JD
Class IndefiniteDeterminer
JDF
Class Fraction
JDJ
Class RelativeAdjective
JDK
Class InterrogativeAdjective
JDNU
Class CardinalNumber
JDNUC
Class PremultiplicativeCliticNumeral
JDNUO
Class ObliqueCase
hasCase JDNUO
JDN_O
Class ObliqueOrVocativeCase
hasCase JDN_O
JDN_ordinal
Class OrdinalNumber
JDV
Class DistalDemonstrativeAdjective
JDY
Class ProximalDemonstrativeAdjective
JD_F
Class FeminineGender
hasGender JD_F
JD_M
Class MasculineGender
hasCase JD_M
JD_O
Class ObliqueCase
hasGender JD_O
JD_U
Class UnmarkedForGender
hasGenderMarking JD_U
JD_gendermarked
Class MarkedForGender
hasGenderMarking JD_gendermarked
JJ
Class AttributiveOrPredicativeAdjective
JP
Class PredicativeAdjective
JXG
Class MultiplicativeMarker
JXS
Class AdjectivalParticle
JXV
Class AdjectivalOccupationalParticle
JX_F
Class FeminineGender
hasGender JX_F
JX_M
Class MasculineGender
hasGender JX_M
J_F
Class FeminineGender, MarkedForGender
hasGender J_F
hasGenderMarking J_F
J_M
Class MarkedForGender, MasculineGender
hasGender J_M
hasGenderMarking J_M
J_N
Class NominativeCase
hasCase J_N
J_O
Class ObliqueOrVocativeCase
hasCase J_O
J_U
Class UnmarkedForGender
hasGenderMarking J_U
LL
Class NongrammaticalLexicalElement
NN
Class CommonNoun
NP
Class ProperNoun
N_F
Class FeminineGender
N_M
Class MasculineGender
N__M
Class MarkedForGender
N__U
Class UnmarkedForGender
N___1
Class SingularNumber
N___2
Class PluralNumber
N____N
Class NominativeCase
N____O
Class ObliqueCase
N____V
Class VocativeCase
OO
Class CompoundFormingConjunction
PA
Class HonorificSecondPerson, SecondPersonHonorificPronoun
hasPerson PA
PG
Class PossessiveAdjective
PGR
Class ReflexivePossessiveAdjective
PGRF
Class FeminineGender
hasGender PGRF
PGRM
Class MasculineGender
hasGender PGRM
PG_1
Class SingularNumber
hasInherentNumber PG_1
PG_2
Class PluralNumber
hasInherentNumber PG_2
PG_F
Class FeminineGender
hasGender PG_F
PG_M
Class MasculineGender
hasGender PG_M
PJ
Class RelativePronoun
PK
Class InterrogativePronoun
PN
Class IndefinitePronoun
PP
Class PersonalPronoun
PPM
Class FirstPerson
hasPerson PPM
PPT
Class SecondPerson
hasPerson PPT
PP_1
Class SingularNumber
hasNumber PP_1
PP_2
Class PluralNumber
hasNumber PP_2
PP_N
Class NominativeCase
hasCase PP_N
PP_O
Class ObliqueCase
hasCase PP_O
PRC
Class ReciprocalPronoun
PRF
Class ReflexivePronoun
PU1
Class FullStop
PU2
Class Comma
PU3
Class QuestionMark
PU4
Class ExclamationMark
PU5
Class Colon
PU6
Class SemiColon
PU7
Class NeutralQuotation
PU8
Class OpenQuotationMark
PU9
Class CloseQuotationMark
PUA
Class OpenParenthesis
PUB
Class CloseParenthesis
PUC
Class OpenSquareBracket
PUD
Class CloseSquareBracket
PV
Class DistalDemonstrativePronoun
PY
Class ProximalDemonstrativePronoun
P_1
Class SingularNumber
hasNumber P_1
P_2
Class PluralNumber
hasNumber P_2
P_E
Class ObliqueCase
hasCase P_E
QQ
Class QuestionMarker
RD
Class DegreeAdverb
RJ
Class RelativeAdverb
RJJ
Class RelativeDeadjectivalAdverb
RK
Class InterrogativeAdverb
RKJ
Class InterrogativeDeadjectivalAdverb
RM
Class ModalAdverb
RMN
Class NegativeModalAdverb
RR
Class GeneralAdverb
RRJ
Class DeadjectivalAdverb
RV
Class DistalDemonstrativeAdverb
RVJ
Class DistalDemonstrativeDeadjectivalAdverb
RY
Class ProximalDemonstrativeAdverb
RYJ
Class ProximalDemonstrativeDeadjectivalAdverb
TT
Class SentenceTagWord
V1
Class SingularNumber
V2
Class PluralNumber
VC
Class CahieAuxiliary
VG
Class GaAuxiliary
VGF
Class FeminineGender
VGM
Class MasculineGender
VH
Class HonaAuxiliary
VHH
Class IndicativeMood, PresentTense
hasTense VHH
hasMood VHH
VHN
Class InfinitiveMood
VHP
Class IndicativeMood, PastTense
hasMood VHP
hasTense VHP
VR
Class RahaAuxiliary
VV
Class LexicalVerb
VV0
Class Root
VVI
Class Imperative, ImperativeMood
VVIA
Class HonorificSecondPerson, ImperativeMood
VVN
Class Infinitive, InfinitiveMood
VVNF
Class FeminineGender
VVNM
Class MasculineGender
VVS
Class Subjunctive, SubjunctiveMood
VVSM
Class FirstPerson
VVST
Class SecondPerson
VVSV
Class ThirdPerson
VVT
Class ImperfectiveAspect, ImperfectiveParticiple, ParticipleMood
hasAspect VVT
hasMood VVT
VVY
Class ParticipleMood, PerfectiveAspect, PerfectiveParticiple
hasAspect VVY
hasMood VVY
VX
Class GeneralAuxiliary
V_N
Class NominativeCase
V_O
Class ObliqueCase
XB
Class InclusiveEmphaticParticle
XH
Class ExclusiveEmphaticParticle
XHC
Class CliticExclusiveEmphaticParticle
XT
Class ContrastiveEmphaticParticle
ZZ
Class Izafat