OLiA annotation model for morphosyntactic and morphological annotations of Urdu, following Sajjad (2007). Unless marked otherwise, all comments are quoted from this document.
Hassan Sajjad (2007), Urdu Part of Speech Tagset, version 1.0.0.0, 07-12-2007, Center for Research in Urdu Language Processing, National University of Computer and Emerging Sciences, Lahore, Pakistan, http://www.crulp.org/Downloads/langproc/UrduPOStagger/UrduPOStagset.pdf
Kaf pronouns add an interrogative property to the sentence. They are divided into two categories.
Kaf pronouns proper, represented by KP, are used to ask a question about a noun. The second category
comprises adverbial kaf pronouns, which are used in place of nouns with an adverbial nature.
Adverbial pronouns occur in place of nouns with an adverbial nature and express properties such as
time, place, manner, etc. They are represented by AP in the tagset.
Auxiliaries:
Based on the syntactic nature of the language, auxiliaries are divided into two categories. Aspectual
auxiliaries always occur after the main verb of the sentence. Tense auxiliaries show the
time of the action. They occur at the end of the verb phrase.
Demonstratives are divided into four categories. All four categories of demonstratives are
ambiguous with four categories of pronoun. Phrase-level analysis was done to distinguish
between demonstratives and pronouns.
Expression (Exp):
Any word or symbol that is not handled in this tagset is covered under expression. This can
include mathematical symbols, digits, etc.
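The fallback rule for expressions can be illustrated with a minimal Python sketch. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) and a hypothetical helper (`tag_token`); neither comes from Sajjad (2007). The tag label "Exp" is taken from the heading above, and the single lexicon entry is an illustrative romanized placeholder.

```python
# Tag label as written in the heading "Expression (Exp)".
EXPRESSION_TAG = "Exp"

# Hypothetical toy lexicon (token -> tag); the entry is illustrative only.
# "kitab" is a romanized placeholder assumed to be a simple noun (NN).
TOY_LEXICON = {"kitab": "NN"}


def tag_token(token, lexicon):
    """Return the lexicon tag for a token, or the Expression tag if unhandled."""
    return lexicon.get(token, EXPRESSION_TAG)


# Mathematical symbols and digits are not in the lexicon, so they fall back to Exp.
tags = [tag_token(t, TOY_LEXICON) for t in ["kitab", "+", "42"]]
```

A real tagger would decide the tag from context rather than a flat lookup; the sketch only shows that anything not otherwise handled ends up under expression.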
Nouns are divided into two categories. The first category consists of simple nouns, which are
represented by NN in the tagset. Other nouns show an adverbial nature, such as
time, place, manner, etc.; these are also covered under noun. Proper nouns are kept in a
separate category.
Numerals:
Numerals are divided into four categories based on their syntactic structure. Cardinal (CA),
ordinal (OR), fractional (FR) and multiplicative (MUL) are the types included in the tagset.
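As a rough sketch, the four numeral tags named above can be written as a Python enumeration. The tag names and category names come from the text; the class name `NumeralTag` and the helper `numeral_category` are hypothetical and only serve the illustration.

```python
from enum import Enum


class NumeralTag(Enum):
    """Numeral tags named in the tagset, mapped to their category names."""
    CA = "cardinal"
    OR = "ordinal"
    FR = "fractional"
    MUL = "multiplicative"


def numeral_category(tag):
    """Return the numeral category for a tag string, or None for other tags."""
    try:
        return NumeralTag[tag].value
    except KeyError:
        return None
```

For instance, `numeral_category("MUL")` yields the multiplicative category, while a non-numeral tag such as NN yields `None`.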
Pronouns are divided into six categories based on their syntactic structure. Most of the
categories are consistent with the types provided by Urdu grammarians.
Punctuation marks: In this tagset, punctuation marks are divided into two categories. Sentence
markers mark the boundary of the sentence. Phrase markers are used inside the sentence but
never at its end.
Question words (QW):
Some words other than kaf pronouns are also used for interrogation in the sentence.
However, these words cannot be replaced by a noun or pronoun. A separate category of question
words has been formed for them.