OLiA Annotation Model for Penn Treebank (PTB) part-of-speech annotation (Santorini 1990)
Unless specified otherwise, all comments are taken from Santorini (1990).
References
Beatrice Santorini (1990), Part-of-Speech tagging guidelines for the Penn Treebank Project, 3rd revision, 2nd printing, ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz
These are adjectives, including ordinal numbers.
Hyphenated compounds that are used as modifiers are tagged as adjectives, e.g. "happy-go-lucky", "one-of-a-kind", "run-of-the-mill". Ordinal numbers are tagged as adjectives, as are compounds of the form "n-th" or "X-est", like "fourth-largest".
This category includes most words that end in -ly as well as degree words like "quite", "too" and "very", posthead modifiers like "enough" and "indeed" (as in "good enough", "very well indeed"), and negative markers like "not", "n't" and "never".
This tag subsumes imperatives, infinitives and subjunctives.
EXAMPLES:
Imperative: Do/VB it.
Infinitive: You should do/VB it.
We want them to do/VB it.
We made them do/VB it.
Subjunctive: We suggested that he do/VB it.
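The examples above use the word/TAG notation, in which only the token under discussion carries a tag. A minimal sketch of reading that notation follows; the function name `parse_tagged` is hypothetical and not part of the guidelines or of OLiA, and punctuation is left attached to the word, as it is in the examples.

```python
def parse_tagged(text):
    """Split a whitespace-separated string into (word, tag) pairs.

    Tokens without a slash get the tag None. This is an illustrative
    helper, not a tool from the Penn Treebank distribution.
    """
    pairs = []
    for token in text.split():
        # Split on the last slash so a word that itself contains "/" stays whole.
        word, sep, tag = token.rpartition("/")
        if sep:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))
    return pairs


if __name__ == "__main__":
    print(parse_tagged("We suggested that he do/VB it."))
```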
These are adjectives mostly with the comparative ending -er and a comparative meaning. "More" or "less" should be tagged as a comparative adjective when it is used without a head noun and it corresponds to the object of a verb or preposition.
This category includes "and", "but", "nor", "or", "yet" (as in "Yet it's cheap", "cheap yet good"), as well as the mathematical operators "plus", "minus", "less", "times" (in the sense of "multiplied by") and "over" (in the sense of "divided by"), when they are spelled out.
"For" in the sense of "because" is a coordinating conjunction.
This category includes the articles "a(n)", "every", "no" and "the", the indefinite determiners "another", "any" and "some", "each", "either" (as in "either way"), "neither" (as in "neither decision"), "that", "these", "this" and "those", and instances of "all" and "both" when they do not precede a determiner or possessive pronoun (as in "all roads" or "both times").
Existential "there" is the unstressed "there" that triggers inversion of the inflected verb and the logical subject of a sentence, e.g. "There/EX was a party in progress.", "There/EX ensued a melee.".
This category includes "my" (as in "My, what a gorgeous day"), "oh", "please", "see" (as in "See it's like this"), "uh", "well" and "yes", among others.
This category includes all verbs that don't take an -s ending in the third person singular present: "can", "could", ("dare"), "may", "might", "must", "ought", "shall", "should", "will", "would".
This category includes the personal pronouns proper, without regard for case distinctions ("I", "me", "you", "he", "him", etc.), the reflexive pronouns ending in -self or -selves, and the nominal possessive pronouns "mine", "yours", "his", "hers", "ours" and "theirs".
The possessive ending on nouns ending in 's or ' is split off by the tagging algorithm and tagged as if it were a separate word, e.g. "John/NP 's/POS idea", "the parents/NNS '/POS distress".
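The splitting of the possessive ending can be illustrated with a small sketch. This is not the tagging algorithm referred to above; `split_possessive` is a hypothetical helper that only looks at the end of the string, so it would also split contractions such as "it's", which a real tokenizer has to handle separately.

```python
def split_possessive(word):
    """Separate a trailing possessive ending ('s or ') from a noun.

    Naive, string-based illustration only: it cannot tell a possessive
    from a contraction or from a closing quotation mark.
    """
    if len(word) > 2 and word.endswith("'s"):
        return [word[:-2], "'s"]
    if len(word) > 2 and word.endswith("s'"):
        # Plural possessive: only the apostrophe is the POS token.
        return [word[:-1], "'"]
    return [word]


if __name__ == "__main__":
    print(split_possessive("John's"))
    print(split_possessive("parents'"))
```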
This category includes the following determinerlike elements when they precede an article or possessive pronoun.
EXAMPLES:
all/PDT his marbles
nary/PDT a soul
both/PDT the girls
quite/PDT a mess
half/PDT his time
rather/PDT a nuisance
many/PDT a moon
such/PDT a good time
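The condition for this tag (a determiner-like word directly before an article or possessive pronoun) can be sketched as a simple check. The candidate list below is taken only from the examples above and may be incomplete; the function name, the article list and the use of PRP$/PP$ as possessive pronoun tags are assumptions made for illustration.

```python
# Candidate words collected from the examples above; not an exhaustive list.
PDT_CANDIDATES = {"all", "nary", "both", "quite", "half", "rather", "many", "such"}
ARTICLES = {"a", "an", "the"}
POSSESSIVE_PRONOUN_TAGS = {"PRP$", "PP$"}


def is_predeterminer(word, next_word, next_tag):
    """Return True if `word` is a PDT candidate followed by an article
    or a possessive pronoun."""
    if word.lower() not in PDT_CANDIDATES:
        return False
    return next_word.lower() in ARTICLES or next_tag in POSSESSIVE_PRONOUN_TAGS


if __name__ == "__main__":
    print(is_predeterminer("all", "his", "PRP$"))
    print(is_predeterminer("all", "roads", "NNS"))
```

With this check, "all" in "all his marbles" qualifies as a predeterminer, while "all" in "all roads" does not.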
We make no explicit distinction between prepositions and subordinating conjunctions. (The distinction is not lost, however: a preposition is an IN that precedes a noun phrase or a prepositional phrase, and a subordinating conjunction is an IN that precedes a clause.)
The preposition "to" has its own special tag TO.
These are adjectives with the superlative ending -est (as well as "worst"). "Most" and "least" can also be tagged as superlative adjectives when they occur by themselves.
This tag should be used for mathematical, scientific and technical symbols or expressions that aren't words of English. It should not be used for any and all technical expressions. For instance, the names of chemicals, units of measurement (including abbreviations thereof) and the like should be tagged as nouns.
This category includes "how", "where", "why", etc. "When" in a temporal sense is tagged as a wh-adverb. In the sense of "if", on the other hand, it is a subordinating conjunction.
EXAMPLES:
"When/WRB he finally arrived, I was on my way out."
PP is the tag used in "Part-of-Speech Tagging Guidelines for the Penn Treebank Project", Beatrice Santorini, 15.03.1991
(http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/Penn-Treebank-Tagset.ps 21.11.07)
PP$ is the tag used in "Part-of-Speech Tagging Guidelines for the Penn Treebank Project", Beatrice Santorini, 15.03.1991
(http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/Penn-Treebank-Tagset.ps 21.11.07)
PRP is the tag used in "Part of Speech Tagging Guidelines for the Penn Treebank Project, June 1990" (ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz 21.11.07)
PRP$ is the tag used in "Part of Speech Tagging Guidelines for the Penn Treebank Project, June 1990" (ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz 21.11.07)
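Since the two versions of the guidelines use different names for the pronoun tags, a corpus reader may want to map one naming onto the other. The sketch below assumes that PP corresponds to PRP and PP$ to PRP$, as the notes above suggest; the mapping table and the function name are hypothetical and cover only these two tags.

```python
# Assumed correspondence between the two tag namings described above.
PP_TO_PRP = {"PP": "PRP", "PP$": "PRP$"}


def normalize_pronoun_tag(tag):
    """Map PP/PP$ to PRP/PRP$ and leave every other tag unchanged."""
    return PP_TO_PRP.get(tag, tag)


if __name__ == "__main__":
    for tag in ["PP", "PP$", "PRP", "VB"]:
        print(tag, "->", normalize_pronoun_tag(tag))
```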