Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages.
The OLiA Discourse Extensions extend the Ontologies of Linguistic Annotation (OLiA) with respect to discourse features. The OLiA ontologies provide a terminology repository that can be employed to facilitate the conceptual (semantic) interoperability of annotations of discourse phenomena as found in important corpora available to the community.
The OWL2/DL Reference Model of the OLiA Discourse Extensions can be found under http://purl.org/olia/discourse/olia_discourse.owl
The OLiA Discourse Extensions extend the Ontologies of Linguistic Annotation (OLiA) with respect to discourse features. The OLiA ontologies provide a terminology repository that can be employed to facilitate the conceptual (semantic) interoperability of annotations of discourse phenomena as found in important corpora available to the community, including the RST Discourse Treebank and the Penn Discourse Treebank. Note that the current ontologies were chosen to represent typical phenomena; they are by no means exhaustive with respect to available corpora.
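Discourse phenomena considered here include discourse structure, discourse relations, coreference (anaphora), information status and information structure.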
The OLiA ontologies currently do not cover dialogue structure, Gricean and Post-Gricean pragmatics, speech act theory, or annotation schemes developed on this basis. In a broad sense, these can be regarded as discourse phenomena, as the distinction between discourse and pragmatics is largely underdefined.
Instead, we follow a pragmatic distinction based on the types of available annotations: We restrict ourselves to the annotation of text (hence, no dialogues), with a particular focus on theories of discourse structure and discourse relations (in the sense of Rhetorical Structure Theory or Segmented Discourse Representation Theory) and on frequently annotated phenomena most often discussed in this context (hence, anaphora, information status and information structure). Further extensions are, however, envisioned.
At the moment, the OLiA ontologies cover 9 annotation schemes for coreference, information status, information structure and discourse structure for a broad variety of languages. So far, 8 of these are provided on this site. A full publication is planned for Jan 2014. All of these annotation schemes are formalized as self-contained OWL/DL ontologies (Annotation Models), each declaratively linked (Linking Models) to an ontology that provides a generalized vocabulary for discourse annotation (Reference Model). For the latter, we currently provide two ontologies that will subsequently be integrated with the OLiA Reference Model (cf. the provisional linking with the OLiA Reference Model).
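The three-layer design described above (Annotation Model, Linking Model, Reference Model) can be illustrated with a minimal Python sketch. All concept names and the `resolve` helper are hypothetical placeholders, not identifiers taken from the actual OLiA ontologies; the real models are OWL/DL files and would normally be processed with an OWL or RDF toolkit.

```python
# Minimal sketch of the Annotation Model / Linking Model / Reference Model split.
# All concept names are hypothetical placeholders, not actual OLiA identifiers.

# Annotation Model: scheme-specific tag -> scheme-specific concept
ANNOTATION_MODEL = {
    "condition": "rst:Condition",
    "contrast": "rst:Contrast",
}

# Linking Model: scheme-specific concept -> Reference Model concepts
LINKING_MODEL = {
    "rst:Condition": ["ref:ConditionRelation"],
    "rst:Contrast": ["ref:ContrastRelation"],
}

# Reference Model: generalized vocabulary with its own hierarchy (child -> parent)
REFERENCE_MODEL = {
    "ref:ConditionRelation": "ref:DiscourseRelation",
    "ref:ContrastRelation": "ref:DiscourseRelation",
}


def resolve(tag):
    """Return all Reference Model concepts that a scheme-specific tag maps to."""
    concept = ANNOTATION_MODEL[tag]
    result = []
    for ref in LINKING_MODEL.get(concept, []):
        # Walk up the Reference Model hierarchy, most specific concept first.
        while ref is not None:
            if ref not in result:
                result.append(ref)
            ref = REFERENCE_MODEL.get(ref)
    return result
```

Because the linking is kept separate from the scheme-specific concepts, a different scheme only needs its own Annotation Model and Linking Model; the Reference Model stays unchanged.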
PS: Note that this site is currently being updated; the publication of further Annotation Models and an update of the Reference Models are in preparation.
Different theories of discourse structure have emerged in the past decades, and different models of annotation have been developed accordingly, as illustrated by two alternative annotations of the same sentence in the figure below.
Fig. 1. Comparing Discourse Structure Annotations (RST Discourse Treebank and Penn Discourse Treebank, file wsj_1365, simplified)
With traditional annotation schemes, these annotations can hardly be put in any relation to each other, because different structures are involved: The RST annotation is based on trees, whereas the PDTB annotation is based on relational structures. Yet, by grouping relations on the basis of the utterances they span, pairs of relations can be formed:
PDTB utterances | PDTB relation | RST utterances | RST relation |
---|---|---|---|
(4)-(5) | CONTINGENCY.Condition.general | (4)-(5-7) | CONDITION |
(2)-(4-7) | CONTINGENCY.Cause.result | (1-2)-(3-7) | EXPLANATION |
(4-5)-(6) | EXPANSION.Alternative.chosen alternative | (5)-(6-7) | CONTRAST |
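The grouping by covered utterances can be sketched in Python. The scoring rule used here (argument-wise Jaccard overlap, with overlap required on both arguments) is an assumption made for illustration; the text does not specify how the pairs were formed, and the helper names `units`, `jaccard` and `pair_relations` are hypothetical.

```python
# Sketch: pair PDTB and RST relations by the utterances their arguments cover.
# The scoring rule (argument-wise Jaccard overlap) is an assumption for
# illustration; the source does not state how the pairs were formed.


def units(first, last):
    """Set of utterance numbers from first to last, inclusive."""
    return set(range(first, last + 1))


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


# (argument 1, argument 2, relation label), spans as in the table above
PDTB = [
    (units(4, 4), units(5, 5), "CONTINGENCY.Condition.general"),
    (units(2, 2), units(4, 7), "CONTINGENCY.Cause.result"),
    (units(4, 5), units(6, 6), "EXPANSION.Alternative.chosen alternative"),
]
RST = [
    (units(4, 4), units(5, 7), "CONDITION"),
    (units(1, 2), units(3, 7), "EXPLANATION"),
    (units(5, 5), units(6, 7), "CONTRAST"),
]


def pair_relations(left, right):
    """For each relation in left, pick the best-overlapping relation in right."""
    pairs = []
    for a1, a2, label in left:
        best = max(right, key=lambda r: jaccard(a1, r[0]) + jaccard(a2, r[1]))
        # Only pair relations whose arguments overlap on both sides.
        if a1 & best[0] and a2 & best[1]:
            pairs.append((label, best[2]))
    return pairs


pairs = pair_relations(PDTB, RST)
```

Under this assumed rule, `pairs` contains the same three label pairs as the rows of the table.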
Yet within their annotation schemes, the relations cannot be compared directly. For example, the PDTB CONDITION and the RST CONDITION have different connotations in terms of discourse structure (no restrictions in PDTB, a hierarchical structure in RST). Using the ontologies, these aspects of information can be disentangled (common properties in terms of the Reference Model in bold):
Fig. 2. Comparing CONDITIONs
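The idea of disentangling shared and scheme-specific information can be approximated by describing each annotation concept as a set of Reference Model properties and intersecting the sets. The property names below are hypothetical stand-ins, not actual Reference Model classes; only the structural contrast (no restrictions in PDTB, a hierarchical structure in RST) comes from the text.

```python
# Hypothetical property names; only the structural contrast is from the text.
PDTB_CONDITION = {"ref:ConditionRelation", "ref:UnrestrictedStructure"}
RST_CONDITION = {"ref:ConditionRelation", "ref:HierarchicalStructure"}


def compare(first, second):
    """Split two concept descriptions into shared and scheme-specific parts."""
    return {
        "common": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


comparison = compare(PDTB_CONDITION, RST_CONDITION)
```

Here `comparison["common"]` holds what both schemes agree on, while the other two entries hold the structural assumptions specific to each scheme.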
Model | Description | Phenomenon | OWL/DL models |
---|---|---|---|
Reference Model | Reference Model fragment for discourse structure and discourse relations, to be integrated with the OLiA Reference Model | discourse structure, discourse relations, information structure, information status, coreference | OLiA Discourse Extensions Model, Provisional Reference Model linking |
RST Annotation Model | Annotation Model for RST (http://www.sfu.ca/rst/, English, French, Portuguese, Spanish) | discourse structure, discourse relations | Annotation Model, Linking Model |
RSTDTB Annotation Model | Annotation Model for the RST Discourse Treebank (English, Wall Street Journal) | discourse structure, discourse relations | Annotation Model, Linking Model |
PDTB Annotation Model | Annotation Model for the Penn Discourse Treebank (English, Wall Street Journal), also applicable to PDTB derivatives for Turkish, Hindi, Italian and Chinese | discourse relations | Annotation Model, Linking Model |
PDGB Annotation Model | Annotation Model for the Penn Discourse Graphbank (English, incl. Wall Street Journal) | discourse relations | Annotation Model, Linking Model |
Knott Annotation Model | Annotation Model for the Knott (1996) discourse cue taxonomy (not used for corpus annotation but for cue word classification; such discourse cues have been annotated in PDTB, though) | discourse (relation) marker taxonomy | Annotation Model, no Linking Model yet |
Model | Description | Phenomenon | OWL/DL models |
---|---|---|---|
Reference Model | Reference Model fragment, to be integrated with the OLiA Reference Model | discourse structure, discourse relations, information structure, information status, coreference | OLiA Discourse Extensions model, Provisional Reference Model linking |
CRC632 | Annotation Model for the corpora of the Collaborative Research Center (SFB) 632, "Information Structure" (Potsdam, Berlin, Germany), applied to various, typologically different languages | information structure, information status | Annotation Model, Linking Model |
DIRNDL | Annotation Model for the DIRNDL corpus (German, spoken language) | information status, coreference | Annotation Model, Linking Model |
PoCoS | Annotation Model for the Potsdam Coreference Scheme, applied to English, German and Russian | coreference | Annotation Model, Linking Model |
ARRAU | Annotation Model for the ARRAU corpus (English) | coreference, bridging | Annotation Model, Linking Model |
TüBa-D/Z | Annotation Model for the TüBa-D/Z corpus (German) | coreference | Annotation Model |