Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages.
We assume you start with a document (say, a paper, or a whitepaper)
`rdfs:subClassOf` properties, so a feature like, say, `Accusative` should ideally be a class, not just an individual)
`olias:hasTag` (= `http://purl.org/olia/system.owl#hasTag`) property and assign it to one or multiple classes (the latter function is useful for combinatorial tagsets).
`olias:hasTagContaining` (the individual applies if the actual tag contains a substring), `olias:hasTagStartingWith`, `olias:hasTagEndingWith` or `olias:hasTagMatching` (using a regular expression).
This is helpful when doing the manual linking
`xyz.owl` (with `xyz` being an identifier of your model)

In principle, this follows the same procedure, but use a text editor and Turtle format to write the ontology directly. This has the advantage that the resulting data can be structured to provide a human-readable form.
copy/compare the header structure from another ontology, minimally something like
<YOUR_ONTOLOGY_URL> a <http://www.w3.org/2002/07/owl#Ontology> .
<YOUR_ONTOLOGY_URL> <http://purl.org/dc/terms/license> <https://creativecommons.org/licenses/by/3.0/> .
if this ontology doesn’t use Turtle format, a Turtle version can be created using command-line tools like `rapper` (`raptor`):
$> rapper -i rdfxml -o turtle olia.owl > olia.ttl
`olias:hasTag`, `rdfs:subClassOf`, `rdf:type`, `rdfs:comment`; mark line-breaks with

If no other documentation is available, a list of tags (along with examples) may be extracted from a corpus, if a language expert can interpret these while creating the annotation model.
In principle, this follows the same procedure, but the initial creation process can be automated.
Assume that the source data comes in, say, 3 columns (tag, category, label; there can be more columns, or the label column may be missing). If there is no category column, create one manually.
tag | category | label |
---|---|---|
ACC | CASE | accusative |
NOUN | POS | noun |
… | … | … |
Add a URI column and populate it with `=CONCAT(":";$TAG)` (where `$TAG` refers to the corresponding cell of the tag column)
tag | category | label | URI |
---|---|---|---|
ACC | CASE | accusative | :ACC |
NOUN | POS | noun | :NOUN |
… | … | … | … |
Add two columns: one containing `olias:hasTag` (or variants), one populated with `=CONCAT("'";$TAG;"'; ")` (tag value)
tag | category | label | URI | property | value |
---|---|---|---|---|---|
ACC | CASE | accusative | :ACC | olias:hasTagContaining | 'ACC'; |
NOUN | POS | noun | :NOUN | olias:hasTagContaining | 'NOUN'; |
… | … | … | … | … | … |
Add two more columns: a property column containing `a`, and a class column populated with `=CONCAT(":";$LABEL)`.
tag | category | label | URI | property | value | property | class |
---|---|---|---|---|---|---|---|
ACC | CASE | accusative | :ACC | olias:hasTagContaining | 'ACC'; | a | :accusative |
NOUN | POS | noun | :NOUN | olias:hasTagContaining | 'NOUN'; | a | :noun |
… | … | … | … | … | … | … | … |
Conclude with a column containing `.`
tag | category | label | URI | property | value | property | class | . |
---|---|---|---|---|---|---|---|---|
ACC | CASE | accusative | :ACC | olias:hasTagContaining | 'ACC'; | a | :accusative | . |
NOUN | POS | noun | :NOUN | olias:hasTagContaining | 'NOUN'; | a | :noun | . |
… | … | … | … | … | … | … | … | . |
Add three more columns: a copy of the class, the property `rdfs:subClassOf` (both the label and its category are classes), and the category URI populated with `=CONCAT(":";$CATEGORY)`. Conclude with `.`
tag | category | label | URI | property | value | property | class | . | class | property | category | . |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ACC | CASE | accusative | :ACC | olias:hasTagContaining | 'ACC'; | a | :accusative | . | :accusative | rdfs:subClassOf | :CASE | . |
NOUN | POS | noun | :NOUN | olias:hasTagContaining | 'NOUN'; | a | :noun | . | :noun | rdfs:subClassOf | :POS | . |
… | … | … | … | … | … | … | … | … | … | … | … | … |
create a text (Turtle) file with the following header
PREFIX olia: <http://purl.org/olia/olia.owl#>
PREFIX olias: <http://purl.org/olia/system.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX : <ADD_YOUR_SCHEMA_URL_HERE>
(replace `ADD_YOUR_SCHEMA_URL_HERE` with your URL ;)
`rapper` (`libraptor` library on most Linux systems)
copy all columns we just created (starting with `URI`) at the end of this file:
:ACC olias:hasTagContaining 'ACC'; a :accusative .
:accusative rdfs:subClassOf :CASE .
:NOUN olias:hasTagContaining 'NOUN'; a :noun .
:noun rdfs:subClassOf :POS .
…
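The spreadsheet steps above can also be scripted. The following is a minimal sketch, not part of the OLiA tooling: it assumes the tag list is available as CSV with the columns `tag`, `category` and `label`, that tags and labels are already valid Turtle local names, and that each label class is attached to its category with `rdfs:subClassOf`. The function names and the `http://example.org/xyz.owl#` namespace are hypothetical placeholders.

```python
import csv
import io

# Header taken from the text above; schema_url is filled in by the caller.
HEADER = """\
PREFIX olia: <http://purl.org/olia/olia.owl#>
PREFIX olias: <http://purl.org/olia/system.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX : <{schema_url}>
"""


def rows_to_turtle(rows, schema_url,
                   tag_property="olias:hasTagContaining",
                   category_property="rdfs:subClassOf"):
    """Render (tag, category, label) rows as Turtle, one row per tag.

    Mirrors the spreadsheet columns: URI, tag property, tag value,
    `a`, class, `.`, then class, category property, category, `.`.
    Assumption: tags and labels contain no spaces or quotes.
    """
    lines = [HEADER.format(schema_url=schema_url)]
    for row in rows:
        tag, category, label = row["tag"], row["category"], row["label"]
        lines.append(f":{tag} {tag_property} '{tag}'; a :{label} .")
        lines.append(f":{label} {category_property} :{category} .")
    return "\n".join(lines) + "\n"


def demo():
    """Convert the two example rows from the table above."""
    table = "tag,category,label\nACC,CASE,accusative\nNOUN,POS,noun\n"
    rows = csv.DictReader(io.StringIO(table))
    # Hypothetical namespace; replace with your own schema URL.
    return rows_to_turtle(rows, "http://example.org/xyz.owl#")


if __name__ == "__main__":
    print(demo())
```

To use `olias:hasTag` or another variant instead of `olias:hasTagContaining`, pass it as `tag_property`. The output should still be checked by a language expert and validated with an RDF parser before publishing.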
To benefit from synergies with OLiA, annotation models should be linked, i.e., `rdfs:subClassOf`/`rdfs:subPropertyOf` links with `http://purl.org/olia/olia.owl` should be created. By convention, these are stored in separate files, using the naming schema `xyz-link.rdf` (with `xyz` being the acronym of your annotation model) and provided in RDF/XML format.
add import declarations
<YOUR_LINKING_URL> <http://www.w3.org/2002/07/owl#imports> <YOUR_ONTOLOGY_URL>, <http://purl.org/olia/olia.owl>.
for every annotation class and every OLiA class, add an `rdfs:subClassOf` statement to express a mapping, e.g.,
:noun rdfs:subClassOf olia:Noun.
Note that you can use the full power of description logics to encode anything more complex, e.g., `owl:unionOf`, `owl:intersectionOf`, `owl:inverseOf`. However, applications may choose to ignore such complicated information and simply rely on `rdfs:subClassOf`. As multiple applications of `rdfs:subClassOf` are equivalent to `owl:intersectionOf`, better avoid the latter wherever possible.
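When the mappings are already known, the skeleton of a linking model can be written out mechanically. The sketch below is an illustration under stated assumptions rather than the project's own tooling: it emits Turtle (not the RDF/XML used by convention), it assumes the annotation model's namespace is its ontology URL followed by `#`, and `linking_model` as well as the `example.org` URLs are hypothetical names.

```python
OWL_IMPORTS = "<http://www.w3.org/2002/07/owl#imports>"
OLIA_URL = "http://purl.org/olia/olia.owl"


def linking_model(linking_url, ontology_url, mappings):
    """Return a Turtle linking model.

    mappings: iterable of (annotation_class, olia_class) local names,
    each rendered as a plain rdfs:subClassOf statement.
    Assumption: the annotation model's namespace is ontology_url + '#'.
    """
    lines = [
        f"PREFIX olia: <{OLIA_URL}#>",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        f"PREFIX : <{ontology_url}#>",
        "",
        # Import both the annotation model and the OLiA reference model.
        f"<{linking_url}> {OWL_IMPORTS} <{ontology_url}>, <{OLIA_URL}> .",
    ]
    for annotation_class, olia_class in mappings:
        lines.append(f":{annotation_class} rdfs:subClassOf olia:{olia_class} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical URLs; the noun/Noun pair is the example from the text.
    print(linking_model(
        "http://example.org/xyz-link.rdf",
        "http://example.org/xyz.owl",
        [("noun", "Noun")],
    ))
```

Since linking models are conventionally provided in RDF/XML, the Turtle produced here would still need to be converted, for example with a tool such as `rapper`.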
We provide a script to facilitate semiautomated linking for (Unix-style) shell environments. This is the file `/tools/link.sh` in this repository.
Semiautomated linking
$> bash -e ./link.sh http://purl.org/olia/olia.owl YOUR_ANNO_MODEL.owl YOUR_ANNO_MODEL-link.rdf
(Replace `YOUR_ANNO_MODEL` with the path to your annotation model. Output will be written to `YOUR_ANNO_MODEL-link.rdf`.)
Open the resulting model in Protégé and refine manually.