Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages.
We assume you start with a document (say, a paper, or a whitepaper)
rdfs:subClassOf properties, so, a feature like, say, Accusative should ideally be a class, not just an individual)
olias:hasTag (= http://purl.org/olia/system.owl#hasTag) property and assign it to one or multiple classes (the latter function is useful for combinatorial tagsets).
olias:hasTagContaining (the individual applies if the actual tag contains a substring), olias:hasTagStartingWith, olias:hasTagEndingWith or olias:hasTagMatching (using a regular expression).
This is helpful when doing the manual linking
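As a rough illustration of how these tag properties could select individuals for a concrete tag, here is a minimal Python sketch. The function names `tag_matches` and `individuals_for`, the sample model, and the sample tag `N.ACC.PL` are hypothetical. Treating olias:hasTagMatching as a full-string match is also an assumption made for this sketch, not something taken from the system ontology.

```python
import re


def tag_matches(prop: str, pattern: str, tag: str) -> bool:
    """Check one (property, pattern) pair against a tag.

    The property names are those listed above; the matching semantics are
    an assumption of this sketch.
    """
    if prop == "olias:hasTag":
        return tag == pattern  # exact match
    if prop == "olias:hasTagContaining":
        return pattern in tag  # substring match
    if prop == "olias:hasTagStartingWith":
        return tag.startswith(pattern)
    if prop == "olias:hasTagEndingWith":
        return tag.endswith(pattern)
    if prop == "olias:hasTagMatching":
        # Assumption: the regular expression must cover the whole tag.
        return re.fullmatch(pattern, tag) is not None
    raise ValueError(f"unknown tag property: {prop}")


def individuals_for(tag, model):
    """Return all individuals whose tag pattern applies to the given tag."""
    return [ind for ind, prop, pattern in model if tag_matches(prop, pattern, tag)]


# Hypothetical model for a combinatorial tagset: (individual, property, pattern)
MODEL = [
    (":NOUN", "olias:hasTagStartingWith", "N"),
    (":ACC", "olias:hasTagContaining", "ACC"),
    (":PL", "olias:hasTagEndingWith", "PL"),
]

print(individuals_for("N.ACC.PL", MODEL))
```

With substring-style properties, a single combinatorial tag can resolve to several individuals at once, so the tagset does not need one individual per tag combination.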
xyz.owl (with xyz being an identifier of your model)
In principle, this follows the same procedure, but use a text editor and Turtle format to write the ontology directly. This has the advantage that the resulting data can be structured to provide a human-readable form.
copy/compare the header structure from another ontology, minimally something like
  <YOUR_ONTOLOGY_URL> a <http://www.w3.org/2002/07/owl#Ontology> .
  <YOUR_ONTOLOGY_URL> <http://purl.org/dc/terms/license> <https://creativecommons.org/licenses/by/3.0/> .
if this ontology doesn’t use Turtle format, a Turtle version can be created using command-line tools like rapper (raptor):
  $> rapper -i rdfxml -o turtle olia.owl > olia.ttl
olias:hasTag, rdfs:subClassOf, rdf:type, rdfs:comment; mark line-breaks with
If no other documentation is available, a list of tags (along with examples) may be extracted from a corpus, provided a language expert can interpret these while creating the annotation model.
In principle, this follows the same procedure, but the initial creation process can be automated.
Assume that the source data comes in, say, 3 columns (tag, category, label; there can be more columns, or the label column may be missing). If there is no category column, create one manually.
| tag | category | label | 
|---|---|---|
| ACC | CASE | accusative | 
| NOUN | POS | noun | 
| … | … | … | 
Add URI column, populate by =CONCAT(":";$TAG) (where $TAG refers to the corresponding cell of the tag column)
| tag | category | label | URI |
|---|---|---|---|
| ACC | CASE | accusative | :ACC |
| NOUN | POS | noun | :NOUN |
| … | … | … | … |
olias:hasTag (or variants), one populated with =CONCAT("'";$TAG;"'; ") (tag value)
| tag | category | label | URI | property | value |
|---|---|---|---|---|---|
| ACC | CASE | accusative | :ACC | olias:hasTagContaining | 'ACC'; |
| NOUN | POS | noun | :NOUN | olias:hasTagContaining | 'NOUN'; |
| … | … | … | … | … | … |
=CONCAT(":";$LABEL).
| tag | category | label | URI | property | value | property | class |
|---|---|---|---|---|---|---|---|
| ACC | CASE | accusative | :ACC | olias:hasTagContaining | 'ACC'; | a | :accusative |
| NOUN | POS | noun | :NOUN | olias:hasTagContaining | 'NOUN'; | a | :noun |
| … | … | … | … | … | … | … | … |
.
| tag | category | label | URI | property | value | property | class | . |
|---|---|---|---|---|---|---|---|---|
| ACC | CASE | accusative | :ACC | olias:hasTagContaining | 'ACC'; | a | :accusative | . |
| NOUN | POS | noun | :NOUN | olias:hasTagContaining | 'NOUN'; | a | :noun | . |
| … | … | … | … | … | … | … | … | . |
=CONCAT(":";$category). Conclude with a "." column.
| tag | category | label | URI | property | value | property | class | . | class | property | category | . |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACC | CASE | accusative | :ACC | olias:hasTagContaining | 'ACC'; | a | :accusative | . | :accusative | rdfs:subClassOf | :CASE | . |
| NOUN | POS | noun | :NOUN | olias:hasTagContaining | 'NOUN'; | a | :noun | . | :noun | rdfs:subClassOf | :POS | . |
| … | … | … | … | … | … | … | … | … | … | … | … | … |
create a text (Turtle) file with the following header
  PREFIX olia: <http://purl.org/olia/olia.owl#>
  PREFIX olias: <http://purl.org/olia/system.owl#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX : <ADD_YOUR_SCHEMA_URL_HERE>
(replace ADD_YOUR_SCHEMA_URL_HERE by your URL ;)
rapper (libraptor library on most linux systems)
copy all columns we just created (starting with URI) at the end of this file:
  :ACC olias:hasTagContaining 'ACC'; a :accusative .
  :accusative rdfs:subClassOf :CASE .
  :NOUN olias:hasTagContaining 'NOUN'; a :noun .
  :noun rdfs:subClassOf :POS .
  …
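The spreadsheet procedure above can also be scripted. The following is a minimal Python sketch under stated assumptions: the function name `rows_to_turtle` and its parameters are hypothetical, tags and labels are assumed to be valid Turtle local names without quote characters, and the property linking a label to its category is passed as a parameter (rdfs:subClassOf here, since the labels are used as classes). The Turtle header with the prefix declarations still has to be written separately.

```python
def rows_to_turtle(rows, tag_property="olias:hasTagContaining",
                   super_property="rdfs:subClassOf"):
    """Turn (tag, category, label) rows into Turtle statements.

    Mirrors the spreadsheet columns: URI, tag property, tag value,
    'a', class, '.', followed by class, linking property, category, '.'.
    """
    lines = []
    for tag, category, label in rows:
        # :TAG <tag property> 'TAG'; a :label .
        lines.append(f":{tag} {tag_property} '{tag}'; a :{label} .")
        # :label <linking property> :CATEGORY .
        lines.append(f":{label} {super_property} :{category} .")
    return "\n".join(lines)


# The two sample rows from the table above
ROWS = [
    ("ACC", "CASE", "accusative"),
    ("NOUN", "POS", "noun"),
]

print(rows_to_turtle(ROWS))
```

The printed statements can be appended below the prefix header of the Turtle file. If several tags share a label, the linking statement is emitted more than once, which is redundant but harmless in RDF.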
To benefit from synergies with OLiA, annotation models should be linked, i.e., rdfs:subClassOf/rdfs:subPropertyOf links with http://purl.org/olia/olia.owl should be created. By convention, these are stored in separate files, using the naming scheme xyz-link.rdf (with xyz being the acronym of your annotation model) and provided in RDF/XML format.
add import declarations
  <YOUR_LINKING_URL> <http://www.w3.org/2002/07/owl#imports> <YOUR_ONTOLOGY_URL>, <http://purl.org/olia/olia.owl>.
for every annotation class and every OLiA class, add an rdfs:subClassOf statement to express a mapping, e.g.,
  :noun rdfs:subClassOf olia:Noun.
Note that you can use the full power of description logics to encode anything more complex, e.g.,
owl:unionOf, owl:intersectionOf, owl:inverseOf. However, applications may choose to ignore such complicated information and simply rely on rdfs:subClassOf. As multiple applications of rdfs:subClassOf are equivalent to owl:intersectionOf, better avoid the latter wherever possible.
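For a manually curated mapping, the import declaration and the rdfs:subClassOf statements could be generated with a few lines of Python. This is a sketch only: the function name `link_statements`, the example.org URLs, and the mapping dictionary are hypothetical placeholders, and the output is a list of Turtle statements that still needs the prefix header and, following the convention above, conversion to RDF/XML.

```python
OLIA_URL = "http://purl.org/olia/olia.owl"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"


def link_statements(linking_url, ontology_url, mapping):
    """Build the import declaration plus one rdfs:subClassOf statement
    per (annotation class, OLiA class) pair.

    mapping: annotation class name -> list of OLiA class names.
    Listing several OLiA classes for one annotation class yields several
    rdfs:subClassOf statements, i.e. the intersection reading described above.
    """
    lines = [f"<{linking_url}> <{OWL_IMPORTS}> <{ontology_url}>, <{OLIA_URL}> ."]
    for annotation_class, olia_classes in mapping.items():
        for olia_class in olia_classes:
            lines.append(f":{annotation_class} rdfs:subClassOf olia:{olia_class} .")
    return "\n".join(lines)


# Placeholder URLs; replace with your linking and ontology URLs.
print(link_statements(
    "http://example.org/xyz-link.rdf",
    "http://example.org/xyz.owl",
    {"noun": ["Noun"]},
))
```

Keeping the mapping as plain data makes it easy to review before the statements are written to the linking file.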
We provide a script to facilitate semiautomated linking for (Unix-style) shell environments. This is the file /tools/link.sh in this repository.
Semiautomated linking
$> bash -e ./link.sh http://purl.org/olia/olia.owl YOUR_ANNO_MODEL.owl YOUR_ANNO_MODEL-link.rdf
(Replace YOUR_ANNO_MODEL with the path to your annotation model. Output will be written to YOUR_ANNO_MODEL-link.rdf.)
Open the resulting model in Protégé and refine manually.