The Linked Open Dictionaries (LiODi) project develops methodologies and algorithmic solutions to facilitate research on comparative lexicography in the context of linguistic, cultural, sociological and historical studies. In particular, we develop a workbench that facilitates the cross-linguistic search for semantically and/or phonologically related words in various languages. Along with applications in the humanities, it also demonstrates the potential of (Linguistic) Linked Open Data for research problems in philology, historical sciences and linguistics.
LiODi is a joint effort of the Applied Computational Linguistics (ACoLi) lab at the Institute of Computer Science and the Institute of Empirical Linguistics at Goethe University Frankfurt, Germany. We closely collaborate with both institutes in research and teaching, as well as with the Frankfurt Centre for the Digital Foundation of Research in the Humanities, Social, and Educational Sciences (CEDIFOR). The project is funded by the German Ministry for Education and Research as an Independent Research Group on eHumanities. In the pilot phase (December 1st, 2015 - November 30th, 2016), we focus on Turkic languages and selected contact languages (especially Iranian, Slavic, Caucasian and Arabic). In the main phase (January 1st, 2017 - December 31st, 2020), we extend the workbench and our methodology to other language families and language-independent functionalities.
Primary LiODi contributions:
- data: We create machine-readable dictionaries in accordance with the conventions of the Linguistic Linked Open Data (LLOD) community.
- standards: We contribute to the development of community standards for representing language resources. This includes lexical data (dictionaries, word lists), interlinear glossed text and other forms of linguistic annotation.
- solutions: We develop tools and algorithms to facilitate language contact studies over lexical and corpus data. This includes, among other aspects, routines for detecting semantically and phonologically similar words between different languages. Taken together, semantic and phonological similarity detection facilitate the identification of possible cognates. Other, more elementary tools include software for converting, validating and querying the linguistic data addressed in our project.
- case studies: Solutions and data developed in the project are an integral component of the qualification projects of different group members. Linguistic research problems addressed include contact phenomena among North-East Caucasian and neighboring languages in Azerbaijan, language contact between Georgian and Batsbi in Georgia, and the distribution of loanwords in early modern Armenian.
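To give a rough idea of what detecting phonologically similar words can involve, the following is a minimal sketch that scores two word forms by normalized edit distance. It is an illustration only and not the LiODi implementation: the function names `edit_distance` and `form_similarity` are hypothetical, and comparing plain character strings is a simplifying assumption that ignores sound classes and regular sound correspondences.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def form_similarity(a: str, b: str) -> float:
    """Hypothetical similarity score in [0, 1]; 1.0 means identical forms."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


# Illustrative comparison of two simplified transcriptions (assumed input format).
score = form_similarity("kitap", "kitab")
```

A score like this only captures surface similarity of forms; under the approach described above, it would have to be combined with a measure of semantic relatedness before a word pair is proposed as a cognate or loan candidate.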
Beyond that, we are very active in dissemination and community building. We organize international community events (summer schools, workshops, conferences) on the topic of linked data in linguistics. In particular, these include:
- the 1st Conference on Language, Data and Knowledge (LDK-2017), Galway, Ireland, June 2017
- the 2nd Summer Datathon on Linguistic Linked Open Data, SD-LLOD-17, Madrid, Spain, June 2017
- the 6th Workshop on Linked Data in Linguistics (LDL-2018), Miyazaki, Japan, May 2018
- the 3rd Summer Datathon on Linguistic Linked Open Data, SD-LLOD-19, Schloss Dagstuhl, Wadern, Germany, May 2019
- the 2nd Conference on Language, Data and Knowledge (LDK-2019), Leipzig, Germany, May 2019
- the 7th Workshop on Linked Data in Linguistics (LDL-2020), Marseille, France, May 2020