In 2009, the Icelandic LT research group received a three-year Grant of Excellence from the Icelandic Research Fund for the project Viable Language Technology beyond English – Icelandic as a test case. Our primary objective is to make it feasible to develop three particular types of LT modules with limited resources without sacrificing the quality of the work. The three types of modules are a database of semantic relations, a machine translation system, and a treebank. These modules were chosen because they are central to current LT work and are prerequisites for further research and development in Icelandic LT. The project will emphasize the following points:
- Developing methodologies for creating resources for new languages more efficiently, with a focus on semi-automatic, machine-assisted resource generation;
- An inquiry into linguistic issues that are of little relevance for English LT but crucial for many other languages, with a special focus on general methods to deal with morphological richness and morphological ambiguity;
- A case study of Icelandic where we use the tools and methods developed to build a treebank, a database of semantic relations and a machine translation system;
- Evaluation of the tools and methods developed – focusing on quality of output as well as the output/manpower ratio;
- Writing and publishing guidelines for creating similar LT modules for less-resourced and/or morphologically rich languages;
- Enhancing research training in the field by giving graduate students the opportunity to work on research projects, as it is vital for the future of Icelandic LT to educate and train young researchers in the field.
In short, the project emphasizes the development of viable research methods and practical solutions that will strengthen Icelandic LT and serve as a model for other less-resourced languages.
The main objective of the project is to develop scientific LT methods that are suited to less-resourced languages. To this end, we will revise research methods and adapt them to Icelandic; build on the special characteristics of Icelandic to devise low-cost methods that make it possible to create resources and tools with less effort than previously considered necessary; and draw on the interdisciplinary character of the research group, our experience from previous projects, and our collaboration with prominent foreign researchers to combine methods from different disciplines.
We will develop research methods and resources within three different fields: semantic mining and semantic networks, shallow-transfer translation, and parsing methods and treebanking. We will emphasize a fruitful interplay between linguistic and statistical methods.
The impact of the project is scientific, technical, economic, and cultural. It will deliver two Ph.D. dissertations, three Master’s theses, and several conference papers and journal articles, and it will lay the groundwork for building important linguistic resources. The scientific value of the project lies both in the foundation that it lays for future research and development within Icelandic LT and in the new methods that will be developed for analyzing Icelandic, which also have a more general value, especially for inflectional languages.
Eiríkur Rögnvaldsson, Professor, University of Iceland
Hrafn Loftsson, Ph.D., Assistant Professor, Reykjavik University
Matthew Whelpton, Ph.D., Associate Professor, University of Iceland
Kristín Bjarnadóttir, Assistant Research Professor, ÁM Institute for Icelandic Studies
Sigrún Helgadóttir, Researcher, ÁM Institute for Icelandic Studies
Anna Björk Nikulásdóttir, doctoral student, University of Iceland
Anton Karl Ingason, master’s student, University of Iceland
Martha Dís Brandt, master’s student, Reykjavík University
Joel Wallenberg, doctoral student, University of Pennsylvania
Anthony Kroch, Ph.D., Professor, University of Pennsylvania
Mikel L. Forcada, Ph.D., Professor, Universitat d’Alacant