Project results
All basic results on evaluating the reuse potential of audiovisual, annotated language data that were achieved during the QUEST collaborative project have been included in the joint guideline "QUEST: Guidelines and Specifications for the Assessment of Audiovisual, Annotated Language Data". The individual work packages produced the following results:
WP 1.1 "Data Technology Standards" (sub-project Mannheim)
- Hedeland, Hanna (2022): FAIR-Prinzipien und Qualitätskriterien für Transkriptionsdaten: Empfehlungen und offene Fragen. In: Schwarze, C. & Grawunder, S. (eds.): Transkription und Annotation gesprochener Sprache und multimodaler Interaktion. Konzepte, Probleme, Lösungen. Narr Francke Attempto.
- Arkhangelskiy, Timofey, Hedeland, Hanna & Riaposov, Aleksandr (2021): Evaluating and assuring research data quality for audiovisual annotated language data. In: Navarreta, Constanza / Eskevich, Maria (eds.): Selected Papers from the CLARIN Annual Conference 2020: Virtual Event, 2020, 5-7 October. (Linköping Electronic Conference Proceedings 180). Linköping: Linköping University Electronic Press. pp. 1-7. https://ecp.ep.liu.se/index.php/clarin/article/view/1/1.
- Hedeland, Hanna (2021): Towards comprehensive definitions of data quality for audiovisual annotated language resources. In: Navarreta, Constanza / Eskevich, Maria (eds.): Selected Papers from the CLARIN Annual Conference 2020: Virtual Event, 2020, 5-7 October. (Linköping Electronic Conference Proceedings 180). Linköping: Linköping University Electronic Press. pp. 93-103. https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/10518/file/Hedeland_Towards_comprehensive_definitions_2021.pdf.
WP 1.2 "Quality Standards for Metadata" (sub-project Cologne)
- Rau, F., Majka, N. & Schwiertz, G. (2022). Metadata Recommendations for Audio-Visual Language Data. DOI: 10.5281/zenodo.7346840
- Seyfeddinipur, Mandana & Rau, Felix (2020): "Keeping it real: Video data in language documentation and language archiving". In: Language Documentation and Conservation 14, pp. 503-514. URL: https://scholarspace.manoa.hawaii.edu/handle/10125/24965.
WP 2.1 "Curation Criteria for Language Typology Secondary Use" (sub-project Berlin)
- Aznar, Jocelyn & Seifart, Frank (2022): The RefCo Toolkit. Zenodo. https://zenodo.org/record/7380448#.Y6A1HH2ZNPY.
- Aznar, Jocelyn (2022): Nisvai DoReCo dataset. In: Seifart, Frank, Ludger Paschen and Matthew Stave (eds.). Language Documentation Reference Corpus (DoReCo) 1.1. Berlin & Lyon: Leibniz-Zentrum Allgemeine Sprachwissenschaft & laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). https://doreco.huma-num.fr/languages/nisv1234 (Accessed on 13/09/2022). DOI:10.34847/nkl.2801565f.
- Krifka, Manfred (2022): Daakie DoReCo dataset. In: Seifart, Frank, Ludger Paschen and Matthew Stave (eds.). Language Documentation Reference Corpus (DoReCo) 1.1. Berlin & Lyon: Leibniz-Zentrum Allgemeine Sprachwissenschaft & laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). https://doreco.huma-num.fr/languages/port1286 (Accessed on 13/09/2022). DOI:10.34847/nkl.efeav5l9.
- Seifart, Frank, Ludger Paschen & Matthew Stave (eds.) (2022): Language Documentation Reference Corpus (DoReCo) 1.1. Berlin & Lyon: Leibniz-Zentrum Allgemeine Sprachwissenschaft & laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). DOI:10.34847/nkl.7cbfq779.
- Seifart, Frank (2022): Bora DoReCo dataset. In: Seifart, Frank, Ludger Paschen and Matthew Stave (eds.). Language Documentation Reference Corpus (DoReCo) 1.1. Berlin & Lyon: Leibniz-Zentrum Allgemeine Sprachwissenschaft & laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). https://doreco.huma-num.fr/languages/bora1263 (Accessed on 13/09/2022). DOI:10.34847/nkl.6eaf5laq.
- Seifart, Frank (2022): Resígaro DoReCo dataset. In: Seifart, Frank, Ludger Paschen and Matthew Stave (eds.). Language Documentation Reference Corpus (DoReCo) 1.1. Berlin & Lyon: Leibniz-Zentrum Allgemeine Sprachwissenschaft & laboratoire Dynamique Du Langage (UMR5596, CNRS & Université Lyon 2). https://doreco.huma-num.fr/languages/resi1247 (Accessed on 13/09/2022). DOI:10.34847/nkl.ffb96lo8.
- Aznar, Jocelyn & Seifart, Frank (2020): RefCo: An initiative to develop a set of quality criteria for fieldwork corpora. 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT), Montrouge, France. pp. 95-101. https://hal.archives-ouvertes.fr/hal-03047143/document.
- von Prince, Kilu & Sebastian Nordhoff (2020): An Empirical Evaluation of Annotation Practices in Corpora from Language Documentation. In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). https://www.aclweb.org/anthology/2020.lrec-1.338.pdf.
WP 2.2 "Curation Criteria for Linguistic Secondary Use of Multilingual Data" (sub-project Hamburg)
- Arestau, Elena (2022): Curation of Learner Corpora. PDF
- Arestau, Elena (2022): Curation of Interpreted Corpora Using The Example of ComInDat. PDF
- Isard, Amy & Arestau, Elena (2022): Curation Criteria for Multimodal and Multilingual Data: A Mixed Study within the QUEST project. In: Navarreta, Constanza / Eskevich, Maria (eds.): Selected Papers from the CLARIN Annual Conference 2021: Virtual Event, 2021, 27-29 September. (Linköping Electronic Conference Proceedings 189). Linköping: Linköping University Electronic Press. pp. 56-68. https://ecp.ep.liu.se/index.php/clarin/article/view/417/375.
WP 2.3 "Curation Criteria for Multimodal Data" (sub-project Hamburg)
- Isard, Amy & Arestau, Elena (2022): Curation Criteria for Multimodal and Multilingual Data: A Mixed Study within the QUEST project. In: Navarreta, Constanza / Eskevich, Maria (eds.): Selected Papers from the CLARIN Annual Conference 2021: Virtual Event, 2021, 27-29 September. (Linköping Electronic Conference Proceedings 189). Linköping: Linköping University Electronic Press. pp. 56-68. https://ecp.ep.liu.se/index.php/clarin/article/view/417/375.
- Isard, Amy (2020): Approaches to the Anonymisation of Sign Language Corpora. In: Proceedings of the 9th Workshop on the Representation and Processing of Sign Languages (LREC-2020 workshop). https://www.aclweb.org/anthology/2020.signlang-1.15.pdf.
WP 2.4 "Curation Criteria for Secondary Use in the 'Third-Mission'" (sub-project Mannheim & Berlin)
- Nordhoff, Sebastian (2020): Modelling and annotating interlinear glossed text from 280 different endangered languages as Linked Data with LIGT. In: Proceedings of the 14th Linguistic Annotation Workshop (LAW XIV). https://www.aclweb.org/anthology/2020.law-1.9.pdf.
- Nordhoff, Sebastian (2020): From the attic to the cloud: mobilization of endangered language resources with linked data. In: Proceedings of LR4SSHOC: Workshop about Language Resources for the SSH Cloud (LREC-2020). https://www.aclweb.org/anthology/2020.lr4sshoc-1.3.pdf.
- von Prince, Kilu & Sebastian Nordhoff (2020): An Empirical Evaluation of Annotation Practices in Corpora from Language Documentation. In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). https://www.aclweb.org/anthology/2020.lrec-1.338.pdf.
WP 3 "Quality Assurance Measures" (sub-project Hamburg)
Work Package 3 "Quality Assurance Measures" developed various tools and workflows for evaluating the quality or reuse potential of audiovisual, annotated language data. These include a questionnaire that records the basic characteristics of a resource and a wide range of automatic checkers, which allow most of the quality standards and curation criteria developed in the work packages to be checked automatically. Comprehensive documentation of the automatic quality assurance methods is available in the corpus-services wiki on GitLab. The automatic quality checks themselves are also accessible via GitLab.
- Arkhangelskiy, Timofey, Hedeland, Hanna & Riaposov, Aleksandr (2021): Evaluating and assuring research data quality for audiovisual annotated language data. In: Navarreta, Constanza / Eskevich, Maria (eds.): Selected Papers from the CLARIN Annual Conference 2020: Virtual Event, 2020, 5-7 October. (Linköping Electronic Conference Proceedings 180). Linköping: Linköping University Electronic Press. pp. 1-7. https://ecp.ep.liu.se/index.php/clarin/article/view/1/1.