INEL Selkup Corpus 1.0 finalised
14 July 2020, by INEL-Webredaktion

Photo: Alexandre Arkhipov
We are very happy to announce that an updated version (now 1.0) of the INEL Selkup Corpus has been completed.
Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2020. INEL Selkup Corpus. Version 1.0. Publication date 2020-06-30. Archived in Hamburger Zentrum für Sprachkorpora. http://hdl.handle.net/11022/0000-0007-E1D5-A In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages.
- The current publication contains 264 texts by 74 speakers for North, Central and South Selkup dialects. In total, there are 7887 sentences and 42466 words.
- Many texts were provided with (partial) annotations for syntactic functions and semantic roles.
- Corrections were made in the audio transcriptions, glossing and other annotations.
- The user documentation (in English) is available here.
- The corpus can be searched online using the Tsakorpus platform.
About the corpus
Selkup is an endangered Samoyedic language (from the Uralic language family) that was spoken in many small settlements spread over the large territory of Western Siberia. The INEL Selkup Corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924-2002). Kuzmina collected a large amount of material on Selkup in almost all the regions where Selkups lived between 1962 and 1977. Most of the texts come from the handwritten part of the archive, which Kuzmina transferred to Hamburg in 2001. The other texts come from her audio recordings, which were digitised in 2001 and later transcribed and translated in the INEL project.
The corpus is published under the licence CC BY-NC-SA 4.0. In addition to the online search, a complete archive of the corpus files can be downloaded and searched with the programme EXAKT from the EXMARaLDA software system.
Views of individual texts are available online under the tab "Sessions" on the corpus page. Each text can be viewed directly in online formats (e.g. Visualisation: Score) or downloaded in EXB format (EXMARaLDA file format, convertible to ELAN). The sources of the texts, i.e. scanned pages (PDF) or audio files (WAV, MP3) can also be viewed and downloaded.
Please do not hesitate to send us your comments and suggestions: inel@uni-hamburg.de.