This website presents a corpus of aligned stories from the Kiranti mythological cycle. In the course of fieldwork on three Rai languages, Koyi, Thulung and Khaling, it became apparent that the traditional stories being recorded were in fact versions of the same story. It is these stories that make up the corpus on this website: stories that have been collected in at least two versions, with those versions representing differences in speaker, dialect or language.

Typological interest

The typological interest of this type of comparable corpus is that it makes it possible to compare how the same event is described in different languages, based on native narrative content rather than foreign content, whether image-based stimuli (such as the Pear Story or Frog, Where Are You?) or narrative (such as New Testament stories).

Concept of comparable corpus

The basic concept of a comparable corpus is that the stories are collected independently for each language, and when the stories turn out to be the same, they are considered to derive from the same proto-myth. The different versions (speakers, dialects, languages) are lined up side by side in pairs, and the segments that share narrative content are marked as being part of a "Similarity". This is done in pairs for all versions of a given story, with the Similarities recorded in an Excel spreadsheet and converted into a tag. When looking through the stories, sentences participating in a Similarity pairing are signaled by a special label; clicking on that label makes it possible to see all versions of the story which have been identified as sharing that Similarity.

The idea for this type of alignment comes from parallel corpora, which are essentially translation equivalents of texts aligned sentence by sentence for the purposes of training computer-assisted translation software. Because the need for translated material limits their size, parallel corpora have given way to comparable corpora, which are aligned corpora of similar but not identical texts (for example, two newspaper articles describing the same sports event could be the basis for a comparable corpus).

The Kiranti myths that make up our corpus presumably have the same origin, but because they have evolved in the different languages they are told in, they can be considered to be a comparable rather than a parallel corpus.
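
As an illustration only, the pairwise Similarity records described above could be grouped so that one label leads to every version sharing it. The sketch below is a minimal Python example under stated assumptions: the row layout, the Similarity IDs, the version names and the sentence numbers are all hypothetical, and it does not reproduce the spreadsheet format or the tag format actually used for this corpus.

```python
from collections import defaultdict

# Hypothetical rows, as they might be exported from a spreadsheet of pairwise
# alignments: (similarity_id, version_a, sentence_a, version_b, sentence_b).
# IDs, version names and sentence numbers are invented for illustration.
ROWS = [
    ("S1", "khaling", 3, "thulung_mukli", 5),
    ("S1", "khaling", 3, "koyi", 4),
    ("S2", "thulung_mukli", 9, "thulung_deusa", 8),
]


def build_index(rows):
    """Map each Similarity ID to the set of (version, sentence) segments it links."""
    index = defaultdict(set)
    for sim_id, ver_a, sent_a, ver_b, sent_b in rows:
        # Each row is one pairing; both sides belong to the same Similarity.
        index[sim_id].add((ver_a, sent_a))
        index[sim_id].add((ver_b, sent_b))
    return index


def versions_sharing(index, sim_id):
    """Return every (version, sentence) segment recorded for a Similarity, sorted."""
    return sorted(index.get(sim_id, ()))


if __name__ == "__main__":
    index = build_index(ROWS)
    for segment in versions_sharing(index, "S1"):
        print(segment)
```

Storing segments in a set means a sentence that takes part in several pairings (here, the hypothetical Khaling sentence 3) is listed only once when the label is followed.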

Khaling is spoken in a number of villages in Solukhumbu district, with the data here coming from the villages of Phuleli and Kanku. It has the most vibrant speaker community of the three Kiranti languages in our corpus: children are learning the language, and the speaker population is estimated at around 10,000 individuals.
Koyi (members of the community have adopted the spelling Koyee) is spoken in the village of Sungdel in Khotang district. There are very few speakers, perhaps around 1,000. The language is the most endangered of the three, with considerable borrowing from Nepali, particularly for nouns (verbs, which have distinct paradigms complete with stem alternations, tend to be borrowed less frequently).
The Thulung stories in the corpus are from two different dialects: that of Mukli, considered the homeland of the Thulung, and that of Deusa, further north and sharing a border with Khaling territory. While Ethnologue lists a population size of 20,000 based on the 2011 census, the actual number of speakers is considerably lower, on the order of 2,000 full speakers.
Corpus and dictionaries compiled with ANR funding.