A CORPUS STUDY IN JAPANESE LANGUAGE: THE BALANCED CORPUS OF CONTEMPORARY WRITTEN JAPANESE (BCCWJ)
Abstract
This study describes the design process and methods of the Balanced Corpus of Contemporary Written Japanese (BCCWJ), one of the corpora designed by the Center for Corpus Development at the National Institute for Japanese Language and Linguistics (NINJAL). Corpus studies exist for many languages around the world. Although their methods are broadly similar, they differ according to the purpose of the corpus design. Because BCCWJ was designed to be a balanced corpus, the whole process, from corpus design to sample selection, was carried out with this purpose in mind. Accordingly, the most significant feature of BCCWJ is that it consists of three sub-corpora: a publication sub-corpus, a library sub-corpus, and a special-purpose sub-corpus. It also uses two types of sampling: fixed-length samples and variable-length samples. The corpus, which was prepared over a five-year period and consists of 105 million words, can serve as a model for designing similar corpora for Turkish.
Keywords
Details
Primary Language
Turkish
Subjects
-
Section
-
Publication Date
1 May 2016
Submission Date
1 May 2016
Acceptance Date
-
Published in Issue
Year 2016, Volume: 45, Issue: 210