Research Article

Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic

Year 2019, Volume: 2 Issue: 2, 1 - 14, 30.12.2019

Abstract

Modern Standard Arabic (MSA) is the formal language of the Arab world. In Algeria, MSA coexists with several informal Arabic dialects that are used in everyday communication. These dialects are further subject to regional variation: eastern, western, central and southern. The Oranee dialect is the most important and most widely used dialect in western Algeria. However, it is an under-resourced language that lacks both audio and textual corpora. In this paper, we present the main particularities of this western Algerian dialect and apply natural language processing to an Oranee textual corpus. Because a transcribed MSA discourse may contain dialectal vocabulary and vice versa, we propose to interpolate dialectal and MSA language models with respect to several topics. The best interpolation weights were obtained on the Religion topic data.
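
The abstract does not state the interpolation formula, so the following is a minimal sketch assuming standard linear interpolation, P(w) = λ·P_dialect(w) + (1 − λ)·P_MSA(w), with the weight λ estimated by expectation maximisation (EM) on held-out text from one topic. For brevity it uses add-one smoothed unigram models in place of full n-gram models; the transliterated tokens are invented toy data, not the Oranee corpus, and all function names are hypothetical. Toolkits such as SRILM (cited below) provide comparable functionality, but this is not a description of the authors' exact setup.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one (Laplace) smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_a, p_b, lam):
    """Linear interpolation: P(w) = lam * P_a(w) + (1 - lam) * P_b(w)."""
    return {w: lam * p_a[w] + (1 - lam) * p_b[w] for w in p_a}


def best_mix(p_a, p_b, heldout, lam=0.5, iters=50):
    """Estimate the weight of p_a by EM on held-out tokens (hypothetical helper)."""
    for _ in range(iters):
        # E-step: posterior probability that each held-out token came from p_a.
        post = [lam * p_a[w] / (lam * p_a[w] + (1 - lam) * p_b[w]) for w in heldout]
        # M-step: the new weight is the average posterior.
        lam = sum(post) / len(post)
    return lam


def perplexity(model, tokens):
    """exp of the average negative log-probability per token."""
    return math.exp(-sum(math.log(model[w]) for w in tokens) / len(tokens))


# Invented toy data: transliterated "dialect-like" and "MSA-like" tokens.
dialect_train = "wach rak khouya wach rak bezzaf salam".split()
msa_train = "kayfa haluka akhi kayfa haluka jiddan salam".split()
heldout = "wach rak akhi salam kayfa".split()  # stands in for one topic's dev text

# A shared vocabulary keeps every held-out token at non-zero probability.
vocab = set(dialect_train) | set(msa_train) | set(heldout)
p_dialect = train_unigram(dialect_train, vocab)
p_msa = train_unigram(msa_train, vocab)

lam = best_mix(p_dialect, p_msa, heldout)
mixed = interpolate(p_dialect, p_msa, lam)

print(f"dialect weight: {lam:.3f}")
for name, model in [("dialect", p_dialect), ("MSA", p_msa), ("mixed", mixed)]:
    print(f"{name:8s} perplexity: {perplexity(model, heldout):.2f}")
```

On this toy held-out set the estimated dialect weight falls between 0.55 and 0.65, and the mixed model has a lower perplexity than either component. Running `best_mix` once per topic-specific held-out set would give one weight per topic, which is the kind of per-topic comparison the abstract reports.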

References

  • Biadsy, F., Hirschberg, J., Habash, N.: Spoken Arabic dialect identification using phonotactic modeling. In: Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages, Association for Computational Linguistics, pp. 53--61 (2009)
  • Shoufan, A., Alameri, S.: Natural language processing for dialectical Arabic: A survey. In: Proceedings of the Second Workshop on Arabic Natural Language Processing, pp. 36--48 (2015)
  • Zaghouani, W.: Critical survey of the freely available Arabic corpora. arXiv preprint arXiv:1702.07835 (2017)
  • Droua-Hamdani, G., Selouani, S.A., Boudraa, M.: Algerian Arabic Speech Database (ALGASD): Corpus design and automatic speech recognition application. The Arabian Journal for Science and Engineering 35(2C), pp. 157--166 (2010)
  • Droua-Hamdani, G., Alotaibi, Y.A., Selouani, S.A., Boudraa, M.: Rhythmic feature across Modern Standard Arabic and Arabic dialects. In: Proceedings of the Workshop on Free/Open-Source Arabic Corpora and Corpora Processing Tools, pp. 43--46 (2014)
  • Meftouh, N., Bouchemal, S., Smaili, K.: A study of a non-resourced language: An Algerian dialect. In: 3rd Workshop on Spoken Language Technologies for Under-resourced Languages, Cape Town, South Africa (2012)
  • Harrat, S., Meftouh, K., Abbes, M., Smaili, K.: Building resources for Algerian Arabic dialects. In: Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Singapore (2014)
  • Harrat, S., Meftouh, K., Abbas, M., Hidouci, K.W., Smaili, K.: An Algerian dialect: Study and resources. International Journal of Advanced Computer Science and Applications, pp. 384--395 (2016)
  • Harrat, S., Meftouh, M., Smaili, K.: Creating parallel Arabic dialect corpus: Pitfalls to avoid. In: Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing, Budapest, Hungary (2017)
  • Meftouh, K., Harrat, S., Jamoussi, S., Abbes, M., Smaili, K.: Machine translation experiments on PADIC: A parallel Arabic dialect corpus. In: Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, Shanghai, China (2015)
  • Bougrine, S., Cherroun, H., Ziadi, D., Lakhdari, A., Chorana, A.: Toward a rich Arabic speech parallel corpus for Algerian sub-dialects. In: Proceedings of the 2nd Workshop on Arabic Corpora and Processing Tools (Theme: Social Media), pp. 2--10 (2016)
  • Bougrine, S., Chorana, A., Lakhdari, A., Cherroun, H.: Toward a web-based speech corpus for Algerian Arabic dialectal varieties. In: Proceedings of the Third Arabic Natural Language Processing Workshop (WANLP), Spain, pp. 138--146 (2017)
  • Djellab, M., Amrouche, A., Bouridane, A., Mehallegue, N.: Algerian Modern Colloquial Arabic Speech Corpus (AMCASC): Regional accents recognition within complex socio-linguistic environments. Language Resources and Evaluation 51(3), pp. 613--641 (2017)
  • Labed, Z.: Genealogical koineisation in Oran speech community: The case of young university Oranees. PhD thesis, University of Oran (2014)
  • Stolcke, A.: SRILM - an extensible language modeling toolkit. In: Proceedings of Interspeech (2002)
  • Stolcke, A., Zheng, J., Wang, W., Abrash, V.: SRILM at sixteen: Update and outlook. In: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, p. 5 (2011)
  • Mezzoudj, F., Langlois, D., Jouvet, D., Benyettou, A.: Textual data selection for language modelling in the scope of automatic speech recognition. Procedia Computer Science, pp. 55--64 (2018)
  • Mezzoudj, F., Benyettou, A.: An empirical study of statistical language models: N-gram language models vs. neural network language models. International Journal of Innovative Computing and Applications 9(4), pp. 89--202 (2018)
  • Abdelali, A., Darwish, K., Durrani, N., Mubarak, H.: Farasa: A fast and furious segmenter for Arabic. In: Proceedings of NAACL (2016)
  • Helmy, M., Basaldella, M., Maddalena, E., Mizzaro, S., Demartini, G.: Towards building a standard dataset for Arabic keyphrase extraction evaluation. In: International Conference on Asian Language Processing (IALP), pp. 26--29 (2016)
There are 20 citations in total.

Details

Primary Language English
Subjects Software Engineering (Other)
Journal Section Articles
Authors

Freha Mezzoudj 0000-0001-6763-0195

Mourad Loukam

Fatma Zohra Belkredim

Publication Date December 30, 2019
Acceptance Date January 29, 2020
Published in Issue Year 2019 Volume: 2 Issue: 2

Cite

APA Mezzoudj, F., Loukam, M., & Belkredim, F. Z. (2019). Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic. International Journal of Informatics and Applied Mathematics, 2(2), 1-14.
AMA Mezzoudj F, Loukam M, Belkredim FZ. Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic. IJIAM. December 2019;2(2):1-14.
Chicago Mezzoudj, Freha, Mourad Loukam, and Fatma Zohra Belkredim. “Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic”. International Journal of Informatics and Applied Mathematics 2, no. 2 (December 2019): 1-14.
EndNote Mezzoudj F, Loukam M, Belkredim FZ (December 1, 2019) Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic. International Journal of Informatics and Applied Mathematics 2 2 1–14.
IEEE F. Mezzoudj, M. Loukam, and F. Z. Belkredim, “Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic”, IJIAM, vol. 2, no. 2, pp. 1–14, 2019.
ISNAD Mezzoudj, Freha et al. “Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic”. International Journal of Informatics and Applied Mathematics 2/2 (December 2019), 1-14.
JAMA Mezzoudj F, Loukam M, Belkredim FZ. Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic. IJIAM. 2019;2:1–14.
MLA Mezzoudj, Freha et al. “Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic”. International Journal of Informatics and Applied Mathematics, vol. 2, no. 2, 2019, pp. 1-14.
Vancouver Mezzoudj F, Loukam M, Belkredim FZ. Arabic Algerian Oranee Dialectal Language Modelling Oriented Topic. IJIAM. 2019;2(2):1-14.

International Journal of Informatics and Applied Mathematics