Research Article

Lip Reading Using Various Deep Learning Models with Visual Turkish Data

Volume: 37 Number: 3 September 1, 2024

Abstract

In Human-Computer Interaction, lip reading is essential and remains an open research problem. Over the last decades, many studies on Automatic Lip-Reading (ALR) have been carried out in different languages, which matters for the societies in which such applications are deployed. As in other machine learning and artificial intelligence applications, Deep Learning (DL) based classification algorithms have been applied to ALR to improve its performance. In the field of ALR, few studies have addressed the Turkish language. In this study, we undertook a multifaceted approach to the challenges inherent in Turkish lip reading research. First, we established a foundation by creating an original dataset meticulously curated for this investigation. Recognizing the importance of data quality and diversity, we applied three robust image data augmentation techniques: sigmoidal transform, horizontal flip, and inverse transform. These augmentation methods not only improved the quality of our dataset but also introduced a rich spectrum of variations, thereby increasing the dataset's utility. Building upon this augmented dataset, we applied state-of-the-art DL models: Convolutional Neural Networks (CNN), known for their prowess in extracting intricate visual features; Long Short-Term Memory (LSTM), adept at capturing sequential dependencies; and Bidirectional Gated Recurrent Units (BGRU), effective at handling complex temporal data. These advanced models were selected to leverage the potential of the visual Turkish lip reading dataset, ensuring that our research stands at the forefront of this rapidly evolving field.
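The abstract names the three augmentation techniques but not their exact formulations. A minimal sketch of one common interpretation, assuming 8-bit grayscale lip frames and hypothetical `gain`/`cutoff` parameter choices:

```python
import numpy as np

def sigmoid_transform(img, gain=10.0, cutoff=0.5):
    """Sigmoidal contrast adjustment (assumed form; frame scaled to [0, 1])."""
    x = img.astype(np.float64) / 255.0
    y = 1.0 / (1.0 + np.exp(gain * (cutoff - x)))
    return np.clip(y * 255.0, 0, 255).astype(np.uint8)

def horizontal_flip(img):
    """Mirror the frame left-to-right (lips are roughly symmetric)."""
    return img[:, ::-1]

def inverse_transform(img):
    """Photographic negative of an 8-bit frame."""
    return 255 - img
```

Applying each transform to every frame of a clip would triple the effective training data while preserving the lip-motion labels.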
The dataset used in this study was collected with the primary objective of augmenting the existing corpus of Turkish-language datasets, thereby substantively enriching the landscape of Turkish language research while also serving as a benchmark reference. The performance of the applied methods was compared in terms of precision, recall, and F1-score. According to the experimental results, the BGRU and LSTM models produced identical results up to the fifth decimal place, and BGRU had the fastest training time.
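The per-class metrics used for the comparison can be computed directly from the label sequences. A self-contained sketch (the class labels below are hypothetical examples, not taken from the paper):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 from parallel label sequences."""
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

Averaging these per-class values over all word classes gives the macro-averaged scores typically reported when comparing models such as CNN, LSTM, and BGRU.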

Supporting Institution

Aselsan-Bites

Project Number

N/A

Thanks

We are grateful to Recai Yavuz for his endless support.


Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Early Pub Date

January 15, 2024

Publication Date

September 1, 2024

Submission Date

January 19, 2023

Acceptance Date

November 15, 2023

Published in Issue

Year 2024 Volume: 37 Number: 3

APA
Berkol, A., Tümer Sivri, T., & Erdem, H. (2024). Lip Reading Using Various Deep Learning Models with Visual Turkish Data. Gazi University Journal of Science, 37(3), 1190-1203. https://doi.org/10.35378/gujs.1239207
