Review

A Survey on Lip-Reading with Deep Learning

Volume: 14 Number: 2 July 31, 2022

Abstract

Deep learning methods have produced very successful results in areas such as computer vision and speech recognition, and these successes have led to technologies that make people's lives easier. Speech recognition devices are one such technology. Research has shown that these devices perform well in quiet environments but poorly in noisy ones. With deep learning methods, speech recognition in noisy environments can be supported by visual signals: using computer vision, the speaker's lips can be analyzed to determine what is being said, which can improve the accuracy of speech recognition devices. In this study, lip-reading studies using deep learning methods published between 2017 and 2020 are examined and the relevant data sets are introduced. The survey finds that CNN and LSTM architectures are the most heavily used in lip-reading studies, that hybrid models are preferred, and that reported success rates are steadily increasing. These findings suggest that further academic research on lip reading can lead to technologies tailored to practical needs.

Keywords

Lipreading, Deep Learning, Convolutional Neural Networks, Artificial Neural Networks

APA
Erbey, A., & Barışçı, N. (2022). A Survey on Lip-Reading with Deep Learning. International Journal of Engineering Research and Development, 14(2), 844-860. https://doi.org/10.29137/umagd.1038899