Lip Reading Using Convolutional Neural Networks with and without Pre-Trained Models
Abstract
Lip reading has recently become a popular research topic, and there is an extensive body of literature on lip reading within human action recognition. Deep learning methods are frequently used in this area. In this paper, lip reading from video data is performed using self-designed convolutional neural networks (CNNs). For this purpose, both the standard and an augmented version of the AVLetters dataset are used in the training and testing stages. To optimize network performance, the minibatch size parameter is also tuned and its effect is investigated. Additionally, experiments are performed using the pre-trained CNNs AlexNet and GoogLeNet. Detailed experimental results are presented.
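The abstract does not state how the minibatch size was swept, which values were tried, or which framework was used. As a framework-agnostic illustration only, the sketch below shows what the parameter controls: the shuffled training clips are split into groups of `batch_size`, so a smaller minibatch gives more weight updates per epoch, each computed from fewer samples. The names `make_minibatches`, `sweep_minibatch_sizes` and the `evaluate` callback are hypothetical and are not the authors' code.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def make_minibatches(samples: Sequence[T], batch_size: int, seed: int = 0) -> List[List[T]]:
    """Shuffle the samples and split them into minibatches of batch_size.

    The final minibatch is smaller when len(samples) is not a multiple of batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    order = list(samples)
    random.Random(seed).shuffle(order)  # fixed seed so every candidate sees the same order
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def sweep_minibatch_sizes(
    samples: Sequence[T],
    candidate_sizes: Sequence[int],
    evaluate: Callable[[List[List[T]]], float],
) -> Dict[int, float]:
    """Score each candidate minibatch size with a user-supplied evaluate() callback.

    Hypothetical stand-in: in a real experiment evaluate() would train the CNN on
    the given minibatches and return validation accuracy.
    """
    results: Dict[int, float] = {}
    for size in candidate_sizes:
        batches = make_minibatches(samples, size)
        # Invariant: the number of weight updates per epoch is ceil(N / size).
        assert len(batches) == math.ceil(len(samples) / size)
        results[size] = evaluate(batches)
    return results


if __name__ == "__main__":
    clips = list(range(780))  # placeholder clip IDs; real samples would be lip-region frame sequences
    # Dummy evaluate(): report how many updates one epoch would perform.
    updates = sweep_minibatch_sizes(clips, [16, 32, 64], lambda b: float(len(b)))
    print(updates)  # {16: 49.0, 32: 25.0, 64: 13.0}
```

In an actual experiment the lambda would be replaced by a train-and-validate routine; the clip count and the candidate sizes 16, 32 and 64 are placeholders, not values taken from the paper.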
Details
Primary Language: English
Subjects: Electrical Engineering
Section: Research Article
Publication Date: April 30, 2019
Submission Date: November 7, 2018
Acceptance Date: April 3, 2019
Issue: Year 2019, Volume 7, Issue 2
Cited By
Transfer learning-based convolutional neural networks with heuristic optimization for hand gesture recognition
Neural Computing and Applications
https://doi.org/10.1007/s00521-019-04427-y

Human action recognition with deep learning and structural optimization using a hybrid heuristic algorithm
Cluster Computing
https://doi.org/10.1007/s10586-020-03050-0

A new composite approach for COVID-19 detection in X-ray images using deep features
Applied Soft Computing
https://doi.org/10.1016/j.asoc.2021.107669

Performance Improvement Of Pre-trained Convolutional Neural Networks For Action Recognition
The Computer Journal
https://doi.org/10.1093/comjnl/bxaa029

DERİN ÖĞRENME KULLANILARAK OPTİMUM JPEG KALİTE FAKTÖRÜNÜN BELİRLENMESİ [Determining the Optimum JPEG Quality Factor Using Deep Learning]
Mühendislik Bilimleri ve Tasarım Dergisi
https://doi.org/10.21923/jesd.698719

Visual speech recognition for small scale dataset using VGG16 convolution neural network
Multimedia Tools and Applications
https://doi.org/10.1007/s11042-021-11119-0

Static facial expression recognition using convolutional neural networks based on transfer learning and hyperparameter optimization
Multimedia Tools and Applications
https://doi.org/10.1007/s11042-020-09268-9

ERUSLR: yeni bir Türkçe işaret dili veri seti ve hiperparametre optimizasyonu destekli evrişimli sinir ağı ile tanınması [ERUSLR: A new Turkish sign language dataset and its recognition with a convolutional neural network supported by hyperparameter optimization]
Gazi Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi
https://doi.org/10.17341/gazimmfd.746793

Entwicklung und Evaluation eines Deep-Learning-Algorithmus für die Worterkennung aus Lippenbewegungen für die deutsche Sprache [Development and evaluation of a deep learning algorithm for word recognition from lip movements for the German language]
HNO
https://doi.org/10.1007/s00106-021-01143-9

Visual Analysis of College Sports Performance Based on Multimodal Knowledge Graph Optimization Neural Network
Computational Intelligence and Neuroscience
https://doi.org/10.1155/2022/5398932

Derin Öğrenme ile Dudak Okuma Üzerine Detaylı Bir Araştırma [A Detailed Study on Lip Reading with Deep Learning]
Uluslararası Muhendislik Arastirma ve Gelistirme Dergisi
https://doi.org/10.29137/umagd.1038899

LIP READING USING CNN FOR TURKISH NUMBERS
Journal of Business in The Digital Age
https://doi.org/10.46238/jobda.1100903

Visual Speech Recognition for Kannada Language Using VGG16 Convolutional Neural Network
Acoustics
https://doi.org/10.3390/acoustics5010020

A novel facial expression recognition algorithm using geometry β –skeleton in fusion based on deep CNN
Image and Vision Computing
https://doi.org/10.1016/j.imavis.2023.104677

Survey on Silentinterpreter : Analysis of Lip Movement and Extracting Speech using Deep Learning
International Journal of Scientific Research in Science, Engineering and Technology
https://doi.org/10.32628/IJSRSET2411219

Lip Reading Using Various Deep Learning Models with Visual Turkish Data
Gazi University Journal of Science
https://doi.org/10.35378/gujs.1239207

Approaches of Lip Reading to Assist Sign Language Understanding
Journal of Korea Robotics Society
https://doi.org/10.7746/jkros.2024.19.4.319