WEB-BASED AUTOMATIC LANGUAGE IDENTIFICATION AND TRANSLATION SYSTEM
Year 2010, Volume: 25, Issue: 3, 0 - , 19.02.2013
Uraz Yavanoğlu
Şeref Sağıroğlu
Abstract
In this study, a system was developed that identifies the language of MS Word and HTML documents found on the Internet and translates their content into other languages. Language identification is essentially a special case of a more general problem: feature-based classification. The system uses an intelligent solution based on artificial neural networks to automatically detect the language of online documents whose language the user does not know, and lets the user automatically translate their content into any of 40 languages, which makes further processing of these documents easier. The system was tested on documents in 15 languages, and these tests showed that its real-time performance exceeded expectations. This work is expected to enable more effective use of Internet content.
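The abstract frames language identification as feature classification solved by an artificial neural network, but does not state which features or which network topology the authors used. The following is a minimal sketch of that framing under loudly stated assumptions. Each document is reduced to a vector of letter frequencies over a hypothetical 34-letter alphabet, and a single-layer softmax network (one output unit per language) is trained by full-batch gradient descent on cross-entropy. `ALPHABET`, `letter_frequencies`, `SoftmaxLanguageClassifier` and the toy sentences are illustrative names and data, not the authors' implementation. The translation step is not shown.

```python
import math

# Hypothetical feature alphabet (an assumption, not from the paper): the 26
# basic Latin letters plus extra Turkish (ç ğ ı ö ş ü) and German (ä ß) letters.
ALPHABET = "abcçdefgğhıijklmnoöpqrsştuüvwxyzäß"


def letter_frequencies(text: str, alphabet: str = ALPHABET) -> list[float]:
    """Relative frequency of each alphabet letter in `text` (the feature vector)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    counts = [0.0] * len(alphabet)
    # str.lower() is locale-independent: Turkish 'I' becomes 'i', not 'ı'.
    for ch in text.lower():
        if ch in index:
            counts[index[ch]] += 1.0
    total = sum(counts)
    # A text with no alphabet letters yields an all-zero vector.
    return [c / total for c in counts] if total else counts


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class SoftmaxLanguageClassifier:
    """Hypothetical single-layer network: one linear unit per language + softmax."""

    def __init__(self, labels: list[str], n_features: int) -> None:
        self.labels = list(labels)
        # One weight row per language; the last column is the bias weight.
        self.w = [[0.0] * (n_features + 1) for _ in self.labels]

    def probabilities(self, x: list[float]) -> list[float]:
        xb = x + [1.0]  # constant input that feeds the bias weight
        return softmax([sum(wj * xj for wj, xj in zip(row, xb)) for row in self.w])

    def loss(self, xs, ys) -> float:
        """Mean cross-entropy over a labelled set."""
        return -sum(math.log(self.probabilities(x)[self.labels.index(y)])
                    for x, y in zip(xs, ys)) / len(xs)

    def fit(self, xs, ys, lr: float = 1.0, epochs: int = 1000) -> None:
        """Full-batch gradient descent on the mean cross-entropy.

        For probability-vector inputs plus a bias, the loss gradient is
        1-Lipschitz, so lr=1.0 decreases the loss at every step.
        """
        n = len(xs)
        for _ in range(epochs):
            grad = [[0.0] * len(row) for row in self.w]
            for x, y in zip(xs, ys):
                xb = x + [1.0]
                target = self.labels.index(y)
                for k, pk in enumerate(self.probabilities(x)):
                    err = pk - (1.0 if k == target else 0.0)
                    for j, xj in enumerate(xb):
                        grad[k][j] += err * xj / n
            for row, g_row in zip(self.w, grad):
                for j, g in enumerate(g_row):
                    row[j] -= lr * g

    def predict(self, x: list[float]) -> str:
        p = self.probabilities(x)
        return self.labels[p.index(max(p))]


# Hypothetical toy corpus (the abstract reports tests on 15 languages).
TRAINING_DATA = [
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is a short english sentence about the weather", "en"),
    ("bu çalışmada bir sistem geliştirilmiştir", "tr"),
    ("dil tanıma sorunu özniteliklerin sınıflandırılmasıdır", "tr"),
    ("der schnelle braune fuchs springt über den faulen hund", "de"),
    ("dies ist ein kurzer deutscher satz über das wetter", "de"),
]

if __name__ == "__main__":
    xs = [letter_frequencies(t) for t, _ in TRAINING_DATA]
    ys = [lang for _, lang in TRAINING_DATA]
    model = SoftmaxLanguageClassifier(sorted(set(ys)), len(ALPHABET))
    print("loss before:", model.loss(xs, ys))
    model.fit(xs, ys)
    print("loss after:", model.loss(xs, ys))
    print("prediction:", model.predict(letter_frequencies("Şu anda çalışıyorum")))
```

When run as a script, it prints the training cross-entropy before and after training, followed by a prediction for an unseen Turkish phrase. With only six training sentences, that prediction is illustrative and says nothing about the accuracy reported in the abstract. Letter frequencies are used here because they are cheap to compute and letters such as ı, ş, ğ or ß are strong language cues. A real system would likely need richer features, such as character n-grams, and far more training text.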