GENETIC ALGORITHM BASED SENTENCE EXTRACTION FOR AUTOMATIC TEXT SUMMARIZATION
Year 2017, Volume: 3 Issue: 2, 62 - 75, 20.12.2017
Oğuz Kaynar, Yunus Emre Işık, Yasin Görmez, Ferhan Demirkoparan
Abstract
With the growth of the Internet, the amount of data available in digital form
keeps increasing. In particular, Web 2.0 technology has multiplied the sites
where users can add new content, such as Wikipedia, blogs, and social media, so
the information on the Internet has grown enormously in both the number of
items and their size. In an environment with this much data, reaching the
desired information is a serious problem. In today's information age, the need
to reach the sought information quickly makes automatic text summarization
systems a necessity in many areas related to information extraction. This study
addresses text summarization methods based on sentence extraction. First,
features that represent the sentences in a document are extracted; then a
genetic algorithm is used to determine how effective each feature is in
producing a summary. The data set consists of 120 documents containing Turkish
news texts and their summaries. A genetic algorithm is trained on 80 documents
to determine the best weight value for each feature; these weights are then
used to summarize the 40 test documents, and the results are compared with the
original summaries.
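
The pipeline described in the abstract (sentence features, weights learned by a genetic algorithm, summaries compared with references) can be illustrated with a minimal sketch. The abstract does not name the features, the fitness measure, or the genetic operators, so everything below is an assumption made for illustration: three unnamed feature values per sentence, a fitness equal to the fraction of reference sentences recovered, elitism, uniform crossover, and Gaussian mutation. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import random


def summarize(features, weights, k):
    """Return indices of the k highest-scoring sentences, in document order."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    top = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def fitness(weights, docs):
    """Mean fraction of reference sentences recovered (an assumed measure)."""
    total = 0.0
    for features, reference in docs:
        picked = summarize(features, weights, len(reference))
        total += len(set(picked) & set(reference)) / len(reference)
    return total / len(docs)


def train_weights(docs, n_features, pop_size=20, generations=30, seed=0):
    """Search for feature weights in [0, 1] with a small genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, docs), reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best individuals over
        while len(next_gen) < pop_size:
            # Pick two distinct parents from the better half of the population.
            a, b = rng.sample(ranked[:pop_size // 2], 2)
            # Uniform crossover: each gene comes from one parent at random.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutate one gene with Gaussian noise, clamped to [0, 1].
            i = rng.randrange(n_features)
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda w: fitness(w, docs))


# Hypothetical toy data: (feature rows per sentence, reference sentence indices).
toy_docs = [
    ([[1.0, 0.8, 0.5], [0.5, 0.1, 0.9], [0.3, 0.7, 0.4], [0.1, 0.0, 0.8]], [0, 2]),
    ([[1.0, 0.2, 0.6], [0.6, 0.9, 0.3], [0.2, 0.1, 0.9]], [1]),
]
weights = train_weights(toy_docs, n_features=3)
```

In the setting of the abstract, `train_weights` would be run on the 80 training documents and `summarize` applied with the learned weights to the 40 test documents. Comparing sentence indices is a simplification; the study compares the generated summaries with the original ones, and the exact comparison measure is not stated in the abstract.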