A Product Search Engine Supporting “Best Product” Queries
Year 2016, Volume 31, Issue ÖS2 (Special Issue 2), pp. 259-270, 15.10.2016
Furkan Gözükara, Selma Ayşe Özel
Abstract
In this study, a novel product search engine system that supports queries of the form “find the best products for a given category” is proposed. The system consists of a focused crawler, a record linkage system, a sentiment analyzer, and a query engine. The focused crawler collects product information from various e-commerce sites; the record linkage system identifies identical products crawled from different e-commerce sites; the sentiment analyzer classifies users’ product reviews as positive or negative so that the search engine can decide which products are the best for a given category; and the query engine accepts user queries and displays the results. The entire system is implemented in C# on the .NET Framework 4.5, and the MS-SQL Server 2014 database management system is used for data storage. The core of the system is the record linkage component, which is based on a modified incremental Hierarchical Agglomerative Clustering algorithm. To improve record linkage accuracy, we also develop a product code matching system: if two products from different e-commerce sites have the same product code, they are considered the same product, even when their names are written differently. In our experimental analysis, we obtain a 96.25% F-measure in record linkage of e-commerce products and 100% precision in most-related-products search. Our system successfully offers the best products for a given category and provides a better user experience than existing systems.
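The abstract names the two ingredients of record linkage (product-code matching and a modified incremental Hierarchical Agglomerative Clustering algorithm) but not their details. The following Python sketch (the described system itself is written in C#) illustrates the general idea under stated assumptions: a shared product code assigns a new offer to an existing cluster directly; otherwise the offer joins the cluster with the highest average-link token Jaccard similarity if that similarity reaches a hypothetical threshold of 0.5, and starts a new cluster if not. The code-extraction heuristic, the threshold, the similarity measure, and all names (`product_codes`, `link`, `Cluster`) are illustrative assumptions; this is a simplified single-pass variant, not the authors' modified algorithm.

```python
import re
from dataclasses import dataclass, field


def tokens(title: str) -> set[str]:
    """Lower-cased alphanumeric tokens of a product title (hyphens dropped)."""
    return set(re.findall(r"[a-z0-9]+", title.lower().replace("-", "")))


def product_codes(title: str) -> set[str]:
    """Hypothetical product-code heuristic: tokens of length >= 4 that mix
    letters and digits, excluding number+unit tokens such as '16gb'."""
    return {
        t for t in tokens(title)
        if len(t) >= 4
        and any(c.isdigit() for c in t)
        and any(c.isalpha() for c in t)
        and not re.fullmatch(r"\d+[a-z]+", t)
    }


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Cluster:
    titles: list[str] = field(default_factory=list)
    token_sets: list[set[str]] = field(default_factory=list)
    codes: set[str] = field(default_factory=set)

    def add(self, title: str) -> None:
        self.titles.append(title)
        self.token_sets.append(tokens(title))
        self.codes |= product_codes(title)

    def average_similarity(self, toks: set[str]) -> float:
        # Average-link similarity between a new offer and every cluster member.
        return sum(jaccard(toks, t) for t in self.token_sets) / len(self.token_sets)


def link(titles: list[str], threshold: float = 0.5) -> list[Cluster]:
    """Assign each incoming offer to a cluster, one offer at a time."""
    clusters: list[Cluster] = []
    for title in titles:
        codes = product_codes(title)
        # Step 1: product-code matching (a shared code means the same product).
        target = next((c for c in clusters if codes & c.codes), None)
        if target is None and clusters:
            # Step 2: fall back to the most similar existing cluster.
            toks = tokens(title)
            best = max(clusters, key=lambda c: c.average_similarity(toks))
            if best.average_similarity(toks) >= threshold:
                target = best
        if target is None:
            # Step 3: no match found, so the offer starts its own cluster.
            target = Cluster()
            clusters.append(target)
        target.add(title)
    return clusters


offers = [
    'Samsung UE40H5000 40" LED TV',
    "SAMSUNG UE-40H5000 40 Inch LED Television",
    'Sony KDL-32W700 32" LED TV',
]
for cluster in link(offers):
    print(cluster.titles)
```

Here the two Samsung offers are linked through the shared code `ue40h5000` even though their titles differ, while the Sony offer stays in its own cluster because its code differs and its title similarity stays below the threshold.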
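The abstract states that positive/negative review labels are used to decide which products are the best in a category, without giving the scoring rule. A minimal sketch, assuming one plausible rule: rank products by their Laplace-smoothed share of positive reviews, (positive + 1) / (total + 2), so that a product with a single positive review does not outrank one with many mostly positive reviews. The function name `rank_best_products`, the tuple layout of the input, and the product ids are hypothetical.

```python
from collections import defaultdict


def rank_best_products(reviews, top_k=3):
    """Rank products per category from sentiment-labelled reviews.

    `reviews` is an iterable of (category, product_id, label) tuples, where
    label is "positive" or "negative" (the sentiment analyzer's output).
    """
    # (category, product_id) -> [positive_count, total_count]
    counts = defaultdict(lambda: [0, 0])
    for category, product, label in reviews:
        entry = counts[(category, product)]
        if label == "positive":
            entry[0] += 1
        entry[1] += 1

    scored = defaultdict(list)
    for (category, product), (positive, total) in counts.items():
        # Laplace-smoothed positive ratio (assumed scoring rule).
        scored[category].append(((positive + 1) / (total + 2), total, product))

    # Sort by score, then by review count, then by id for a stable order.
    return {
        category: [p for _, _, p in sorted(items, key=lambda x: (-x[0], -x[1], x[2]))[:top_k]]
        for category, items in scored.items()
    }


reviews = (
    [("phones", "A", "positive")] * 45 + [("phones", "A", "negative")] * 5
    + [("phones", "B", "positive")]
    + [("phones", "C", "positive")] * 30 + [("phones", "C", "negative")] * 20
)
print(rank_best_products(reviews, top_k=2))  # {'phones': ['A', 'B']}
```

Without smoothing, product B (1 positive out of 1) would score 1.0 and beat product A (45 out of 50); with smoothing, A scores 46/52 ≈ 0.885 and B scores 2/3 ≈ 0.667.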
Turkish-language version: “En İyi Ürün” Sorgularını Destekleyen Bir Ürün Arama Motoru (A Product Search Engine Supporting “Best Product” Queries)
Abstract
In this study, a novel product search engine system that supports queries of the form “find the best products for a given category” is proposed. The developed system consists of a focused crawler, a record linkage system, a sentiment analysis system, and a query engine. The focused crawler is used to obtain product information from various e-commerce sites; the record linkage system identifies the same products sold on different e-commerce sites; the sentiment analysis system classifies user reviews of products as positive or negative, and this classification is used to determine which products are the best for the searched category; and the query engine takes users' queries and presents the results to them. The entire system was developed in the C# programming language on the .NET 4.5 framework, and the MS-SQL 2014 database management system was used for data storage. The core of the proposed system is the record linkage system, which is based on a Hierarchical Agglomerative Clustering algorithm modified to work incrementally. To increase the success of the record linkage process, a product code matching system was developed. This system can detect products that are sold on different e-commerce sites under differently written names but have the same product code. In our experimental analyses, we obtained a 96.25% F-measure in record linkage of e-commerce products and 100% precision in most-related-products search. The developed system can successfully present the best products within a given category to the user. The proposed system can offer a more successful user experience than existing systems.
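The abstract reports a 96.25% F-measure for record linkage but does not say how it is computed. One common convention for evaluating linkage as clustering is pairwise: every pair of offers placed in the same cluster counts as a predicted match and is compared with the pairs in a manually labeled ground truth, giving precision P, recall R, and F = 2PR / (P + R). The sketch below assumes that convention; the function names and the record ids `r1` to `r4` are hypothetical.

```python
from itertools import combinations


def same_product_pairs(clusters):
    """All unordered record pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(cluster, 2))
    return pairs


def pairwise_f_measure(predicted, gold):
    """Pairwise F-measure of a predicted clustering against ground truth."""
    pred, true = same_product_pairs(predicted), same_product_pairs(gold)
    tp = len(pred & true)  # pairs linked both by the system and in the ground truth
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["r1", "r2", "r3"], ["r4"]]
predicted = [["r1", "r2"], ["r3", "r4"]]
print(round(pairwise_f_measure(predicted, gold), 3))  # 0.4
```

In this example the system predicts 2 pairs, 1 of which is correct (P = 1/2), and recovers 1 of the 3 true pairs (R = 1/3), so F = (1/3) / (5/6) = 0.4.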