The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning

Cihan Çılgın; Yılmaz Gökşen; Hadi Gökçen

doi:10.47899/ijss.1270433

Abstract

Investors who hold real estate as an investment, as well as those who buy and sell it, need property prices to be predicted realistically and as accurately as possible. The prediction model should be the most appropriate representation of the underlying fundamentals of the market. Otherwise, valuation errors lead to undesirable results such as inconsistent and unsound increases or decreases in property tax, excessive gains or losses in favor of some groups, and adverse effects on investors and prospective real estate owners. For this reason, data-driven real estate valuation approaches are increasingly preferred for producing highly accurate and unbiased estimates. However, the consistency, precision, and accuracy of models built with machine learning approaches depend directly on data quality. This study therefore investigates the effect of outlier detection on prediction performance in real estate valuation using a large data set. To this end, 4 different outlier detection methods were tested with 3 different machine learning approaches on a heterogeneous data set of 70,771 real estate records and 283 variables. The empirical findings show that different outlier detection approaches improve prediction performance to different degrees. With the best outlier detection approach, the performance increase reached 21.6% for Random Forest, with a 6.97% increase in average model performance.
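The abstract does not name the four detection methods or the three learners, so the following is only a minimal sketch of the general idea it describes: drop outliers from the training data before fitting, then compare error on held-out data. Purely for illustration, it assumes Tukey's interquartile-range rule as the detector, a one-variable least-squares line as the model, and a handful of made-up numbers in place of the study's data. The names `iqr_bounds`, `fit_line`, and `mae` are hypothetical.

```python
from statistics import mean, quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mae(model, xs, ys):
    """Mean absolute error of a fitted (a, b) line on the given points."""
    a, b = model
    return mean([abs(y - (a + b * x)) for x, y in zip(xs, ys)])


# Made-up training data (assumption): area in m^2 -> price, with price = 10 * area.
train_x = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
train_y = [500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 9000]  # last row is a gross outlier
test_x = [55, 95, 135]
test_y = [550, 950, 1350]

# Keep only training rows whose price lies inside the Tukey fences.
low, high = iqr_bounds(train_y)
kept = [(x, y) for x, y in zip(train_x, train_y) if low <= y <= high]
clean_x = [x for x, _ in kept]
clean_y = [y for _, y in kept]

# Fit once on the raw data and once on the filtered data; score both on the same test set.
mae_raw = mae(fit_line(train_x, train_y), test_x, test_y)
mae_clean = mae(fit_line(clean_x, clean_y), test_x, test_y)

print(f"kept {len(kept)} of {len(train_x)} training rows")
print(f"test MAE without filtering: {mae_raw:.1f}")
print(f"test MAE with IQR filtering: {mae_clean:.1f}")
```

Only the training rows are filtered here, and both models are scored on the same untouched test set, so the comparison isolates the effect of the outlier step. Whether the study followed this protocol is not stated in the abstract.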

Turkish title: Makine Öğrenimi İle Mülk Değerlemesinde Aykırı Değer Tespit Yöntemlerinin Etkisi

Acknowledgments

This article complies with the principles of scientific research and publication ethics. It was derived from the doctoral dissertation completed by Cihan Çılgın at the Department of Management Information Systems, Institute of Informatics, Gazi University.

Details

Primary Language

English

Subjects

Regional Studies

Section

Research Article

Authors

Cihan Çılgın*
0000-0002-8983-118X
Türkiye

Yılmaz Gökşen
0000-0002-2291-2946
Türkiye

Hadi Gökçen
0000-0002-5163-0008
Türkiye

Early View Date

April 27, 2023

Publication Date

July 18, 2023

Submission Date

March 24, 2023

Acceptance Date

March 28, 2023

Published in Issue

Year 2023, Volume: 5, Issue: 1

DOI

https://doi.org/10.47899/ijss.1270433

IZ

https://izlik.org/JA54XD56NY

Cite This Article

APA

Çılgın, C., Gökşen, Y., & Gökçen, H. (2023). The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning. İzmir Sosyal Bilimler Dergisi, 5(1), 9-20. https://doi.org/10.47899/ijss.1270433

AMA

1. Çılgın C, Gökşen Y, Gökçen H. The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning. İzmir Sosyal Bilimler Dergisi. 2023;5(1):9-20. doi:10.47899/ijss.1270433

Chicago

Çılgın, Cihan, Yılmaz Gökşen, and Hadi Gökçen. 2023. “The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning”. İzmir Sosyal Bilimler Dergisi 5 (1): 9-20. https://doi.org/10.47899/ijss.1270433.

EndNote

Çılgın C, Gökşen Y, Gökçen H (01 July 2023) The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning. İzmir Sosyal Bilimler Dergisi 5 1 9–20.

IEEE

[1] C. Çılgın, Y. Gökşen, and H. Gökçen, “The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning”, İzmir Sosyal Bilimler Dergisi, vol. 5, no. 1, pp. 9–20, Jul. 2023, doi: 10.47899/ijss.1270433.

ISNAD

Çılgın, Cihan - Gökşen, Yılmaz - Gökçen, Hadi. “The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning”. İzmir Sosyal Bilimler Dergisi 5/1 (01 July 2023): 9-20. https://doi.org/10.47899/ijss.1270433.

JAMA

1. Çılgın C, Gökşen Y, Gökçen H. The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning. İzmir Sosyal Bilimler Dergisi. 2023;5:9–20.

MLA

Çılgın, Cihan, et al. “The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning”. İzmir Sosyal Bilimler Dergisi, vol. 5, no. 1, July 2023, pp. 9-20, doi:10.47899/ijss.1270433.

Vancouver

1. Cihan Çılgın, Yılmaz Gökşen, Hadi Gökçen. The Effect of Outlier Detection Methods in Real Estate Valuation with Machine Learning. İzmir Sosyal Bilimler Dergisi. 01 July 2023;5(1):9-20. doi:10.47899/ijss.1270433

Cited By

A Hybrid Machine Learning Model Architecture with Clustering Analysis and Stacking Ensemble for Real Estate Price Prediction

Computational Economics

https://doi.org/10.1007/s10614-024-10703-4

Comparison of machine learning algorithms and multiple linear regression for live weight estimation of Akkaraman lambs

Tropical Animal Health and Production

https://doi.org/10.1007/s11250-024-04049-0

Determining sub-real estate markets with hybrid gradual unsupervised learning for better real estate price prediction performance

Survey Review

https://doi.org/10.1080/00396265.2025.2517518

Reply to the comment on: comparison of machine learning algorithms and multiple linear regression for live weight estimation of Akkaraman lambs

Tropical Animal Health and Production

https://doi.org/10.1007/s11250-026-04870-9