Performance Comparison of TOR Hidden Service Crawlers

Merve Varol Arısoy; Ecir Uğur Küçüksille

doi:10.35193/bseufbd.608555

Abstract

TOR (The Onion Routing) is a network structure that has become popular in recent years because it provides anonymity to its users, and it is often preferred by hidden services. Because privacy is essential in this network, it attracts attention, and the amount of data stored in it increases day by day, which makes the data difficult to scan and analyze. In addition, it is highly likely that the requests made while scanning onion services will be treated as a cyber-attack and that access to the relevant address will be blocked. Various crawlers have been developed to scan and access the services (onion web pages) in this network. However, crawling here differs from crawling pages on the surface web with extensions such as com, net, and org. This is because the TOR network sits in the lower layers of the surface web, and its pages are accessed only through the TOR browser rather than through traditional browsers (Chrome, Mozilla, etc.). The crawlers developed to date take this into account: to protect confidentiality, they obtain data by selecting paths through different relays for the requests made to the addresses.
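As a rough illustration of sending each request over a different path, the sketch below builds per-request proxy settings for a crawler. It is not taken from any of the evaluated crawlers. It assumes a local Tor client listening on its default SOCKS port (127.0.0.1:9050) with its default behaviour of isolating streams that present different SOCKS credentials; the helper names `is_onion_url` and `proxies_for_request` are hypothetical.

```python
from urllib.parse import urlsplit

# Assumptions (not from the article): a local Tor client on the default SocksPort.
TOR_SOCKS_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9050


def is_onion_url(url: str) -> bool:
    """Return True when the URL's host ends in the .onion suffix."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".onion")


def proxies_for_request(request_id: int) -> dict:
    """Build a proxy mapping with credentials unique to one request.

    Distinct SOCKS usernames ask Tor to keep the streams on separate
    circuits (assuming SOCKS-auth isolation is left at its default).
    The socks5h scheme asks the proxy to resolve the hostname, which
    .onion addresses require.
    """
    proxy = f"socks5h://crawl-{request_id}:x@{TOR_SOCKS_HOST}:{TOR_SOCKS_PORT}"
    return {"http": proxy, "https": proxy}
```

A mapping of this shape can be passed as the `proxies` argument of an HTTP client that supports SOCKS proxies, for example the `requests` library installed with its SOCKS extra. Whether a given crawler isolates circuits this way is an assumption of the sketch, not a finding of the study.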

In the TOR network, each request sent by a user reaches the target address by passing over different nodes, which slows the network down. In addition, the low performance of a browser that retrieves information through TOR leads to long waiting times. Working with crawlers that crawl and acquire information quickly will therefore improve researchers' analysis process. Four different crawlers were evaluated against various criteria, both to guide people who will conduct research in this field and to assess the strengths and weaknesses of the crawlers relative to each other. The study offers an important perspective for researchers who want to analyze TOR web services and need to choose the right crawler as an initial starting point.
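The abstract does not list the evaluation criteria, so the following is only a minimal sketch of how crawling speed and reliability might be compared across crawlers. The `CrawlRun` record, the two metrics (pages per second and success rate), and the ranking rule are assumptions chosen for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrawlRun:
    """One measured run of a crawler (hypothetical record layout)."""
    crawler: str
    pages_fetched: int
    pages_failed: int
    elapsed_seconds: float


def pages_per_second(run: CrawlRun) -> float:
    """Throughput: successfully fetched pages divided by wall-clock time."""
    if run.elapsed_seconds <= 0:
        return 0.0
    return run.pages_fetched / run.elapsed_seconds


def success_rate(run: CrawlRun) -> float:
    """Share of attempted pages that were fetched successfully."""
    attempted = run.pages_fetched + run.pages_failed
    if attempted == 0:
        return 0.0
    return run.pages_fetched / attempted


def rank_by_throughput(runs: List[CrawlRun]) -> List[CrawlRun]:
    """Order runs from fastest to slowest by pages per second."""
    return sorted(runs, key=pages_per_second, reverse=True)
```

Ranking on throughput alone can favour a crawler that skips slow pages, which is why the sketch reports the success rate alongside it.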

Keywords

TOR, crawler software, performance comparison

Turkish title: TOR Gizli Servis Tarayıcılarının Performans Karşılaştırması

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Authors

Merve Varol Arısoy*
0000-0003-2085-1964
Türkiye

Ecir Uğur Küçüksille
0000-0002-3293-9878
Türkiye

Publication Date

December 26, 2019

Submission Date

August 21, 2019

Acceptance Date

December 6, 2019

Published in Issue

Year 2019 Volume: 6 Number: 2

DOI

https://doi.org/10.35193/bseufbd.608555

IZ

https://izlik.org/JA77JJ42ZD

Cite

APA

Varol Arısoy, M., & Küçüksille, E. U. (2019). Performance Comparison of TOR Hidden Service Crawlers. Bilecik Şeyh Edebali Üniversitesi Fen Bilimleri Dergisi, 6(2), 147-161. https://doi.org/10.35193/bseufbd.608555
