Performance Comparison of TOR Hidden Service Crawlers
Abstract
TOR (The Onion Router) is a network that has become popular in recent years because it provides anonymity to its users, and it is often preferred for hosting hidden services. Because privacy is central to this network, it attracts growing attention, and the amount of data stored on it increases day by day, which makes the data difficult to scan and analyze. In addition, scanning onion services is highly likely to be treated as a cyber-attack, in which case access to the relevant address is blocked. Various crawlers have been developed to scan and access the services (onion web pages) in this network. However, crawling here differs from crawling pages on the surface web under domains such as .com, .net, and .org. This is because the TOR network sits in the layers beneath the surface web, and its pages are accessed only through the TOR browser rather than traditional browsers (Chrome, Mozilla Firefox, etc.). The crawlers developed to date take this into account: to protect confidentiality, they obtain data by routing the requests made to the target addresses through paths of different relays.
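As a rough illustration of how a crawler can separate onion addresses from surface-web addresses before sending a request, the sketch below checks the hostname and selects a proxy setting accordingly. It is not taken from any of the evaluated crawlers. The names `is_onion_host`, `proxies_for`, and `TOR_PROXY` are hypothetical, and the proxy address assumes a local Tor client listening on its default SOCKS port 9050.

```python
import re
from urllib.parse import urlsplit

# Onion service names are base32: 16 characters (legacy v2) or 56 (v3).
ONION_LABEL = re.compile(r"[a-z2-7]{16}|[a-z2-7]{56}")

# Assumption: a local Tor client exposes a SOCKS proxy here.
# The "socks5h" scheme asks the proxy, not the local machine, to resolve names.
TOR_PROXY = "socks5h://127.0.0.1:9050"


def is_onion_host(host: str) -> bool:
    """Return True if the hostname (optionally with subdomains) is an onion name."""
    labels = host.lower().split(".")
    if len(labels) < 2 or labels[-1] != "onion":
        return False
    return ONION_LABEL.fullmatch(labels[-2]) is not None


def proxies_for(url: str) -> dict:
    """Return a proxy mapping for onion URLs and an empty mapping otherwise."""
    host = urlsplit(url).hostname or ""
    if is_onion_host(host):
        return {"http": TOR_PROXY, "https": TOR_PROXY}
    return {}
```

The returned mapping follows the shape accepted by the `proxies` argument of the `requests` library, which needs its optional SOCKS support installed to use it. Which relays a request actually traverses is decided by the Tor client, not by this code.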
In the TOR network, each request a user sends reaches the target address by passing through different nodes, which slows the network down. In addition, the low performance of a browser retrieving information through TOR leads to long waiting times. Working with a crawler that crawls and acquires information quickly will therefore speed up researchers' analysis. In this study, four crawlers were evaluated against various criteria to guide those who will conduct research in this field and to assess the crawlers' strengths and weaknesses relative to each other. The study offers an important perspective on choosing the right crawler as a starting point for researchers who want to analyze TOR web services.
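A comparison of this kind can be sketched as a small timing harness that runs each crawler's fetch routine over the same list of addresses. The abstract does not list the evaluation criteria, so the metrics below (pages fetched, failures, bytes, pages per second) are assumptions, and `benchmark_crawler` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Iterable


def benchmark_crawler(
    fetch: Callable[[str], bytes],
    urls: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time one crawler's fetch function over a list of URLs.

    `fetch` stands in for whatever routine a crawler uses to download a page;
    any exception it raises is counted as a failed page.
    """
    fetched = 0
    failed = 0
    total_bytes = 0
    start = clock()
    for url in urls:
        try:
            body = fetch(url)
        except Exception:
            failed += 1
            continue
        fetched += 1
        total_bytes += len(body)
    elapsed = clock() - start
    # Guard against a zero interval when the URL list is empty or the clock is coarse.
    pages_per_second = fetched / elapsed if elapsed > 0 else 0.0
    return {
        "fetched": fetched,
        "failed": failed,
        "bytes": total_bytes,
        "elapsed_seconds": elapsed,
        "pages_per_second": pages_per_second,
    }
```

Running the same URL list through each crawler's fetch routine gives figures that can be compared side by side. Because latency on TOR varies with the relays chosen, repeating the run several times and averaging would likely give a fairer picture.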
Keywords
Details
Primary Language: English
Subjects: Engineering
Journal Section: Research Article
Authors
Publication Date: December 26, 2019
Submission Date: August 21, 2019
Acceptance Date: December 6, 2019
Published in Issue: Year 2019, Volume 6, Number 2