Building An Integrated Database For Turkish Startups: A Systematic And Novel Framework
Year 2025,
Volume: 12 Issue: 2, 398 - 420, 30.11.2025
İsmail Ozan Çelikel, Eda Bahar, Günce Keziban Orman, Sultan Nezihe Turhan
Abstract
In recent years, the Turkish startup ecosystem has grown significantly, driven by increased government support, more diverse private investment, the worldwide spread of startup culture, and technological developments. Despite this rapid growth, there is no up-to-date, comprehensive, and analytically usable database of new entrepreneurial firms across sectors. This study builds an integrated, centralized database of startups in Türkiye using a hybrid methodology that combines traditional ETL (extract, transform, load) processes with modern data engineering techniques. All company data were collected by web scraping from public databases and national techno-hub pools and stored in MongoDB, a document-oriented NoSQL database. Data preprocessing ensured consistency, completeness, and structural integrity, and exploratory data analysis revealed critical insights into the geographical distribution, fields of activity, and workforce metrics of the startup ecosystem in Türkiye. The findings offer valuable information to stakeholders, including researchers, policymakers, and firms operating in different sectors. The data pipeline methodology introduced in the study is scalable and adaptable, and can serve as a replicable framework for data engineering projects in other fields. Future research can further enhance the dataset's analytical capacity by enriching it with financial metrics and sectoral impacts.
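The extract, transform, and load stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the HTML markup, the field names (`name`, `city`, `sector`), and the deduplication rule are all hypothetical, and a plain Python list stands in for a MongoDB collection so that the example runs without a database or network access.

```python
from html.parser import HTMLParser

# Hypothetical listing markup; real source pages will look different.
SAMPLE_HTML = """
<ul>
  <li class="startup" data-city="Bursa" data-sector="Fintech">  Acme Pay </li>
  <li class="startup" data-city="ankara" data-sector="health">MediTrack</li>
  <li class="startup" data-city="Ankara" data-sector="Health">MediTrack</li>
</ul>
"""


class StartupListParser(HTMLParser):
    """Extract step: collect one raw record per <li class="startup">."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being built, or None outside an item

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "startup":
            self._current = {
                "name": "",
                "city": attrs.get("data-city", ""),
                "sector": attrs.get("data-sector", ""),
            }

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._current["name"] += data

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def extract(html):
    parser = StartupListParser()
    parser.feed(html)
    parser.close()
    return parser.records


def transform(raw_records):
    """Transform step: trim whitespace, normalise case, drop duplicates."""
    seen = set()
    cleaned = []
    for rec in raw_records:
        doc = {
            "name": rec["name"].strip(),
            "city": rec["city"].strip().title(),
            "sector": rec["sector"].strip().title(),
        }
        # Assumed duplicate rule: same name and city, ignoring case.
        key = (doc["name"].casefold(), doc["city"].casefold())
        if doc["name"] and key not in seen:
            seen.add(key)
            cleaned.append(doc)
    return cleaned


def load(documents, collection):
    """Load step: `collection` is a list standing in for a MongoDB collection."""
    collection.extend(documents)
    return len(documents)


collection = []
inserted = load(transform(extract(SAMPLE_HTML)), collection)
print(inserted, collection)
```

With a real MongoDB deployment, the load step would typically call pymongo's `insert_many` on a collection object instead of extending a list. Note also that `str.title()` is not locale-aware, so Turkish dotted and dotless "I" would need dedicated handling in a production transform step.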
Ethical Statement
It is declared that scientific and ethical principles were followed during the preparation of this study and that all works consulted are cited in the references.
Supporting Institution
The author(s) acknowledge that they received no external funding to support this research.
Project Number
FBA-2023-1211
Thanks
This study was conducted as part of the research project "Development of an Artificial Intelligence-Supported, Expertise-Focused Academic Advisor and Startup Recommendation Portal in R&D Project Management" (Project Number 5240007), which is supported by the TÜBİTAK 1505 University-Industry Collaboration Support Program and by GSU Scientific Research Projects under project number FBA-2023-1211.
-
Abt, K. (1987). Descriptive data analysis: a concept between confirmatory and exploratory data analysis. Methods of information in medicine, 26(02), 77-88.
-
Ali, Z., Bhaskar, S. B., & Sudheesh, K. (2019). Descriptive statistics: Measures of central tendency, dispersion, correlation and regression. Airway, 2(3), 120-125.
-
Blanca, M. J., Arnau, J., López-Montiel, D., Bono, R., & Bendayan, R. (2013). Skewness and kurtosis in real data samples. Methodology, 9(2), 78-84.
-
Kim, H. Y. (2013). Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis. Restorative dentistry & endodontics, 38(1), 52-54.
-
Schober, P., Boer, C., & Schwarte, L. A. (2018). Correlation coefficients: appropriate use and interpretation. Anesthesia & analgesia, 126(5), 1763-1768.
-
Jose, B., & Abraham, S. (2017). Exploring the merits of nosql: A study based on mongodb. 2017 International Conference on Networks & Advances in Computational Technologies (NetACT) 20-22 July 2017, Thiruvananthapuram, India, 266-271.
-
Türkiye Technohub Platformu. (2024). Türkiye Technohub Platformu. Ankara, https://turkiyetechnohub.org/, (28.08.2024).
-
Requests: HTTP for Humans. https://requests.readthedocs.io/en/latest/ (05.09.2024)
-
Fowler, M., Rice, D., Foemmel, M., Hieatt, E., Mee, R., & Stafford, R. (2003). Data transfer object. Patterns of Enterprise Application Architecture. Addison Wesley, 347-356.
-
Rathore, M., & Bagui, S. S. (2024). MongoDB: Meeting the Dynamic Needs of Modern Applications. Encyclopedia, 4(4), 1433-1453.
-
Abadi, D., Ailamaki, A., Andersen, D., Bailis, P., Balazinska, M., Bernstein, P. A., ... & Suciu, D. (2022). The Seattle report on database research. Communications of the ACM, 65(8), 72-79.
-
Jupyter Team. Jupyter Notebook Documentation. https://jupyter-notebook.readthedocs.io/en/latest/ (08.09.2024)
-
Pandas Development Team. Pandas Documentation. https://pandas.pydata.org/docs/ (16.09.2024)
-
Matplotlib Development Team. Matplotlib Documentation. https://matplotlib.org/ (16.09.2024)