Word Frequency: New York Times Throughout the Times

Mehmet Aşıroğlu; Emre Olca

Research Article

Year 2024, Volume: 8 Issue: 2, 163 - 170, 22.12.2024

Abstract

This project investigates the evolution of the English language over the past century using a machine learning model trained on leading articles from The New York Times published between 1920 and 2020. The primary aim is to predict the year in which a given sentence could have been written from its linguistic patterns, including word usage and sentence structure. By analyzing these patterns, the model provides insight into the changing styles and trends of written English over time. The model's predictions are grounded in extensive data analysis and machine learning techniques, ensuring a high degree of accuracy. The study highlights the dynamic nature of language and demonstrates how computational methods can be applied in linguistic research. Its findings are significant for historical linguistics and literature studies because they provide a quantifiable method for tracking linguistic change. The work can also aid the development of tools for temporal text classification, benefiting fields such as digital humanities and archival studies. Understanding how language evolves is crucial for preserving cultural heritage and for improving communication strategies across media.

Keywords

language evolution, machine learning, historical linguistics, text analysis, computational linguistics

Details

Primary Language	English
Subjects	Data Mining and Knowledge Discovery, Artificial Intelligence (Other)
Journal Section	Articles
Authors	Mehmet Aşıroğlu (ORCID: 0009-0006-1883-2245); Emre Olca (ORCID: 0000-0001-6812-5166)
Early Pub Date	December 22, 2024
Publication Date	December 22, 2024
Submission Date	November 8, 2024
Acceptance Date	December 21, 2024
Published in Issue	Year 2024 Volume: 8 Issue: 2

Cite

IEEE	M. Aşıroğlu and E. Olca, “Word Frequency: New York Times Throughout the Times”, IJMSIT, vol. 8, no. 2, pp. 163–170, 2024.
