Review

Artificial Intelligence Breakthroughs and Data Futures: A Retrospective and Prospective Review

Year 2026, Volume: 14 Issue: 1, 1 - 16, 31.01.2026
https://doi.org/10.21541/apjess.1705042

Abstract

This paper presents a comprehensive synthesis of major breakthroughs in artificial intelligence (AI) over the past fifteen years, integrating historical, theoretical, and technological perspectives. It identifies key inflection points in AI’s evolution by tracing the convergence of computational resources, data access, and algorithmic innovation. The analysis highlights how researchers enabled GPU-based model training, triggered a data-centric shift with ImageNet, simplified architectures through the Transformer, and expanded modeling capabilities with the GPT series. Rather than treating these advances as isolated milestones, the paper frames them as indicators of deeper paradigm shifts. By applying concepts from statistical learning theory such as sample complexity and data efficiency, the paper explains how researchers translated breakthroughs into scalable solutions and why the field must now embrace data-centric approaches. In response to rising privacy concerns and tightening regulations, the paper evaluates emerging solutions like federated learning, privacy-enhancing technologies (PETs), and the data site paradigm, which reframe data access and security. In cases where real-world data remains inaccessible, the paper also assesses the utility and constraints of mock and synthetic data generation. By aligning technical insights with evolving data infrastructure, this study offers strategic guidance for future AI research and policy development.

Details

Primary Language: English
Subjects: Machine Learning Algorithms
Journal Section: Review
Authors:

Beyazıt Bestami Yüksel (ORCID: 0000-0001-5060-6236)

Ayşe Yılmazer Metin (ORCID: 0000-0003-4502-7365)

Submission Date: May 23, 2025
Acceptance Date: August 31, 2025
Publication Date: January 31, 2026
Published in Issue: Year 2026, Volume 14, Issue 1

Cite

IEEE: B. B. Yüksel and A. Yılmazer Metin, “Artificial Intelligence Breakthroughs and Data Futures: A Retrospective and Prospective Review”, APJESS, vol. 14, no. 1, pp. 1–16, 2026, doi: 10.21541/apjess.1705042.

Academic Platform Journal of Engineering and Smart Systems