Statistical Learning-Based Prediction of Estrogen Receptor Alpha (ERα) Inhibitor Activities
Abstract
Estrogen receptor alpha (ERα) is a protein involved in processes such as cell growth and proliferation, and it has become an important research target because it is overexpressed in 70% of breast cancers. ERα inhibitors stop the growth of cancer cells by blocking the activity of this protein. Traditional drug discovery methods are costly and time-consuming, and various computational approaches to the discovery of ERα inhibitors exist in the literature. In this study, a machine learning-based Quantitative Structure-Activity Relationship (QSAR) approach was chosen because it can extract structure-activity relationships from large chemical datasets and supports high-throughput screening. The method predicts biological activity by expressing the structural properties of molecules as numerical descriptors. The ChEMBL target identifier CHEMBL206 was selected as the data source because of its widespread use, its high data quality, and the opportunity it offers for comparison with previous studies. The retrieved molecules were classified according to their IC50 values, and their distribution in chemical space was analyzed using Lipinski's rules. Subsequently, 3153 molecular descriptors were calculated for 3053 molecules using the PaDEL-Descriptor software. Feature importance analysis showed that fingerprints such as PubchemFP667 and PubchemFP527, together with the APC2D atom pair descriptors that stood out in the LightGBM model, played critical roles in ERα inhibition. The developed models achieved accuracy above 94%, sensitivity around 90%, specificity above 95%, and AUC values above 0.97. By demonstrating that the activity of ERα inhibitors can be predicted with high accuracy, this study contributes to the efficiency of the drug discovery process.
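Two steps named in the abstract, labeling molecules by IC50 and scoring a binary classifier by accuracy, sensitivity, and specificity, can be illustrated with a minimal Python sketch. The abstract does not state the IC50 cutoffs, so the thresholds below (1,000 nM and 10,000 nM) are assumptions chosen only for illustration, and the function names are hypothetical rather than taken from the study.

```python
import math

# Assumed cutoffs in nM; the study's actual thresholds are not given in the abstract.
ACTIVE_MAX_NM = 1_000.0
INACTIVE_MIN_NM = 10_000.0


def pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nm)


def activity_class(ic50_nm: float) -> str:
    """Label a molecule from its IC50 using the assumed cutoffs."""
    if ic50_nm <= ACTIVE_MAX_NM:
        return "active"
    if ic50_nm >= INACTIVE_MIN_NM:
        return "inactive"
    return "intermediate"


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, sensitivity, and specificity for labels coded 1 (active) / 0 (inactive).

    Assumes both classes occur in y_true; otherwise a division by zero is raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Toy values, not data from the study.
    for ic50 in (50.0, 5_000.0, 20_000.0):
        print(ic50, round(pic50(ic50), 2), activity_class(ic50))
    print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                         [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]))
```

Sensitivity and specificity are reported separately because activity datasets are often imbalanced, in which case accuracy alone can hide poor detection of the minority class.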
Keywords
Details
Primary Language
English
Subjects
Machine Learning (Other), Data Management and Data Science (Other)
Journal Section
Research Article
Authors
Fatma Karateke
*
0009-0001-0284-048X
Türkiye
Publication Date
April 1, 2026
Submission Date
July 29, 2025
Acceptance Date
January 29, 2026
Published in Issue
Year 2026 Volume: 9 Number: 2026