Research Article

Transformation to Achieve Perfect Correlation

Year 2025, Volume 15, Issue 2, pp. 1-12, 31.12.2025

Abstract

Correlation and linear regression are common means of evaluating association and empirical relationships between two or more variables. Such relationships often show a significant departure of $|r_{XY}|$ from unity, and existing transformations that increase correlation fail to achieve perfect correlation. For bivariate data, the paper proposes transforming Y to $y = G \cdot \|x\|\,\|y\|$, which gives $r_{Xy} = 1$, where G is the G-inverse of the matrix $A = x x^{T}$ and x, y denote vectors of deviation scores. The concept is extended to perfect linearity between a dependent variable (Y) and a set of independent variables (multiple linear regression), and between a set of dependent variables and a set of independent variables (canonical regression), thereby avoiding the problems of insignificant beta coefficients and of outliers in univariate and multivariate regression models. Empirical illustrations of the G-inverse and of the extensions to multiple linear regression and canonical regression are also given. The proposed transformation is a novel method of introducing perfect correlation between two variables, and its extension to multiple linear regression and canonical regression is expected to benefit empirical research in many branches of science. Future studies may derive the distribution of the proposed perfect correlations and compare the efficacy of the suggested approach with traditional ones on quantitative evidence.
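The bivariate claim can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes that G is the Moore-Penrose inverse of $A = x x^{T}$ (one particular G-inverse, computed with `numpy.linalg.pinv`), and, because the formula leaves open how the matrix G yields a vector, it hypothetically applies G to x, that is, it uses $G x \,\|x\|\,\|y\|$ as the transformed variable. The helper names (`deviation_scores`, `pearson_r`, `transform_to_perfect`) and the toy data are invented for illustration.

```python
import numpy as np


def deviation_scores(v):
    """Return deviation scores: each value minus the mean of the vector."""
    v = np.asarray(v, dtype=float)
    return v - v.mean()


def pearson_r(a, b):
    """Pearson correlation from deviation scores: (x . y) / (||x|| * ||y||)."""
    x, y = deviation_scores(a), deviation_scores(b)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def transform_to_perfect(x_raw, y_raw):
    """HYPOTHETICAL reading of the bivariate transformation (an assumption).

    G is taken to be the Moore-Penrose inverse of A = x x^T and is applied
    to x before scaling by ||x|| * ||y||; the paper may define it differently.
    """
    x = deviation_scores(x_raw)
    y = deviation_scores(y_raw)
    A = np.outer(x, x)  # n x n matrix x x^T of rank 1
    # One G-inverse of A; rcond treats rounding-level singular values as zero.
    G = np.linalg.pinv(A, rcond=1e-10)
    assert np.allclose(A @ G @ A, A)  # defining property of any G-inverse: AGA = A
    return G @ x * np.linalg.norm(x) * np.linalg.norm(y)


# Toy data, invented for illustration only.
X = [2.0, 4.0, 5.0, 7.0, 11.0]
Y = [3.0, 1.0, 6.0, 5.0, 9.0]

y_new = transform_to_perfect(X, Y)
print(round(pearson_r(X, Y), 4))      # original correlation, clearly below 1
print(round(pearson_r(X, y_new), 4))  # 1.0 under this reading
print(np.isclose(np.linalg.norm(y_new), np.linalg.norm(deviation_scores(Y))))  # True
```

Under this reading the result also follows in closed form: for nonzero x, the Moore-Penrose inverse of $x x^{T}$ is $x x^{T}/\|x\|^{4}$, so $G x \,\|x\|\,\|y\| = x\,\|y\|/\|x\|$. The transformed vector is a positive multiple of x, which forces $r = 1$, and its norm equals $\|y\|$, so the spread of the deviation scores of Y is preserved.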

Ethical Statement

An ethical statement is not applicable to this theoretical paper, since no data were collected from individuals.

Project Number

Not applicable


There are 32 citations in total.

Details

Primary Language: English
Subjects: Statistical Theory
Journal Section: Research Article
Authors:

Satyendra Chakrabartty (ORCID 0000-0002-7687-5044)

Anish Chakrabarty (ORCID 0000-0002-1993-2006)

Project Number: Not applicable
Submission Date: March 12, 2025
Acceptance Date: October 21, 2025
Publication Date: December 31, 2025
Published in Issue: Year 2025, Volume 15, Issue 2

Cite

APA Chakrabartty, S., & Chakrabarty, A. (2025). Transformation to Achieve Perfect Correlation. İstatistik Araştırma Dergisi, 15(2), 1-12.