A Chaos-Causality Approach to Principled Pruning of Dense Neural Networks
Year 2025, Volume: 7, Issue: 2, 154–165, 31.07.2025
Rajan Sahu, Shivam Chadha, Archana Mathur, Nithin Nagaraj, Snehanshu Saha
Abstract
Reducing the size of a neural network (pruning) by removing weights without impacting its performance is an important problem for resource-constrained devices. In the past, pruning was typically accomplished by ranking or penalizing weights based on criteria like magnitude and removing low-ranked weights before retraining the remaining ones. Pruning strategies also involve removing neurons from the network to achieve the desired reduction in network size. We formulate pruning as an optimization problem that minimizes misclassifications by selecting specific weights. We introduce the concept of chaos in learning (Lyapunov Exponents) through weight updates and use causality-based investigations to identify the causal weight connections responsible for misclassification. Two architectures are proposed in the current work: the Lyapunov Exponent Granger Causality driven Fully Trained Network (LEGCNet-FT) and the Lyapunov Exponent Granger Causality driven Partially Trained Network (LEGCNet-PT). The proposed methodology gauges causality between weight-specific Lyapunov Exponents (LEs) and misclassification, facilitating the identification of weights for pruning in the network. The performance of both the dense and pruned neural networks is evaluated using accuracy, F1 score, FLOPs, and percentage pruned. It is observed that, using LEGCNet-PT/LEGCNet-FT, a dense over-parameterized network can be pruned without compromising accuracy, F1 score, or other performance metrics. Additionally, the sparse networks are trained with fewer epochs and fewer FLOPs than their dense counterparts across all datasets. Our methods are compared with random and magnitude pruning, and the pruned network is observed to maintain the original performance while retaining feature explainability. Feature explainability is investigated using SHAP and WeightWatchers. The SHAP values computed for the proposed pruning architecture, as well as for the baselines (random and magnitude pruning), indicate that feature importance is maintained in LEGCNet-PT and LEGCNet-FT when compared to the dense network. WeightWatchers results reveal that the network layers are well-trained.
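The abstract describes two ingredients: estimating weight-specific Lyapunov Exponents (LEs) from weight-update trajectories, and testing whether those LE series Granger-cause misclassification. The sketch below is a minimal, hypothetical illustration of that pipeline under assumed choices, not the authors' implementation: largest_lyapunov is a Rosenstein-style largest-LE estimate for a scalar series, and causal_weight_mask screens weights using statsmodels' Granger-causality test between each weight's per-epoch LE series and the per-epoch misclassification count. All function names, window sizes, and thresholds are illustrative assumptions.

```python
# Illustrative sketch only: LE estimation + Granger-causality screening of
# pruning candidates. Names and parameters are hypothetical, not from the paper.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests


def largest_lyapunov(series, emb_dim=2, lag=1, horizon=8):
    """Rosenstein-style estimate of the largest Lyapunov Exponent of a 1-D series."""
    x = np.asarray(series, dtype=float)
    n = len(x) - (emb_dim - 1) * lag
    # Delay embedding: row j is (x[j], x[j+lag], ..., x[j+(emb_dim-1)*lag]).
    emb = np.stack([x[i * lag : i * lag + n] for i in range(emb_dim)], axis=1)
    # Nearest neighbour of every embedded point (self excluded; no Theiler window).
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    nn = dists.argmin(axis=1)
    # Mean log-divergence of neighbouring trajectories after k steps.
    div = []
    for k in range(1, horizon):
        valid = (np.arange(n) + k < n) & (nn + k < n)
        d = np.linalg.norm(emb[np.arange(n)[valid] + k] - emb[nn[valid] + k], axis=1)
        d = d[d > 0]
        if len(d):
            div.append(np.mean(np.log(d)))
    # Slope of the divergence curve approximates the largest LE.
    return np.polyfit(np.arange(1, len(div) + 1), div, 1)[0]


def causal_weight_mask(le_series, misclassified, maxlag=2, alpha=0.05):
    """True where a weight's per-epoch LE series Granger-causes the per-epoch
    misclassification count, i.e. a pruning candidate under this sketch."""
    causal = []
    for les in le_series:
        data = np.column_stack([misclassified, les])  # column 0: effect, column 1: cause
        res = grangercausalitytests(data, maxlag=maxlag)
        p = min(res[lag][0]["ssr_ftest"][1] for lag in range(1, maxlag + 1))
        causal.append(p < alpha)
    return np.array(causal)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A chaotic logistic-map series should yield a positive LE estimate.
    x = np.empty(1000)
    x[0] = 0.4
    for t in range(999):
        x[t + 1] = 4.0 * x[t] * (1.0 - x[t])
    print("LE of logistic map (expected > 0):", largest_lyapunov(x))
    # Hypothetical per-epoch LE series for 5 weights and misclassification counts.
    les = rng.normal(0.0, 0.1, size=(5, 60))
    errors = rng.integers(40, 80, size=60).astype(float)
    print("Causal (prunable) weights:", causal_weight_mask(les, errors))
```

In this sketch, weights whose LE series are found causal for misclassification are the ones flagged for removal; the choice of embedding dimension, lag, and significance level would need to be tuned per dataset.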
Project Number
TAR/2021/000206
References
- Alavani, G., J. Desai, S. Saha, and S. Sarkar, 2023 Program analysis and machine learning based approach to predict power consumption of CUDA kernel. ACM Transactions on Modeling and Performance Evaluation of Computing Systems.
- Balakrishnan, H. N., A. Kathpalia, S. Saha, and N. Nagaraj, 2019 ChaosNet: A chaos based artificial neural network architecture for classification. Chaos: An Interdisciplinary Journal of Nonlinear Science 29.
- Ditto, W. L. and S. Sinha, 2015 Exploiting chaos for applications. Chaos: An Interdisciplinary Journal of Nonlinear Science 25.
- Faure, P. and H. Korn, 2001 Is there chaos in the brain? I. Concepts of nonlinear dynamics and methods of investigation. Comptes Rendus de l’Académie des Sciences-Series III-Sciences de la Vie 324: 773–793.
- Frankle, J. and M. Carbin, 2018 The lottery ticket hypothesis: Finding sparse, trainable neural networks. ICLR 2019.
- Granger, C. W., 1969 Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society pp. 424–438.
- Hegger, R., H. Kantz, and T. Schreiber, 1998 Practical implementation of nonlinear time series methods: The TISEAN package. Chaos 9(2): 413–435.
- Herrmann, L. M., M. Granz, and T. Landgraf, 2022 Chaotic dynamics are intrinsic to neural network training with SGD. In Neural Information Processing Systems.
- Kondo, M., S. Sunada, and T. Niiyama, 2021 Lyapunov exponent analysis for multilayer neural networks. Nonlinear Theory and Its Applications, IEICE.
- Korn, H. and P. Faure, 2003 Is there chaos in the brain? II. Experimental evidence and related models. Comptes Rendus Biologies 326: 787–840.
- Kuo, D., 2005 Chaos and its computing paradigm. IEEE Potentials 24: 13–15.
- LeCun, Y., I. Kanter, and S. A. Solla, 1990 Second order properties of error surfaces: Learning time and generalization. In NIPS 1990.
- Lee, J., S. Park, S. Mo, S. Ahn, and J. Shin, 2020 Layer-adaptive sparsity for the magnitude-based pruning. In International Conference on Learning Representations.
- Li, G., C. Qian, C. Jiang, X. Lu, and K. Tang, 2018 Optimization based layer-wise magnitude-based pruning for DNN compression. In International Joint Conference on Artificial Intelligence.
- Liu, S., T. Chen, X. Chen, L. Shen, D. C. Mocanu, et al., 2022 The unreasonable effectiveness of random pruning: Return of the most naive baseline for sparse training. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022, OpenReview.net.
- Liu, Z., M. Sun, T. Zhou, G. Huang, and T. Darrell, 2018 Rethinking the value of network pruning. ArXiv abs/1810.05270.
- Lundberg, S. M. and S.-I. Lee, 2017 A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA.
- Martin, C. H., T. Peng, and M. W. Mahoney, 2020 Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications 12.
- Medhat, S., H. Abdel-Galil, A. E. Aboutabl, and H. Saleh, 2023 Iterative magnitude pruning-based light-version of AlexNet for skin cancer classification. Neural Computing and Applications pp. 1–16.
- Mittal, D., S. Bhardwaj, M. M. Khapra, and B. Ravindran, 2018a Recovering from random pruning: On the plasticity of deep convolutional neural networks. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) pp. 848–857.
- Mittal, D., S. Bhardwaj, M. M. Khapra, and B. Ravindran, 2018b Studying the plasticity in deep convolutional neural networks using random pruning. Machine Vision and Applications 30: 203–216.
- Mohapatra, R., S. Saha, C. A. C. Coello, A. Bhattacharya, S. S. Dhavala, et al., 2022 AdaSwarm: Augmenting gradient-based optimizers in deep learning with swarm intelligence. IEEE Transactions on Emerging Topics in Computational Intelligence 6: 329–340.
- Nagaraj, N. and P. G. Vaidya, 2009 Multiplexing of discrete chaotic signals in presence of noise. Chaos: An Interdisciplinary Journal of Nonlinear Science 19.
- Prandi, C., S. Mirri, S. Ferretti, and P. Salomoni, 2017 On the need of trustworthy sensing and crowdsourcing for urban accessibility in smart city. ACM Trans. Internet Technol. 18.
- Qian, X.-Y. and D. Klabjan, 2021 A probabilistic approach to neural network pruning. In International Conference on Machine Learning.
- Rosenstein, M. T., J. J. Collins, and C. J. D. Luca, 1993 A practical method for calculating largest Lyapunov exponents from small data sets. Physica D: Nonlinear Phenomena 65: 117–134.
- Saha, S., N. Nagaraj, A. Mathur, R. Yedida, and S. H. R, 2020 Evolution of novel activation functions in neural network training for astronomy data: Habitability classification of exoplanets. The European Physical Journal Special Topics 229: 2629–2738.
- Saleem, T. J., R. Ahuja, S. Prasad, and B. Lall, 2024 Insights into the lottery ticket hypothesis and iterative magnitude pruning. ArXiv abs/2403.15022.
- Shen, Z., H. Yang, and S. Zhang, 2019 Nonlinear approximation via compositions. Neural Networks: The Official Journal of the International Neural Network Society 119: 74–84.
- Sprott, J., 2013 Is chaos good for learning? Nonlinear Dynamics, Psychology, and Life Sciences 17: 223–232.
- Wolf, A., J. Swift, H. L. Swinney, and J. A. Vastano, 1985 Determining Lyapunov exponents from a time series. Physica D: Nonlinear Phenomena 16: 285–317.
- Zhang, L., L. Feng, K. Chen, and C. H. Lai, 2021 Edge of chaos as a guiding principle for modern neural network training. ArXiv abs/2107.09437.
- Zou, D., Y. Cao, D. Zhou, and Q. Gu, 2018 Stochastic gradient descent optimizes over-parameterized deep ReLU networks.