A Chaos-Causality Approach to Principled Pruning of Dense Neural Networks
Year 2025, Volume: 7, Issue: 2, 154–165, 31.07.2025
Rajan Sahu, Shivam Chadha, Archana Mathur, Nithin Nagaraj, Snehanshu Saha
Abstract
Reducing the size of a neural network (pruning) by removing weights without impacting its performance is an important problem for resource-constrained devices. In the past, pruning was typically accomplished by ranking or penalizing weights based on criteria like magnitude and removing low-ranked weights before retraining the remaining ones. Pruning strategies also involve removing neurons from the network to achieve the desired reduction in network size. We formulate pruning as an optimization problem that minimizes misclassifications by selecting specific weights. We introduce the concept of chaos in learning (Lyapunov Exponents) through weight updates and use causality-based investigations to identify the causal weight connections responsible for misclassification. Two architectures are proposed in the current work: the Lyapunov Exponent Granger Causality driven Fully Trained Network (LEGCNet-FT) and the Lyapunov Exponent Granger Causality driven Partially Trained Network (LEGCNet-PT). The proposed methodology gauges causality between weight-specific Lyapunov Exponents (LEs) and misclassification, facilitating the identification of weights for pruning in the network. The performance of both the dense and pruned neural networks is evaluated using accuracy, F1 score, FLOPs, and percentage pruned. It is observed that, using LEGCNet-PT/LEGCNet-FT, a dense over-parameterized network can be pruned without compromising accuracy, F1 score, or other performance metrics. Additionally, the sparse networks are trained with fewer epochs and fewer FLOPs than their dense counterparts across all datasets. Our methods are compared with random and magnitude pruning, and the pruned network is observed to maintain the original performance while retaining feature explainability. Feature explainability is investigated using SHAP and WeightWatchers. The SHAP values computed for the proposed pruning architecture, as well as for the baselines (random and magnitude pruning), indicate that feature importance is maintained in LEGCNet-PT and LEGCNet-FT when compared to the dense network. WeightWatchers results reveal that the network layers are well-trained.
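The abstract describes two ingredients: estimating weight-specific Lyapunov Exponents (LEs) from weight-update trajectories, and testing whether those LE series Granger-cause misclassification. The sketch below is a minimal, hypothetical illustration of that pipeline under assumed choices, not the authors' implementation: largest_lyapunov is a Rosenstein-style largest-LE estimate for a scalar series, and causal_weight_mask screens weights using statsmodels' Granger-causality test between each weight's per-epoch LE series and the per-epoch misclassification count. All function names, window sizes, and thresholds are illustrative assumptions.

```python
# Illustrative sketch only: LE estimation + Granger-causality screening of
# pruning candidates. Names and parameters are hypothetical, not from the paper.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests


def largest_lyapunov(series, emb_dim=2, lag=1, horizon=8):
    """Rosenstein-style estimate of the largest Lyapunov Exponent of a 1-D series."""
    x = np.asarray(series, dtype=float)
    n = len(x) - (emb_dim - 1) * lag
    # Delay embedding: row j is (x[j], x[j+lag], ..., x[j+(emb_dim-1)*lag]).
    emb = np.stack([x[i * lag : i * lag + n] for i in range(emb_dim)], axis=1)
    # Nearest neighbour of every embedded point (self excluded; no Theiler window).
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    nn = dists.argmin(axis=1)
    # Mean log-divergence of neighbouring trajectories after k steps.
    div = []
    for k in range(1, horizon):
        valid = (np.arange(n) + k < n) & (nn + k < n)
        d = np.linalg.norm(emb[np.arange(n)[valid] + k] - emb[nn[valid] + k], axis=1)
        d = d[d > 0]
        if len(d):
            div.append(np.mean(np.log(d)))
    # Slope of the divergence curve approximates the largest LE.
    return np.polyfit(np.arange(1, len(div) + 1), div, 1)[0]


def causal_weight_mask(le_series, misclassified, maxlag=2, alpha=0.05):
    """True where a weight's per-epoch LE series Granger-causes the per-epoch
    misclassification count, i.e. a pruning candidate under this sketch."""
    causal = []
    for les in le_series:
        data = np.column_stack([misclassified, les])  # column 0: effect, column 1: cause
        res = grangercausalitytests(data, maxlag=maxlag)
        p = min(res[lag][0]["ssr_ftest"][1] for lag in range(1, maxlag + 1))
        causal.append(p < alpha)
    return np.array(causal)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A chaotic logistic-map series should yield a positive LE estimate.
    x = np.empty(1000)
    x[0] = 0.4
    for t in range(999):
        x[t + 1] = 4.0 * x[t] * (1.0 - x[t])
    print("LE of logistic map (expected > 0):", largest_lyapunov(x))
    # Hypothetical per-epoch LE series for 5 weights and misclassification counts.
    les = rng.normal(0.0, 0.1, size=(5, 60))
    errors = rng.integers(40, 80, size=60).astype(float)
    print("Causal (prunable) weights:", causal_weight_mask(les, errors))
```

In this sketch, weights whose LE series are found causal for misclassification are the ones flagged for removal; the choice of embedding dimension, lag, and significance level would need to be tuned per dataset.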
Project Number
TAR/2021/000206
References
- Alavani, G., J. Desai, S. Saha, and S. Sarkar, 2023 Program analysis and machine learning based approach to predict power consumption of CUDA kernel. ACM Transactions on Modeling and Performance Evaluation of Computing Systems.
- Balakrishnan, H. N., A. Kathpalia, S. Saha, and N. Nagaraj, 2019 ChaosNet: A chaos based artificial neural network architecture for classification. Chaos: An Interdisciplinary Journal of Nonlinear Science 29.
- Ditto, W. L. and S. Sinha, 2015 Exploiting chaos for applications. Chaos: An Interdisciplinary Journal of Nonlinear Science 25.
- Faure, P. and H. Korn, 2001 Is there chaos in the brain? I. Concepts of nonlinear dynamics and methods of investigation. Comptes Rendus de l’Académie des Sciences-Series III-Sciences de la Vie 324: 773–793.
- Frankle, J. and M. Carbin, 2018 The lottery ticket hypothesis: Finding sparse, trainable neural networks. ICLR 2019.
- Granger, C. W., 1969 Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society pp. 424–438.
- Hegger, R., H. Kantz, and T. Schreiber, 1998 Practical implementation of nonlinear time series methods: The TISEAN package. Chaos 9(2): 413–435.
- Herrmann, L. M., M. Granz, and T. Landgraf, 2022 Chaotic dynamics are intrinsic to neural network training with SGD. In Neural Information Processing Systems.
- Kondo, M., S. Sunada, and T. Niiyama, 2021 Lyapunov exponent analysis for multilayer neural networks. Nonlinear Theory and Its Applications, IEICE.
- Korn, H. and P. Faure, 2003 Is there chaos in the brain? II. Experimental evidence and related models. Comptes Rendus Biologies 326: 787–840.
- Kuo, D., 2005 Chaos and its computing paradigm. IEEE Potentials 24: 13–15.
- LeCun, Y., I. Kanter, and S. A. Solla, 1990 Second order properties of error surfaces: Learning time and generalization. In NIPS 1990.
- Lee, J., S. Park, S. Mo, S. Ahn, and J. Shin, 2020 Layer-adaptive sparsity for the magnitude-based pruning. In International Conference on Learning Representations.
- Li, G., C. Qian, C. Jiang, X. Lu, and K. Tang, 2018 Optimization based layer-wise magnitude-based pruning for DNN compression. In International Joint Conference on Artificial Intelligence.
- Liu, S., T. Chen, X. Chen, L. Shen, D. C. Mocanu, et al., 2022 The unreasonable effectiveness of random pruning: Return of the most naive baseline for sparse training. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022, OpenReview.net.
- Liu, Z., M. Sun, T. Zhou, G. Huang, and T. Darrell, 2018 Rethinking the value of network pruning. ArXiv abs/1810.05270.
- Lundberg, S. M. and S.-I. Lee, 2017 A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA.
- Martin, C. H., T. Peng, and M. W. Mahoney, 2020 Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications 12.
- Medhat, S., H. Abdel-Galil, A. E. Aboutabl, and H. Saleh, 2023 Iterative magnitude pruning-based light-version of AlexNet for skin cancer classification. Neural Computing and Applications pp. 1–16.
- Mittal, D., S. Bhardwaj, M. M. Khapra, and B. Ravindran, 2018a Recovering from random pruning: On the plasticity of deep convolutional neural networks. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) pp. 848–857.
- Mittal, D., S. Bhardwaj, M. M. Khapra, and B. Ravindran, 2018b Studying the plasticity in deep convolutional neural networks using random pruning. Machine Vision and Applications 30: 203–216.
- Mohapatra, R., S. Saha, C. A. C. Coello, A. Bhattacharya, S. S. Dhavala, et al., 2022 AdaSwarm: Augmenting gradient-based optimizers in deep learning with swarm intelligence. IEEE Transactions on Emerging Topics in Computational Intelligence 6: 329–340.
- Nagaraj, N. and P. G. Vaidya, 2009 Multiplexing of discrete chaotic signals in presence of noise. Chaos: An Interdisciplinary Journal of Nonlinear Science 19.
- Prandi, C., S. Mirri, S. Ferretti, and P. Salomoni, 2017 On the need of trustworthy sensing and crowdsourcing for urban accessibility in smart city. ACM Trans. Internet Technol. 18.
- Qian, X.-Y. and D. Klabjan, 2021 A probabilistic approach to neural network pruning. In International Conference on Machine Learning.
- Rosenstein, M. T., J. J. Collins, and C. J. D. Luca, 1993 A practical method for calculating largest Lyapunov exponents from small data sets. Physica D: Nonlinear Phenomena 65: 117–134.
- Saha, S., N. Nagaraj, A. Mathur, R. Yedida, and S. H. R, 2020 Evolution of novel activation functions in neural network training for astronomy data: Habitability classification of exoplanets. The European Physical Journal Special Topics 229: 2629–2738.
- Saleem, T. J., R. Ahuja, S. Prasad, and B. Lall, 2024 Insights into the lottery ticket hypothesis and iterative magnitude pruning. ArXiv abs/2403.15022.
- Shen, Z., H. Yang, and S. Zhang, 2019 Nonlinear approximation via compositions. Neural Networks: The Official Journal of the International Neural Network Society 119: 74–84.
- Sprott, J., 2013 Is chaos good for learning? Nonlinear Dynamics, Psychology, and Life Sciences 17: 223–232.
- Wolf, A., J. Swift, H. L. Swinney, and J. A. Vastano, 1985 Determining Lyapunov exponents from a time series. Physica D: Nonlinear Phenomena 16: 285–317.
- Zhang, L., L. Feng, K. Chen, and C. H. Lai, 2021 Edge of chaos as a guiding principle for modern neural network training. ArXiv abs/2107.09437.
- Zou, D., Y. Cao, D. Zhou, and Q. Gu, 2018 Stochastic gradient descent optimizes over-parameterized deep ReLU networks.