Research Article

How Analysis Can Teach Us the Optimal Way to Design Neural Operators

Year 2024, Volume: 6 Issue: 2, 77 - 99
https://doi.org/10.47086/pims.1579364

Abstract

This paper presents a mathematics-informed approach to neural operator design, building upon the theoretical framework established in our prior work. By integrating rigorous mathematical analysis with practical design strategies, we aim to enhance the stability, convergence, generalization, and computational efficiency of neural operators. We revisit key theoretical insights, including stability in high dimensions, exponential convergence, and universality of neural operators. Based on these insights, we provide detailed design recommendations, each supported by mathematical proofs and citations. Our contributions offer a systematic methodology for developing next-generation neural operators with improved performance and reliability.
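To make the flavor of such design recommendations concrete, the sketch below (ours, not from the paper) implements a single Fourier-style spectral layer in the spirit of Li et al.'s Fourier neural operator, cited in the references. The retained-mode count `n_modes` and the small weight-initialization scale are illustrative stand-ins for the spectral-truncation and stability parameters that this kind of analysis concerns; all names and defaults here are assumptions.

```python
# A minimal sketch (assumed, not the authors' implementation) of one
# Fourier layer: FFT -> truncate to n_modes -> mix channels -> inverse FFT.
import torch
import torch.nn as nn


class FourierLayer1d(nn.Module):
    """One spectral-convolution layer on a 1-D uniform grid."""

    def __init__(self, channels: int, n_modes: int):
        super().__init__()
        self.n_modes = n_modes
        # Complex weights for the retained low-frequency modes.  The small
        # initialization scale is a common heuristic for keeping the layer
        # close to non-expansive at the start of training.
        scale = 1.0 / (channels * channels)
        self.weights = nn.Parameter(
            scale * torch.randn(channels, channels, n_modes, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, grid_points)
        x_hat = torch.fft.rfft(x)                      # to Fourier space
        out_hat = torch.zeros_like(x_hat)
        k = min(self.n_modes, x_hat.size(-1))          # guard short grids
        # Mix channels mode-by-mode on the retained modes only.
        out_hat[:, :, :k] = torch.einsum(
            "bim,iom->bom", x_hat[:, :, :k], self.weights[:, :, :k]
        )
        return torch.fft.irfft(out_hat, n=x.size(-1))  # back to physical space


if __name__ == "__main__":
    layer = FourierLayer1d(channels=4, n_modes=8)
    u = torch.randn(2, 4, 64)   # batch of 2 functions on a 64-point grid
    print(layer(u).shape)       # torch.Size([2, 4, 64])
```

Because only the lowest `n_modes` frequencies carry learned weights, the layer's output is resolution-independent in the usual neural-operator sense, and `n_modes` trades expressivity against the truncation error that convergence analyses of this type bound.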

Ethical Statement

The authors declare no conflicts of interest.

Supporting Institution

Google Research

Thanks

We thank the Google Research Division of Google Inc. for providing the resources and mentorship that enabled the student intern Vu-Anh to carry out this project.

References

  • V.-A. Le and M. Dik, "A mathematical analysis of neural operator behaviors," arXiv preprint arXiv:2410.21481, 2024.
  • N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar, "Neural operator: Learning maps between function spaces," SIAM J. Sci. Comput. 43 (2021), no. 5, A3172–A3192.
  • Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, "Fourier neural operator for parametric partial differential equations," in Proceedings of the International Conference on Learning Representations (ICLR), 2021. Available at https://openreview.net/forum?id=c8P9NQVtmnO.
  • L. Lu, P. Jin, and G. E. Karniadakis, "DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators," arXiv preprint arXiv:1910.03193, 2019.
  • S. Banach, "Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales," Fund. Math. 3 (1922), 133–181.
  • I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, PA, 1992.
  • Y. Zhao and S. Sun, "Wavelet neural operator: A neural operator based on the wavelet transform," arXiv preprint arXiv:2201.12086, 2022.
  • T. Chen and H. Chen, "Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems," IEEE Trans. Neural Networks 6 (1995), no. 4, 911–917.
  • M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, and J. Sohl-Dickstein, "On the expressive power of deep neural networks," in Proceedings of the 34th International Conference on Machine Learning, vol. 70, PMLR, 2017, pp. 2847–2854.
  • K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks 2 (1989), no. 5, 359–366.
  • I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. Available at http://www.deeplearningbook.org.
  • A. Krogh and J. A. Hertz, "A simple weight decay can improve generalization," in Advances in Neural Information Processing Systems 4, Morgan Kaufmann, 1992, pp. 950–957.
  • N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res. 15 (2014), 1929–1958.
  • T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, "Spectral normalization for generative adversarial networks," arXiv preprint arXiv:1802.05957, 2018.
  • M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, MIT Press, 2018.
  • N. Cohen, O. Sharir, and A. Shashua, "On the expressive power of deep learning: A tensor analysis," in Proceedings of the 29th Annual Conference on Learning Theory, vol. 49, PMLR, 2016, pp. 698–728.
  • J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comp. 19 (1965), no. 90, 297–301.
  • W. Rudin, Principles of Mathematical Analysis, 3rd ed., McGraw-Hill, New York, 1976.
  • P. L. Bartlett, D. J. Foster, and M. J. Telgarsky, "Spectrally-normalized margin bounds for neural networks," arXiv preprint arXiv:1706.08498, 2017.
  • D. Jackson, The Theory of Approximation, American Mathematical Society, Providence, RI, 1930.
  • S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed., Academic Press, San Diego, CA, 1999.
  • S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989), no. 7, 674–693.
  • E. C. Titchmarsh, Introduction to the Theory of Fourier Integrals, Oxford University Press, Oxford, 1948.
  • A. Cohen, Numerical Analysis of Wavelet Methods, Elsevier, 2003.
  • D. L. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory 52 (2006), no. 4, 1289–1306.
  • G. F. Montúfar, R. Pascanu, K. Cho, and Y. Bengio, "On the number of linear regions of deep neural networks," in Advances in Neural Information Processing Systems 27, Curran Associates, Inc., 2014, pp. 2924–2932.
  • T. Serra, C. Tjandraatmadja, and S. Ramalingam, "Bounding and counting linear regions of deep neural networks," in Proceedings of the 35th International Conference on Machine Learning, 2018, pp. 4558–4566.
  • S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Comput. 4 (1992), no. 1, 1–58.
  • A. E. Hoerl and R. W. Kennard, "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics 12 (1970), no. 1, 55–67.
  • G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," in Proceedings of the Spring Joint Computer Conference, ACM, 1967, pp. 483–485.
  • M. Frigo and S. G. Johnson, "FFTW: An adaptive software architecture for the FFT," in Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 3, 1998, pp. 1381–1384.

Details

Primary Language English
Subjects Mathematical Methods and Special Functions, Approximation Theory and Asymptotic Methods
Journal Section Articles
Authors

Vu-Anh Le 0009-0000-1904-5186

Mehmet Dik 0000-0003-0643-2771

Early Pub Date December 23, 2024
Publication Date
Submission Date November 4, 2024
Acceptance Date November 12, 2024
Published in Issue Year 2024 Volume: 6 Issue: 2


Creative Commons License
Articles published in PIMS are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.