Topology-Preserving Scaling in Data Augmentation

Vu-anh Le; Mehmet Dik

doi:10.47087/mjm.1615296

Abstract

We propose an algorithmic framework for dataset normalization in data augmentation pipelines that keeps topological features stable under non-uniform scaling transformations. Given a finite metric space \( X \subset \mathbb{R}^n \) with Euclidean distance \( d_X \), we consider scaling transformations defined by scaling factors \( s_1, s_2, \ldots, s_n > 0 \). Specifically, we define a scaling function \( S \) that maps each point \( x = (x_1, x_2, \ldots, x_n) \in X \) to \[ S(x) = (s_1 x_1, s_2 x_2, \ldots, s_n x_n). \] Our main result establishes that the bottleneck distance \( d_B(D, D_S) \) between the persistence diagram \( D \) of \( X \) and the persistence diagram \( D_S \) of \( S(X) \) satisfies \[ d_B(D, D_S) \leq (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X), \] where \( s_{\min} = \min_{1 \leq i \leq n} s_i \), \( s_{\max} = \max_{1 \leq i \leq n} s_i \), and \( \operatorname{diam}(X) \) is the diameter of \( X \). Based on this guarantee, we formulate an optimization problem to minimize the scaling variability \( \Delta_s = s_{\max} - s_{\min} \) under the constraint \( d_B(D, D_S) \leq \epsilon \), where \( \epsilon > 0 \) is a user-defined tolerance. We develop an algorithmic solution to this problem, so that data augmentation via scaling transformations preserves essential topological features. We further extend our analysis to higher-dimensional homological features, alternative metrics such as the Wasserstein distance, and iterative or probabilistic scaling scenarios. Together, these results provide a rigorous mathematical framework for dataset normalization that maintains essential topological characteristics under scaling transformations.
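The quantities in the abstract can be illustrated with a small NumPy sketch. It only evaluates the right-hand side of the stated bound, \( (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X) \), and the largest spread \( \Delta_s \) for which that right-hand side stays within a tolerance \( \epsilon \); it does not compute persistence diagrams or the bottleneck distance itself. The function names (`scale_points`, `diameter`, `stated_bound`, `max_spread_for_tolerance`) and the toy point set are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def scale_points(X, s):
    """Apply S(x) = (s_1 x_1, ..., s_n x_n) to every row of X."""
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float)
    if s.shape != (X.shape[1],) or np.any(s <= 0):
        raise ValueError("need one positive scaling factor per coordinate")
    return X * s  # broadcasting multiplies column i by s_i

def diameter(X):
    """Largest pairwise Euclidean distance in the finite set X.

    Builds all pairwise differences, so memory grows quadratically
    with the number of points; fine for a small illustration.
    """
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())

def stated_bound(X, s):
    """Right-hand side of the abstract's inequality: (s_max - s_min) * diam(X)."""
    s = np.asarray(s, dtype=float)
    return float((s.max() - s.min()) * diameter(X))

def max_spread_for_tolerance(X, eps):
    """Largest Delta_s for which the stated bound does not exceed eps.

    This is a sufficient condition derived from the inequality,
    not the optimization procedure of the paper.
    """
    return eps / diameter(X)

# Hypothetical toy data: three points in the plane, diameter 5.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
s = np.array([1.0, 1.5])

X_scaled = scale_points(X, s)
bound = stated_bound(X, s)
spread_limit = max_spread_for_tolerance(X, eps=1.0)
print(bound, spread_limit)
```

In this toy case the spread \( \Delta_s = 0.5 \) and the diameter is 5, so the bound evaluates to 2.5; keeping the bound below \( \epsilon = 1 \) would require \( \Delta_s \leq 0.2 \).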

Ethical Statement

The authors declare no conflict of interest.

Details

Primary Language

English

Subjects

Operations Research in Mathematics

Journal Section

Research Article

Authors

Vu-anh Le*
United States

Mehmet Dik
United States

Publication Date

April 30, 2025

Submission Date

January 7, 2025

Acceptance Date

January 31, 2025

Published in Issue

Year 2025 Volume: 7 Number: 1

DOI

https://doi.org/10.47087/mjm.1615296

IZ

https://izlik.org/JA23TN62KE

Cite

APA

Le, V.-A., & Dik, M. (2025). Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics, 7(1), 9-26. https://doi.org/10.47087/mjm.1615296

AMA

1. Le VA, Dik M. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. 2025;7(1):9-26. doi:10.47087/mjm.1615296

Chicago

Le, Vu-anh, and Mehmet Dik. 2025. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics 7 (1): 9-26. https://doi.org/10.47087/mjm.1615296.

EndNote

Le VA, Dik M (April 1, 2025) Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics 7(1), 9–26.

IEEE

[1] V.-A. Le and M. Dik, “Topology-Preserving Scaling in Data Augmentation”, Maltepe Journal of Mathematics, vol. 7, no. 1, pp. 9–26, Apr. 2025, doi: 10.47087/mjm.1615296.

ISNAD

Le, Vu-anh - Dik, Mehmet. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics 7/1 (April 1, 2025): 9-26. https://doi.org/10.47087/mjm.1615296.

JAMA

1. Le VA, Dik M. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. 2025;7:9–26.

MLA

Le, Vu-anh, and Mehmet Dik. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics, vol. 7, no. 1, Apr. 2025, pp. 9-26, doi:10.47087/mjm.1615296.

Vancouver

1. Vu-anh Le, Mehmet Dik. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. 2025 Apr. 1;7(1):9-26. doi:10.47087/mjm.1615296