Research Article

Topology-Preserving Scaling in Data Augmentation

Volume: 7 Number: 1 April 30, 2025

Abstract

We propose an algorithmic framework for dataset normalization in data augmentation pipelines that preserves topological stability under non-uniform scaling transformations. Given a finite metric space \( X \subset \mathbb{R}^n \) with Euclidean distance \( d_X \), we consider per-coordinate scaling by factors \( s_1, s_2, \ldots, s_n > 0 \), that is, the map \( S \) that sends each point \( x = (x_1, x_2, \ldots, x_n) \in X \) to \[ S(x) = (s_1 x_1, s_2 x_2, \ldots, s_n x_n). \] Our main result shows that the bottleneck distance \( d_B(D, D_S) \) between the persistence diagram \( D \) of \( X \) and the persistence diagram \( D_S \) of \( S(X) \) satisfies \[ d_B(D, D_S) \leq (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X), \] where \( s_{\min} = \min_{1 \leq i \leq n} s_i \), \( s_{\max} = \max_{1 \leq i \leq n} s_i \), and \( \operatorname{diam}(X) \) is the diameter of \( X \). Based on this guarantee, we formulate an optimization problem that minimizes the scaling variability \( \Delta_s = s_{\max} - s_{\min} \) subject to the constraint \( d_B(D, D_S) \leq \epsilon \), where \( \epsilon > 0 \) is a user-defined tolerance. We develop an algorithmic solution to this problem, so that data augmentation via scaling transformations preserves essential topological features. We further extend the analysis to higher-dimensional homological features, alternative metrics such as the Wasserstein distance, and iterative or probabilistic scaling scenarios. Together, these results provide a rigorous mathematical framework for dataset normalization in data augmentation pipelines.
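As a rough illustration of the quantities named in the abstract, the sketch below applies the scaling map \( S \) to a small point set and evaluates the right-hand side of the stated inequality, \( (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X) \). It does not compute persistence diagrams or the bottleneck distance, and it is not the authors' algorithm. The function names (`scale`, `diameter`, `bottleneck_bound`, `max_spread`) and the sample data are assumptions made for this example. The helper `max_spread` simply rearranges the bound: if \( \Delta_s \leq \epsilon / \operatorname{diam}(X) \), then the right-hand side is at most \( \epsilon \), which is a sufficient condition under the stated result, not a necessary one.

```python
import math
from itertools import combinations


def scale(points, factors):
    """Apply S(x) = (s_1 x_1, ..., s_n x_n) to every point."""
    return [tuple(s * c for s, c in zip(factors, p)) for p in points]


def diameter(points):
    """Largest pairwise Euclidean distance in a finite point set."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def bottleneck_bound(points, factors):
    """Right-hand side of the abstract's inequality: (s_max - s_min) * diam(X)."""
    return (max(factors) - min(factors)) * diameter(points)


def max_spread(points, eps):
    """Largest Delta_s that keeps the bound within eps (sufficient, not necessary)."""
    d = diameter(points)
    return math.inf if d == 0 else eps / d


# Hypothetical sample data: three points in the plane, diameter 5.
X = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
s = (1.0, 1.2)  # hypothetical factors: s_min = 1.0, s_max = 1.2

print(scale(X, s))               # scaled point set S(X)
print(bottleneck_bound(X, s))    # value of the bound, about 1.0
print(max_spread(X, eps=0.5))    # admissible spread for eps = 0.5, about 0.1
```

In this example the spread \( \Delta_s = 0.2 \) exceeds the admissible value of about 0.1 for \( \epsilon = 0.5 \), so the bound alone would not certify that tolerance for these factors.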

Ethical Statement

The authors declare no conflicts of interest.

Details

Primary Language

English

Subjects

Operations Research in Mathematics

Journal Section

Research Article

Authors

Vu-anh Le *
United States

Mehmet Dik
United States

Publication Date

April 30, 2025

Submission Date

January 7, 2025

Acceptance Date

January 31, 2025

Published in Issue

Year 2025 Volume: 7 Number: 1

APA
Le, V.-A., & Dik, M. (2025). Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics, 7(1), 9-26. https://doi.org/10.47087/mjm.1615296
AMA
1. Le V-A, Dik M. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. 2025;7(1):9-26. doi:10.47087/mjm.1615296
Chicago
Le, Vu-anh, and Mehmet Dik. 2025. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics 7 (1): 9-26. https://doi.org/10.47087/mjm.1615296.
EndNote
Le V-A, Dik M (April 30, 2025) Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics 7 1 9–26.
IEEE
[1] V.-A. Le and M. Dik, “Topology-Preserving Scaling in Data Augmentation”, Maltepe Journal of Mathematics, vol. 7, no. 1, pp. 9–26, Apr. 2025, doi: 10.47087/mjm.1615296.
ISNAD
Le, Vu-anh - Dik, Mehmet. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics 7/1 (April 30, 2025): 9-26. https://doi.org/10.47087/mjm.1615296.
JAMA
1. Le V-A, Dik M. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. 2025;7:9–26.
MLA
Le, Vu-anh, and Mehmet Dik. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics, vol. 7, no. 1, Apr. 2025, pp. 9-26, doi:10.47087/mjm.1615296.
Vancouver
1. Vu-anh Le, Mehmet Dik. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. 2025 Apr. 30;7(1):9-26. doi:10.47087/mjm.1615296

Creative Commons License
The published articles in MJM are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

ISSN 2667-7660