Research Article
Topology-Preserving Scaling in Data Augmentation

Year 2025, Volume: 7 Issue: 1, 9 - 26, 30.04.2025
https://doi.org/10.47087/mjm.1615296

Abstract

We propose an algorithmic framework for dataset normalization in data augmentation pipelines that preserves topological stability under non-uniform scaling transformations. Given a finite metric space \( X \subset \mathbb{R}^n \) with Euclidean distance \( d_X \), we consider scaling transformations defined by scaling factors \( s_1, s_2, \ldots, s_n > 0 \). Specifically, we define a scaling function \( S \) that maps each point \( x = (x_1, x_2, \ldots, x_n) \in X \) to
\[
S(x) = (s_1 x_1, s_2 x_2, \ldots, s_n x_n).
\]
Our main result establishes that the bottleneck distance \( d_B(D, D_S) \) between the persistence diagrams \( D \) of \( X \) and \( D_S \) of \( S(X) \) satisfies:
\[
d_B(D, D_S) \leq (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X),
\]
where \( s_{\min} = \min_{1 \leq i \leq n} s_i \), \( s_{\max} = \max_{1 \leq i \leq n} s_i \), and \( \operatorname{diam}(X) \) is the diameter of \( X \). Based on this theoretical guarantee, we formulate an optimization problem to minimize the scaling variability \( \Delta_s = s_{\max} - s_{\min} \) under the constraint \( d_B(D, D_S) \leq \epsilon \), where \( \epsilon > 0 \) is a user-defined tolerance.
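
The scaling map and the right-hand side of the bound can be evaluated directly on a small point set. The following is a minimal sketch, not the authors' implementation: the function names (`scale_points`, `diameter`, `bottleneck_bound`) and the three-point example are assumptions made for illustration, and the code only computes the quantity \( (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X) \), not the persistence diagrams themselves.

```python
import math
from itertools import combinations


def scale_points(points, factors):
    """Apply S(x) = (s_1 x_1, ..., s_n x_n) to every point of X."""
    if any(s <= 0 for s in factors):
        raise ValueError("all scaling factors must be positive")
    if any(len(p) != len(factors) for p in points):
        raise ValueError("each point needs one coordinate per scaling factor")
    return [tuple(s * c for s, c in zip(factors, p)) for p in points]


def diameter(points):
    """Largest pairwise Euclidean distance in a finite point set."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def bottleneck_bound(points, factors):
    """Right-hand side of the stated bound: (s_max - s_min) * diam(X)."""
    return (max(factors) - min(factors)) * diameter(points)


# Hypothetical example data: three points in the plane, scaled by (1.0, 1.5).
X = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
s = (1.0, 1.5)

SX = scale_points(X, s)          # second coordinate stretched by 1.5
bound = bottleneck_bound(X, s)   # (1.5 - 1.0) * diam(X)
print(SX, diameter(X), bound)
```

For this example the diameter is the distance between (3, 0) and (0, 4), so the bound evaluates to 0.5 times 5.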

We develop an algorithmic solution to this problem so that data augmentation via scaling transformations preserves essential topological features. We further extend our analysis to higher-dimensional homological features, alternative metrics such as the Wasserstein distance, and iterative or probabilistic scaling scenarios. Together, these contributions provide a rigorous mathematical framework for dataset normalization in data augmentation pipelines.
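
Taking the stated bound at face value, the condition \( (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X) \leq \epsilon \) is sufficient for \( d_B(D, D_S) \leq \epsilon \). The sketch below uses that observation to contract a proposed set of scaling factors toward their midpoint until the spread fits the tolerance. It is an illustrative assumption, not the algorithm developed in the paper; the helper names `max_spread` and `shrink_factors` and the example values are hypothetical.

```python
import math


def max_spread(diam, eps):
    """Largest s_max - s_min allowed by the sufficient condition spread * diam <= eps."""
    if eps <= 0:
        raise ValueError("tolerance eps must be positive")
    if diam <= 0:
        return math.inf  # a single point (or empty set) imposes no restriction
    return eps / diam


def shrink_factors(factors, diam, eps):
    """Contract positive factors toward their midpoint so that the spread fits eps / diam."""
    if any(s <= 0 for s in factors):
        raise ValueError("all scaling factors must be positive")
    limit = max_spread(diam, eps)
    lo, hi = min(factors), max(factors)
    spread = hi - lo
    if spread <= limit:
        return tuple(factors)  # already within tolerance
    mid = (lo + hi) / 2
    ratio = limit / spread     # in (0, 1), so factors stay positive and keep their order
    return tuple(mid + ratio * (s - mid) for s in factors)


# Hypothetical example: diam(X) = 5 and tolerance eps = 2.5 allow a spread of 0.5.
proposed = (0.5, 1.0, 2.0)
adjusted = shrink_factors(proposed, diam=5.0, eps=2.5)
print(adjusted, max(adjusted) - min(adjusted))
```

Contracting toward the midpoint is only one possible design choice; it keeps the relative ordering of the factors, whereas clipping to an interval would leave in-range factors unchanged.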

Ethical Statement

The authors declare no conflict of interest.

References

  • A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete & Comput. Geom. 33 (2005), no. 2, 249–274. doi:10.1007/s00454-004-1146-y. Available at [https://arxiv.org/abs/cs/0306106](https://arxiv.org/abs/cs/0306106).
  • H. Edelsbrunner and J. Harer, Persistent homology—a survey, Contemp. Math. 453 (2008), 257–282. doi:10.1090/conm/453/08802. Available at [https://www.cs.duke.edu/~jeb/CTIC/cti.pdf](https://www.cs.duke.edu/~jeb/CTIC/cti.pdf).
  • D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, Stability of persistence diagrams, Discrete & Comput. Geom. 37 (2007), no. 1, 103–120. doi:10.1007/s00454-006-1276-5. Available at [https://arxiv.org/abs/math/0510337](https://arxiv.org/abs/math/0510337).
  • G. Carlsson and V. de Silva, Zigzag persistence, Found. Comput. Math. 10 (2010), no. 4, 367–405. doi:10.1007/s10208-010-9066-0. Available at [https://arxiv.org/abs/math/0604450](https://arxiv.org/abs/math/0604450).
  • M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, Spatial transformer networks, in Advances in Neural Information Processing Systems, vol. 28, 2017–2025 (2015). Available at [https://arxiv.org/abs/1506.02025](https://arxiv.org/abs/1506.02025).
  • S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in Proceedings of the 32nd International Conference on Machine Learning, 448–456 (2015). Available at [https://arxiv.org/abs/1502.03167](https://arxiv.org/abs/1502.03167).
  • U. Bauer and M. Lesnick, Induced matchings of barcodes and the algebraic stability of persistence, J. Comput. Geom. 6 (2015), no. 2, 162–191. doi:10.20382/jocg.v6i2a7. Available at [https://www.jocg.org/index.php/jocg/article/view/285](https://www.jocg.org/index.php/jocg/article/view/285).

There are 7 citations in total.

Details

Primary Language English
Subjects Operations Research in Mathematics
Journal Section Articles
Authors

Vu-anh Le

Mehmet Dik

Publication Date April 30, 2025
Submission Date January 7, 2025
Acceptance Date January 31, 2025
Published in Issue Year 2025 Volume: 7 Issue: 1

Cite

APA Le, V.- anh, & Dik, M. (2025). Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics, 7(1), 9-26. https://doi.org/10.47087/mjm.1615296
AMA Le V anh, Dik M. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. April 2025;7(1):9-26. doi:10.47087/mjm.1615296
Chicago Le, Vu-anh, and Mehmet Dik. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics 7, no. 1 (April 2025): 9-26. https://doi.org/10.47087/mjm.1615296.
EndNote Le V- anh, Dik M (April 1, 2025) Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics 7 1 9–26.
IEEE V.- anh Le and M. Dik, “Topology-Preserving Scaling in Data Augmentation”, Maltepe Journal of Mathematics, vol. 7, no. 1, pp. 9–26, 2025, doi: 10.47087/mjm.1615296.
ISNAD Le, Vu-anh - Dik, Mehmet. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics 7/1 (April 2025), 9-26. https://doi.org/10.47087/mjm.1615296.
JAMA Le V- anh, Dik M. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. 2025;7:9–26.
MLA Le, Vu-anh and Mehmet Dik. “Topology-Preserving Scaling in Data Augmentation”. Maltepe Journal of Mathematics, vol. 7, no. 1, 2025, pp. 9-26, doi:10.47087/mjm.1615296.
Vancouver Le V- anh, Dik M. Topology-Preserving Scaling in Data Augmentation. Maltepe Journal of Mathematics. 2025;7(1):9-26.

Creative Commons License
The published articles in MJM are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

ISSN 2667-7660