We propose an algorithmic framework for dataset normalization in data augmentation pipelines that preserves topological stability under non-uniform scaling transformations. Given a finite metric space \( X \subset \mathbb{R}^n \) with Euclidean distance \( d_X \), we consider scaling transformations defined by scaling factors \( s_1, s_2, \ldots, s_n > 0 \). Specifically, we define a scaling function \( S \) that maps each point \( x = (x_1, x_2, \ldots, x_n) \in X \) to
\[
S(x) = (s_1 x_1, s_2 x_2, \ldots, s_n x_n).
\]
Our main result establishes that the bottleneck distance \( d_B(D, D_S) \) between the persistence diagrams \( D \) of \( X \) and \( D_S \) of \( S(X) \) satisfies:
\[
d_B(D, D_S) \leq (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X),
\]
where \( s_{\min} = \min_{1 \leq i \leq n} s_i \), \( s_{\max} = \max_{1 \leq i \leq n} s_i \), and \( \operatorname{diam}(X) \) is the diameter of \( X \). Based on this theoretical guarantee, we formulate an optimization problem to minimize the scaling variability \( \Delta_s = s_{\max} - s_{\min} \) under the constraint \( d_B(D, D_S) \leq \epsilon \), where \( \epsilon > 0 \) is a user-defined tolerance.
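The quantities in the bound above can be evaluated directly from the point cloud and the scaling factors. The following is a minimal sketch in plain Python, not the authors' algorithm: it only computes \( S(X) \), \( \operatorname{diam}(X) \), and the right-hand side \( (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X) \), and it does not compute persistence diagrams or the bottleneck distance itself. The function names (`scale_points`, `diameter`, `stated_bound`, `max_scaling_variability`) are hypothetical and chosen for illustration. The last helper assumes the stated inequality holds and returns the largest \( \Delta_s \) for which that inequality alone implies \( d_B(D, D_S) \leq \epsilon \).

```python
import math
from itertools import combinations


def scale_points(points, factors):
    """Apply S(x) = (s_1 x_1, ..., s_n x_n) to every point."""
    if any(s <= 0 for s in factors):
        raise ValueError("every scaling factor must be positive")
    if any(len(p) != len(factors) for p in points):
        raise ValueError("need exactly one scaling factor per coordinate")
    return [tuple(s * x for s, x in zip(factors, p)) for p in points]


def diameter(points):
    """Largest pairwise Euclidean distance; 0.0 for fewer than two points."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def stated_bound(points, factors):
    """Right-hand side (s_max - s_min) * diam(X) of the stated inequality."""
    return (max(factors) - min(factors)) * diameter(points)


def max_scaling_variability(points, eps):
    """Largest Delta_s for which the stated bound alone gives d_B <= eps."""
    diam = diameter(points)
    return math.inf if diam == 0 else eps / diam


# Illustrative data (an assumption, not taken from the article).
X = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
s = (1.0, 1.5)

print(scale_points(X, s))
print(stated_bound(X, s))
print(max_scaling_variability(X, eps=1.0))
```

For the 3-4-5 triangle used here, \( \operatorname{diam}(X) = 5 \), so the factors \( (1, 1.5) \) give a right-hand side of \( 0.5 \cdot 5 = 2.5 \), and a tolerance of \( \epsilon = 1 \) permits \( \Delta_s \leq 0.2 \) under the stated bound.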
We develop an algorithm that solves this problem, so that data augmentation via scaling transformations preserves essential topological features. We further extend the analysis to higher-dimensional homological features, alternative metrics such as the Wasserstein distance, and iterative or probabilistic scaling scenarios. Together, these contributions provide a rigorous mathematical framework for dataset normalization in data augmentation pipelines.
The authors declare no conflicting interests.
Primary Language | English |
---|---|
Subjects | Operations Research in Mathematics |
Journal Section | Articles |
Authors | |
Publication Date | April 30, 2025 |
Submission Date | January 7, 2025 |
Acceptance Date | January 31, 2025 |
Published in Issue | Year 2025 Volume: 7 Issue: 1 |
The published articles in MJM are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ISSN 2667-7660