A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data
Abstract
The increasing use of Large Language Models (LLMs) has enabled the generation of high-quality synthetic text, offering a potential alternative to sensitive real-world datasets in domains where privacy concerns limit data sharing. However, synthetic text is not inherently privacy-safe: fine-tuning a generative model on domain-specific data can improve semantic fidelity while also increasing the risk of memorization and information leakage. In this work, we propose a unified evaluation framework that systematically analyzes the relationship between utility and privacy risk in LLM-generated synthetic text. Its key novelty is that semantic utility metrics and practical privacy attacks are applied within a single, controlled experimental pipeline. Unlike prior studies, which often assess text quality and privacy risk separately, the framework measures semantic fidelity, distributional alignment, memorization behavior, and membership inference vulnerability under the same protocol, enabling direct analysis of the utility–privacy trade-off in synthetic text generation. We empirically evaluate the framework using GPT-2 fine-tuned on two datasets: AG News as a general-domain benchmark and PubMed abstracts as a biomedical-domain validation dataset. The results show that fine-tuning improves semantic utility but also increases empirical privacy risk. On AG News, BERTScore increases to 0.81, while membership inference ROC-AUC rises from 0.45 (close to the 0.5 chance level) to 0.64. The PubMed experiment shows the same directional trend: improved semantic fidelity is accompanied by higher canary memorization and membership inference vulnerability. In addition, canary exposure analysis indicates clear memorization of rare sequences after fine-tuning.
These findings demonstrate a measurable trade-off between utility and privacy in synthetic text generation and highlight the importance of jointly evaluating both dimensions. The proposed framework provides a reproducible methodology for assessing the privacy risks of high-quality synthetic text and supports more responsible deployment of LLM-based synthetic data systems.
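The abstract reports two privacy measurements, membership inference ROC-AUC and canary exposure, without stating how they are computed. The sketch below shows one common way to compute each, assuming a loss-threshold membership inference attack (a lower per-example loss under the fine-tuned model is taken as evidence of membership) and the exposure metric from Carlini et al.'s "Secret Sharer" work. The function names and toy loss values are hypothetical; they are not the paper's implementation or results.

```python
import math
from typing import Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC-AUC as the Mann-Whitney U probability that a random member receives a
    higher attack score than a random non-member (ties count as 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def loss_attack_auc(member_losses: Sequence[float], nonmember_losses: Sequence[float]) -> float:
    """Loss-threshold membership inference (assumed attack): a lower loss under the
    fine-tuned model suggests membership, so the attack score is the negated loss."""
    return roc_auc([-x for x in member_losses], [-x for x in nonmember_losses])


def canary_exposure(canary_loss: float, candidate_losses: Sequence[float]) -> float:
    """Exposure = log2(|R|) - log2(rank), where R holds the inserted canary plus
    all alternative candidate sequences, and rank is the canary's 1-based position
    when R is sorted by model loss (lowest loss first). Candidates tied with the
    canary are not counted ahead of it."""
    space_size = len(candidate_losses) + 1  # candidates plus the canary itself
    rank = 1 + sum(1 for x in candidate_losses if x < canary_loss)
    return math.log2(space_size) - math.log2(rank)


if __name__ == "__main__":
    # Toy per-example losses, invented for illustration (not results from the paper).
    members = [1.2, 0.9, 1.5, 0.8]
    nonmembers = [1.4, 2.0, 1.1, 1.9]
    print(loss_attack_auc(members, nonmembers))  # 0.8125

    # A canary scored below all 1023 alternatives is ranked first: maximal exposure.
    print(canary_exposure(0.5, [float(i) for i in range(1, 1024)]))  # 10.0
```

An ROC-AUC of 0.5 means the attack does no better than chance, so under this reading the reported rise from 0.45 to 0.64 indicates that fine-tuning made training members measurably easier to distinguish from non-members.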
Details
Primary Language: English
Subjects: Electrical Engineering (Other)
Journal Section: Research Article
Publication Date: June 6, 2026
Submission Date: March 16, 2026
Acceptance Date: June 1, 2026
Published in Issue: Year 2026, Volume 6, Number 3