A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data

Lubana Isaoglu; Zeynep Orman

doi:10.64808/engineeringperspective.1910777

Abstract

Large Language Models (LLMs) can now generate high-quality synthetic text, offering a potential alternative to sensitive real-world datasets in domains where privacy concerns limit data sharing. However, synthetic text is not inherently privacy-safe: fine-tuning a generative model on domain-specific data can improve semantic fidelity while also increasing the risk of memorization and information leakage. In this work, we propose a unified evaluation framework for systematically analyzing the relationship between utility and privacy risk in LLM-generated synthetic text. Unlike prior studies, which often assess text quality and privacy risk separately, the framework combines semantic utility metrics and practical privacy attacks in a single controlled pipeline. It jointly measures semantic fidelity, distributional alignment, memorization behavior, and membership inference vulnerability under the same protocol, enabling direct analysis of the utility–privacy trade-off in synthetic text generation. We empirically evaluate the framework using GPT-2 fine-tuned on two datasets: AG News as a general-domain benchmark and PubMed abstracts as a biomedical-domain validation dataset. Results show that fine-tuning improves semantic utility but also increases empirical privacy risk. On AG News, BERTScore increases to 0.81, while membership inference ROC-AUC rises from 0.45 to 0.64. The PubMed experiment shows the same directional trend, with improved semantic fidelity accompanied by higher canary memorization and membership inference vulnerability. Canary exposure analysis further indicates clear memorization of rare sequences after fine-tuning.
These findings demonstrate a measurable trade-off between utility and privacy in synthetic text generation and highlight the importance of evaluating both dimensions jointly. The proposed framework provides a reproducible methodology for assessing the privacy risks of high-quality synthetic text and supports more responsible deployment of LLM-based synthetic data systems.
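
The two privacy measurements named in the abstract, membership inference ROC-AUC and canary exposure, can be illustrated with a minimal sketch. The abstract does not specify the attack or the exposure formula, so everything below is an assumption: a simple loss-threshold attack (lower loss suggests a training member) and the rank-based exposure commonly used in canary studies. The function names and the toy loss values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random member outscores a random non-member.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def loss_attack_auc(member_losses: Sequence[float], nonmember_losses: Sequence[float]) -> float:
    """Assumed loss-threshold attack: negate losses so that lower loss means a higher score."""
    return roc_auc([-x for x in member_losses], [-x for x in nonmember_losses])


def canary_exposure(canary_loss: float, reference_losses: Sequence[float]) -> float:
    """Rank-based exposure: log2(number of candidates) - log2(rank of the canary).

    Rank 1 means the inserted canary has the lowest loss among all candidates.
    """
    rank = 1 + sum(1 for r in reference_losses if r < canary_loss)
    total = len(reference_losses) + 1
    return math.log2(total) - math.log2(rank)


# Toy per-example losses (hypothetical values, for illustration only).
members = [1.9, 2.1, 2.4, 2.0]       # examples seen during fine-tuning
nonmembers = [2.3, 2.8, 2.2, 3.0]    # held-out examples
print(loss_attack_auc(members, nonmembers))
print(canary_exposure(0.7, [3.1, 2.9, 3.4]))
```

An AUC near 0.5 means the attack cannot separate members from non-members, which is one way to read the reported rise from 0.45 to 0.64 after fine-tuning. In practice the losses would come from the fine-tuned model's per-example negative log-likelihood, and the reference set for exposure would be much larger than three candidates.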

Details

Primary Language

English

Subjects

Electrical Engineering (Other)

Journal Section

Research Article

Authors

Lubana Isaoglu
Türkiye

Zeynep Orman*
Türkiye

Publication Date

June 6, 2026

Submission Date

March 16, 2026

Acceptance Date

June 1, 2026

Published in Issue

Year 2026 Volume: 6 Number: 3

DOI

https://doi.org/10.64808/engineeringperspective.1910777

IZ

https://izlik.org/JA33NY94FE

Cite

APA

Isaoglu, L., & Orman, Z. (2026). A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data. Engineering Perspective, 6(3), 456-466. https://doi.org/10.64808/engineeringperspective.1910777

AMA

1. Isaoglu L, Orman Z. A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data. engineeringperspective. 2026;6(3):456-466. doi:10.64808/engineeringperspective.1910777

Chicago

Isaoglu, Lubana, and Zeynep Orman. 2026. “A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data”. Engineering Perspective 6 (3): 456-66. https://doi.org/10.64808/engineeringperspective.1910777.

EndNote

Isaoglu L, Orman Z (June 1, 2026) A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data. Engineering Perspective 6 3 456–466.

IEEE

[1] L. Isaoglu and Z. Orman, “A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data”, engineeringperspective, vol. 6, no. 3, pp. 456–466, June 2026, doi: 10.64808/engineeringperspective.1910777.

ISNAD

Isaoglu, Lubana - Orman, Zeynep. “A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data”. Engineering Perspective 6/3 (June 1, 2026): 456-466. https://doi.org/10.64808/engineeringperspective.1910777.

JAMA

1. Isaoglu L, Orman Z. A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data. engineeringperspective. 2026;6:456–466.

MLA

Isaoglu, Lubana, and Zeynep Orman. “A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data”. Engineering Perspective, vol. 6, no. 3, June 2026, pp. 456-66, doi:10.64808/engineeringperspective.1910777.

Vancouver

1. Lubana Isaoglu, Zeynep Orman. A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data. engineeringperspective. 2026 Jun. 1;6(3):456-66. doi:10.64808/engineeringperspective.1910777