Mondrian Based Real Time Anonymization Model

İrem Civelek; Muhammed Ali Aydın

doi:10.35193/bseufbd.923267

Research Article

Turkish title: Mondrian Tabanlı Kimliksizleştirme Modeli (Mondrian-Based Anonymization Model)

Year 2021, Volume: 8 Issue: 1, 472 - 483, 30.06.2021

Cited By: 1

Abstract

Big data sets often contain private information about individuals, which exposes those individuals to disclosure attacks. To protect personal privacy in big data, anonymization methods are used to produce anonymized data, which is then stored and shared in that form. However, because anonymization discards information, anonymized data cannot be restored to its original state. The aim of this study is to create a new method that anonymizes big data in real time without altering the data structure stored in the system. The Hadoop ecosystem is used to process the large data volumes. In the proposed model, user requests are handled by services in a middle layer and processed in the Hadoop ecosystem, which returns anonymized data. The anonymization algorithm was optimized, and its advantages over algorithms in the literature were recorded. The proposed model proved user-friendly in terms of issuing queries and obtaining an anonymized data set. According to the analysis results, the anonymization algorithm used in the model is 40% more efficient than the other algorithms in terms of processing speed.

Keywords

Anonymization, Privacy Protection Model, Spark
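
The abstract does not describe the internals of the algorithm. As a rough illustration of the Mondrian idea named in the title, the Python sketch below shows basic strict Mondrian partitioning for k-anonymity over numeric quasi-identifiers. It is not the authors' optimized algorithm and it does not involve Hadoop or Spark. The function names, the example records, and the use of the raw (unnormalized) attribute range to pick the split dimension are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]   # one row of numeric quasi-identifiers (hypothetical layout)
Range = Tuple[float, float]  # generalized value: (min, max) within a partition


def mondrian(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Recursively split records at the median of the widest attribute.

    A split is accepted only if both halves keep at least k records,
    so every returned partition has size >= k when len(records) >= k.
    """
    if len(records) < 2 * k:
        return [list(records)]  # too small to split into two valid halves

    dims = range(len(records[0]))
    # Try attributes from widest to narrowest value range (raw range, an assumption).
    by_width = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_width:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    return [list(records)]  # no allowable cut on any attribute


def generalize(partition: Sequence[Record]) -> List[Tuple[Range, ...]]:
    """Replace every record by the per-attribute (min, max) ranges of its partition."""
    dims = range(len(partition[0]))
    ranges = tuple(
        (min(r[d] for r in partition), max(r[d] for r in partition)) for d in dims
    )
    return [ranges for _ in partition]


def anonymize(records: Sequence[Record], k: int) -> List[Tuple[Range, ...]]:
    """Return a k-anonymous generalization of the records (order is not preserved)."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    result: List[Tuple[Range, ...]] = []
    for part in mondrian(records, k):
        result.extend(generalize(part))
    return result


if __name__ == "__main__":
    # Hypothetical (age, postal code) rows, not taken from the paper's data set.
    rows = [(25, 34000), (27, 34010), (31, 34020), (45, 6100), (47, 6110), (52, 6120)]
    for row in anonymize(rows, k=2):
        print(row)
```

Each output row carries value ranges instead of exact values, and every combination of ranges is shared by at least k rows. That shared-by-k property is what k-anonymity requires of the quasi-identifiers.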

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	İrem Civelek (0000-0002-8995-7161); Muhammed Ali Aydın (0000-0002-1846-6090)
Publication Date	June 30, 2021
Submission Date	April 20, 2021
Acceptance Date	May 4, 2021
Published in Issue	Year 2021 Volume: 8 Issue: 1

Cite

APA	Civelek, İ., & Aydın, M. A. (2021). Mondrian Based Real Time Anonymization Model. Bilecik Şeyh Edebali Üniversitesi Fen Bilimleri Dergisi, 8(1), 472-483. https://doi.org/10.35193/bseufbd.923267

Cited By

Toward Privacy Preservation Using Clustering Based Anonymization: Recent Advances and Future Research Outlook

IEEE Access

https://doi.org/10.1109/ACCESS.2022.3175219
