Statistical limitations of sensitive itemset hiding methods

Shalini Jangra, Durga Toshniwal, Chris Clifton

doi:10.1007/s10489-023-04781-4

Published: 20 July 2023

Applied Intelligence, Volume 53, pages 24275–24292 (2023)

Abstract

Frequent itemset hiding has long been an area of study in privacy-preserving data mining. The goal is to alter a dataset so that it can be released without revealing particular sensitive aggregates (e.g., frequent itemsets or association rules). The typical approach is to remove items from transactions to reduce the support of the sensitive itemset(s) below a threshold, while minimizing the changes to, or impact on, other frequent itemsets. In this paper, we ask whether such hiding can be discovered: do hiding methods lead to anomalies that suggest a sensitive itemset likely existed in the dataset and has been hidden? We show that, after suppression, a sensitive itemset may behave like an outlier among its neighboring itemsets, indicating that the dataset has likely been altered. KL-divergence and $\chi^2$-divergence are used to measure the difference between the expected and actual probability distributions of itemsets in order to detect this anomalous behavior. Experimental results on four datasets show that suppressed sensitive itemsets often stand out as the most significant outlier, irrespective of the victim item selection method. We propose two defensive approaches that counter this attack.
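The detection idea above, comparing an itemset's actual distribution with an expected one via KL- and $\chi^2$-divergence, can be illustrated with a small sketch. It assumes, purely for illustration, that the expected distribution of a 2-itemset comes from an independence model built from its two items' marginal supports; the paper's own expected-distribution model and its neighborhood-based outlier test are not reproduced here. All function names and the toy data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def pair_distributions(
    transactions: Sequence[Set[str]], a: str, b: str
) -> Tuple[List[float], List[float]]:
    """Return (observed, expected) probabilities over the four cells
    (a&b, a&~b, ~a&b, ~a&~b). The expected vector is an ASSUMED
    independence model built from the two items' marginal supports."""
    n = len(transactions)
    counts = [0, 0, 0, 0]
    for t in transactions:
        idx = (0 if a in t else 2) + (0 if b in t else 1)
        counts[idx] += 1
    observed = [c / n for c in counts]
    pa = observed[0] + observed[1]  # support of a
    pb = observed[0] + observed[2]  # support of b
    expected = [pa * pb, pa * (1 - pb), (1 - pa) * pb, (1 - pa) * (1 - pb)]
    return observed, expected


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q). Cells with p == 0 contribute nothing; under the
    independence model q is 0 only where p is also 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi2_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Pearson chi^2-divergence: sum over cells of (p - q)^2 / q."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q) if qi > 0)


def rank_pairs(
    transactions: Sequence[Set[str]],
    pairs: Sequence[Pair],
    divergence: Callable[[Sequence[float], Sequence[float]], float] = kl_divergence,
) -> List[Tuple[Pair, float]]:
    """Score each pair by divergence(observed, expected), largest first."""
    scores: Dict[Pair, float] = {
        pair: divergence(*pair_distributions(transactions, *pair)) for pair in pairs
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def toy_transactions() -> List[Set[str]]:
    """Hypothetical data: c and d are independent; a and b are both
    frequent but never co-occur."""
    data = []
    for i in range(100):
        t = set()
        if i % 2 == 0:
            t.add("c")
        if (i // 2) % 2 == 0:
            t.add("d")
        if i % 4 in (0, 1):
            t.add("a")
        if i % 4 == 2:
            t.add("b")
        data.append(t)
    return data


if __name__ == "__main__":
    ranking = rank_pairs(toy_transactions(), [("a", "b"), ("c", "d")])
    for pair, score in ranking:
        print(pair, round(score, 4))
```

In the toy data, `a` and `b` are each frequent but never appear together, loosely mimicking a pair whose joint support has been driven down, so it scores above the independent pair `c`, `d`, whose divergence is zero. The setting described in the abstract instead judges a suppressed itemset against its neighboring itemsets rather than against a fixed independence baseline.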

Similar content being viewed by others

A Greedy Approach to Hide Sensitive Frequent Itemsets with Reduced Side Effects

A Frequent Itemset Hiding Toolbox

A Heuristic Approach for Sensitive Pattern Hiding with Improved Data Quality

Data availability and access

The datasets used in the current study are available on GitHub at https://github.com/ShaliniJangra/Attack_Defense and in the online repository at https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php.

Funding

This work is funded by Science and Engineering Research Board (SERB), a statutory body under the Department of Science and Technology (DST), Government of India.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, IIT Roorkee, Uttarakhand, 247667, Roorkee, India
Shalini Jangra & Durga Toshniwal
Department of Computer Science and CERIAS, Purdue University, West Lafayette, IN, 47907-2107, USA
Shalini Jangra & Chris Clifton

Contributions

Shalini Jangra: Conception and design of study, Acquisition of data, Coding and Implementation, Analysis and interpretation of Results, Writing - Original draft preparation, Review & Editing, Funding acquisition.
Durga Toshniwal: Review & Editing, Supervision.
Chris Clifton: Conceptualization, Methodology, Analysis and interpretation of Results, Review & Editing, Supervision.

Corresponding author

Correspondence to Shalini Jangra.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical and informed consent for data used

The work uses publicly available and synthetically generated datasets that contain no identifiable information. No ethical approval was needed.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jangra, S., Toshniwal, D. & Clifton, C. Statistical limitations of sensitive itemset hiding methods. Appl Intell 53, 24275–24292 (2023). https://doi.org/10.1007/s10489-023-04781-4

Accepted: 10 June 2023
Published: 20 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04781-4
