Information Systems Research
Vol. 16, No. 3, September 2005, pp. 256-270
DOI: 10.1287/isre.1050.0056

Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns

Syam Menon, Sumit Sarkar, Shibnath Mukherjee

School of Management, University of Texas at Dallas, Richardson, Texas 75083

syam{at}utdallas.edu
sumit{at}utdallas.edu
sxm038300{at}utdallas.edu

The sharing of databases either within or across organizations raises the possibility of unintentionally revealing sensitive relationships contained in them. Recent advances in data-mining technology have increased the chances of such disclosure. Consequently, firms that share their databases might choose to hide these sensitive relationships prior to sharing. Ideally, the approach used to hide relationships should be impervious to as many data-mining techniques as possible, while minimizing the resulting distortion to the database. This paper focuses on frequent item sets, the identification of which forms a critical initial step in a variety of data-mining tasks. It presents an optimal approach for hiding sensitive item sets, while keeping the number of modified transactions to a minimum. The approach is particularly attractive as it easily handles databases with millions of transactions. Results from extensive tests conducted on publicly available real data and data generated using IBM’s synthetic data generator indicate that the approach presented is very effective, optimally solving problems involving millions of transactions in a few seconds.

Key Words: data quality; privacy; item set mining
History: This paper was received on June 18, 2004.
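The hiding requirement in the abstract can be made concrete with a small sketch. It rests on several assumptions that are not taken from the paper. A transaction is a set of items. The support of an item set is the number of transactions that contain it. A sensitive item set counts as hidden once its support falls below the mining threshold. Transactions are modified only by deleting items.

Under these assumptions, each modified transaction can lower a given item set's support by at most one. Hiding a single item set therefore needs at least `support - threshold + 1` modified transactions. The function `greedy_hide` is a hypothetical greedy heuristic written for illustration. It is not the optimal approach presented in the paper, and it can modify more transactions than necessary when several sensitive item sets overlap.

```python
# Hypothetical illustration only: NOT the paper's optimal method.
# Assumptions: transactions are sets of items; an item set is hidden when
# its support (number of transactions containing it) is < threshold;
# sanitization deletes items and never adds them.

def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def min_modified_single(db, itemset, threshold):
    """Lower bound on modified transactions needed to hide ONE item set:
    each modified transaction lowers its support by at most 1."""
    return max(0, support(db, itemset) - threshold + 1)


def greedy_hide(db, sensitive, threshold):
    """Greedily delete items until every sensitive item set is hidden.

    Returns (sanitized copy of db, set of indices of modified transactions).
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    db = [set(t) for t in db]  # work on a copy; the input is left untouched
    sensitive = [frozenset(s) for s in sensitive]
    modified = set()
    while True:
        exposed = [s for s in sensitive if support(db, s) >= threshold]
        if not exposed:
            return db, modified
        # Choose the transaction supporting the most still-exposed sets
        # (ties go to the lowest index, since max returns the first maximum).
        best = max(range(len(db)),
                   key=lambda i: sum(1 for s in exposed if s <= db[i]))
        for s in exposed:
            if s <= db[best]:
                db[best].discard(min(s))  # drop one item of this set
        modified.add(best)


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    sensitive = [{"a", "b"}, {"b", "c"}]
    clean, modified = greedy_hide(db, sensitive, threshold=2)
    print(sorted(modified))  # [0, 4]
```

In this toy database, both sensitive pairs start with support 3. The single-set bound is therefore 2 modified transactions each. The heuristic hides both pairs by changing only transactions 0 and 4, because each of those transactions supports both pairs. On larger instances with overlapping sensitive sets, a greedy choice like this gives no optimality guarantee.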


This article has been cited by other articles:


R. Garfinkel, R. Gopal, and S. Thompson. Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information. Information Systems Research, March 1, 2007; 18(1): 23-41.
S. Menon and S. Sarkar. Minimizing Information Loss and Preserving Privacy. Management Science, January 1, 2007; 53(1): 101-116.

Copyright © 2005 by INFORMS.