Information Systems Research

INFORMATION SYSTEMS RESEARCH,
Published online in Articles in Advance, June 20, 2008
DOI: 10.1287/isre.1070.0161

Impact of the Union and Difference Operations on the Quality of Information Products

Amir Parssian, Sumit Sarkar, Varghese S. Jacob

Department of Information Systems, Instituto de Empresa Business School, Madrid 28006, Spain
School of Management, University of Texas at Dallas, Richardson, Texas 75080
School of Management, University of Texas at Dallas, Richardson, Texas 75080

amir.parssian{at}ie.edu
sumit{at}utd.edu
vjacob{at}utd.edu

Information derived from relational databases is routinely used for decision making. However, little thought is usually given to the quality of the source data, its impact on the quality of the derived information, and how this in turn affects decisions. To assess quality, one needs a framework that defines the metrics constituting the quality profile of a relation and provides mechanisms for evaluating them. We build on a quality framework proposed in prior work and develop quality profiles for the results of the primitive relational operations Difference and Union. These operations have nuances that make both the classification of the resulting records and the estimation of the size of each class difficult to address, and very different from the corresponding problems for other operations. We first determine how tuples appearing in the results of these operations should be classified as accurate, inaccurate, or mismember, and when tuples that should appear in the result are missing from it (such tuples are called incomplete). Although estimating the cardinalities of these subsets directly is difficult, we resolve this by decomposing the problem into a sequence of drawing processes, each of which follows a hyper-geometric distribution. Finally, we discuss how the resulting quality profiles would influence decisions.
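As a rough illustration of a single drawing process of the kind mentioned above, the following Python sketch computes the hyper-geometric probability and expected count of accurate tuples when some tuples are drawn without replacement from a relation. It is not the authors' estimation procedure: the function names and the numbers (100 tuples, 80 of them accurate, 30 drawn) are hypothetical and chosen only to show the distribution at work.

```python
from math import comb


def hypergeom_pmf(k, population, marked, draws):
    """Probability of exactly k marked items among `draws` items
    drawn without replacement from `population` items."""
    lowest = max(0, draws - (population - marked))
    highest = min(draws, marked)
    if k < lowest or k > highest:
        return 0.0
    ways = comb(marked, k) * comb(population - marked, draws - k)
    return ways / comb(population, draws)


def hypergeom_mean(population, marked, draws):
    """Expected number of marked items among the drawn items."""
    return draws * marked / population


# Hypothetical relation: 100 tuples, 80 of which are accurate.
# Suppose 30 of its tuples are drawn (for example, those that also
# occur in a second relation); these figures are assumptions.
population, accurate, drawn = 100, 80, 30

expected_accurate = hypergeom_mean(population, accurate, drawn)

# Cross-check the closed-form mean against the distribution itself.
pmf_total = sum(hypergeom_pmf(k, population, accurate, drawn)
                for k in range(drawn + 1))
mean_from_pmf = sum(k * hypergeom_pmf(k, population, accurate, drawn)
                    for k in range(drawn + 1))

print(expected_accurate)
```

Under these assumed numbers the expected count of accurate tuples among the 30 drawn is 30 × 80 / 100 = 24. Chaining several such draws, one per class of tuple, is one plausible reading of the "sequence of drawing processes" described in the abstract.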

Key Words: information quality framework; relational data model; probability calculus; hyper-geometric distributions; database marketing
History: This paper was received on February 28, 2006.





Copyright © 2008 by INFORMS.