Abstract
In former days statistical offices used to publish only macrodata, i.e., tables. This was sufficient to satisfy the demands of the users of statistical data. Nowadays, however, the users of statistical data want to have data that are as detailed as possible. Not only do they want more detailed tables, but they also want to have microdata, i.e., data for individual respondents. This is mainly due to the increased power of modern computers, which enables users of statistical data to analyze these microdata by themselves. As a consequence of the demand for microdata, statistical offices are put in a difficult position. On the one hand, it is their duty to satisfy this demand; on the other hand, they should protect the privacy of their respondents. To achieve both aims, i.e., to satisfy the demand for microdata while protecting the privacy of the individual respondents, statistical offices apply certain protection measures. The measures are applied only when the privacy of some respondents is endangered. Information that is deemed safe is not protected, in order to release as much information as possible. Examples of protection measures are global recoding, where several categories of a variable are combined into a single one, and local suppression, where a value of a variable in a record is replaced by a missing value.
In this article we assume that a safe microdata set has to be produced by a statistical office, for release to external researchers. To check the safety of such a microdata set, we assume that the statistical office checks the frequency of certain combinations of values. If a combination occurs frequently enough in the file, it is considered safe, otherwise unsafe. Unsafe combinations can be eliminated from the file by using techniques such as global recoding (i.e., combining several categories of a variable into a single one) and local suppression (i.e., replacing the value of a variable in a record by a missing value).
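The frequency check described above can be illustrated with a minimal Python sketch. The function name `unsafe_combinations`, the `threshold` and `size` parameters, and the representation of records as dictionaries are assumptions made for illustration; the abstract does not specify which combinations are checked or which threshold is used.

```python
from collections import Counter
from itertools import combinations


def unsafe_combinations(records, key_variables, threshold, size):
    """Return the value combinations that occur fewer than `threshold` times.

    records       -- list of dicts, one per respondent (hypothetical layout)
    key_variables -- names of the variables whose combinations are checked
    threshold     -- assumed minimum frequency for a combination to be safe
    size          -- number of variables per combination
    """
    unsafe = set()
    for variables in combinations(key_variables, size):
        # Count every observed value combination for this set of variables,
        # skipping records where one of the values is missing (None).
        counts = Counter(
            tuple(record[v] for v in variables)
            for record in records
            if all(record[v] is not None for v in variables)
        )
        for values, frequency in counts.items():
            if frequency < threshold:
                unsafe.add(tuple(zip(variables, values)))
    return unsafe


records = [
    {"region": "N", "job": "baker"},
    {"region": "N", "job": "baker"},
    {"region": "S", "job": "mayor"},
]
rare = unsafe_combinations(records, ["region", "job"], threshold=2, size=2)
```

In this toy file only the combination of region `S` with job `mayor` falls below the assumed threshold of 2, so it is the only one flagged as unsafe.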
In practice one first applies global recodings interactively to reduce the initial number of unsafe combinations drastically. Possible remaining unsafe combinations in the microdata set are then eliminated automatically through the application of local suppressions. The present article concentrates on this second step, i.e., the elimination of unsafe combinations by local suppressions, in an optimal way. In particular, several optimal local suppression models are formulated and studied. The aim of these models is to apply local suppression in an optimal way, under various constraints. All these local suppression models turn out to be set-covering problems.
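To see why local suppression has a set-covering structure, consider one record: each unsafe combination in it is eliminated as soon as at least one of its variables is suppressed, so the suppressed variables must "cover" all unsafe combinations. The sketch below uses the standard greedy heuristic for set covering, which is not the article's optimal method and does not guarantee a minimum number of suppressions. The function name `greedy_suppressions` and the input layout are hypothetical, and the sketch assumes each record is treated independently.

```python
def greedy_suppressions(unsafe_in_record):
    """Choose variables to suppress in one record with a greedy heuristic.

    unsafe_in_record -- iterable of non-empty sets of variable names, one set
                        per unsafe combination in the record (assumed layout)
    Returns a set of variables that intersects every unsafe combination.
    """
    remaining = [set(combo) for combo in unsafe_in_record]
    suppressed = set()
    while remaining:
        # Count how many still-uncovered combinations each variable appears in.
        counts = {}
        for combo in remaining:
            for variable in combo:
                counts[variable] = counts.get(variable, 0) + 1
        # Pick the variable covering the most combinations; ties are broken
        # alphabetically so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        suppressed.add(best)
        remaining = [combo for combo in remaining if best not in combo]
    return suppressed


chosen = greedy_suppressions([{"region", "job"}, {"job", "age"}, {"job", "sex"}])
```

Here a single suppression of `job` covers all three unsafe combinations. An exact model would instead minimize the number (or total weight) of suppressions subject to the same covering constraints, for example as an integer program.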
Original language | English |
---|---|
Pages (from-to) | 421-435 |
Journal | Journal of Official Statistics |
Volume | 14 |
Issue number | 4 |
Publication status | Published - 1998 |
Externally published | Yes |