Growing public awareness and government regulation of data privacy
motivate new paradigms for collecting and analyzing data that are transparent
and acceptable to data owners. We present a new concept of privacy and
corresponding data formats, mechanisms, and tradeoffs for privatizing data
during data collection. The privacy notion, named Interval Privacy, requires
that the conditional distribution of the raw data given the privatized data be
the same as its unconditional distribution over a nontrivial support set.
Correspondingly, the proposed privacy mechanism records each data value as a
random interval
containing it. The proposed interval privacy mechanisms can be easily deployed
through most existing survey-based data collection paradigms, e.g., by asking a
respondent whether their data value is within a randomly generated range. Another
unique feature of interval mechanisms is that they obfuscate the truth but do
not distort it. Using a narrowed range to convey information is
complementary to the popular paradigm of perturbing data. Also, the interval
mechanisms can generate progressively refined information at the discretion of
individual respondents. We study several theoretical aspects of the proposed
privacy notion. In the context of supervised learning, we also offer a method
by which existing supervised learning algorithms designed for point-valued data
can be directly applied to learning from interval-valued data.
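
As an illustration of the survey-style mechanism described above, the following is a minimal sketch, not the paper's exact construction. It assumes the data value lies in a known bounded range and that each threshold is drawn uniformly at random without reference to the value; the names `privatize`, `low`, `high`, and `rounds` are hypothetical and chosen only for this example. Each round asks one yes/no question ("is your value at most u?") and keeps the sub-interval that contains the true value, so the reported interval is always truthful and narrows as the respondent agrees to more rounds.

```python
import random


def privatize(x, low, high, rounds=1, rng=None):
    """Report an interval [a, b] that contains x (illustrative sketch).

    Assumptions (not from the source): x lies in the known range
    [low, high], and each threshold is drawn uniformly from the current
    interval without looking at x.
    """
    if not (low <= x <= high):
        raise ValueError("x must lie within [low, high]")
    rng = rng or random.Random()
    a, b = low, high
    for _ in range(rounds):
        u = rng.uniform(a, b)  # random threshold, generated independently of x
        if x <= u:             # respondent answers "yes, my value is at most u"
            b = u
        else:                  # respondent answers "no, my value exceeds u"
            a = u
    return a, b


if __name__ == "__main__":
    # One question gives a coarse interval; more questions refine it.
    print(privatize(37.0, 0.0, 100.0, rounds=1, rng=random.Random(0)))
    print(privatize(37.0, 0.0, 100.0, rounds=3, rng=random.Random(0)))
```

The number of rounds is left to the caller, which mirrors the idea that refinement happens at the discretion of the individual respondent: stopping early yields a wider, less informative interval, but never a false one.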
