By Kato Mivule
The latest state-of-the-art in data privacy, proposed by Cynthia Dwork (2006).
Differential privacy enforces confidentiality by:
Returning perturbed aggregated query results from databases.
Such that users cannot discern if any data item has been altered or not.
An attacker cannot derive info about any data item in the database.
According to Dwork (2006):
Two databases D1 and D2 are considered similar if they differ in only one element or row,
That is |D1 Δ D2| = 1.
Therefore, a privacy granting procedure qn satisfies ε-differential privacy if results to the same query run on database D1 and again on database D2 are probabilistically similar, satisfying the following condition:
P[qn(D1) ∈ R] / P[qn(D2) ∈ R] ≤ exp(ε)
Where D1 and D2 are the two databases.
P is the probability of the perturbed query results from D1 and D2 respectively.
qn() is the privacy granting procedure (perturbation).
qn(D1) is the privacy granting procedure applied to query results from database D1.
qn(D2) is the privacy granting procedure applied to query results from database D2.
R is the set of perturbed query results from the databases D1 and D2 respectively.
exp(ε) is the exponential of the epsilon value.
The probability of the perturbed query results qn(D1) divided by the probability of the perturbed query results qn(D2) should be less than or equal to exp(ε).
That is to say, if we run the same query on database D1, and then run the same query again on database D2, our query results should probabilistically be similar.
If the condition is satisfied in the presence or absence of the most influential observation for a specific query, then it will also be satisfied for any other observation.
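As a numeric illustration of the condition above, the following sketch uses Laplace noise (Dwork's construction, described below) to perturb a query on two hypothetical neighboring databases; the true query values, region R, and ε are all illustrative assumptions, not values from the text.

```python
import math
import random

# Hypothetical setup: neighboring databases whose true query results
# differ by the sensitivity delta_f = 1.0; all numbers are illustrative.
true_d1, true_d2 = 10.0, 11.0
epsilon, delta_f = 0.5, 1.0
b = delta_f / epsilon  # Laplace scale parameter

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling from Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def prob_in_region(true_value, region, trials=100_000):
    # Monte Carlo estimate of P[qn(D) in R] for the perturbed query qn.
    hits = sum(region(true_value + laplace_noise(b)) for _ in range(trials))
    return hits / trials

random.seed(42)
in_region = lambda r: r <= 10.5          # R = (-inf, 10.5]
p1 = prob_in_region(true_d1, in_region)
p2 = prob_in_region(true_d2, in_region)
print(p1 / p2, "<=", math.exp(epsilon))  # ratio stays within exp(epsilon)
```

The estimated ratio P[qn(D1) ∈ R] / P[qn(D2) ∈ R] lands under exp(ε), matching the definition.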
The effect of the most influential observation for a given query is Δf, assessed as follows:
Δf = Max||f(D1) – f(D2)|| for all possible observed values of D1 and D2
According to Dwork (2006), the results to a query are given as
f(x) + Laplace(0, b) noise addition
Where b = Δf/ε
x represents a particular observed value of the database
f(x) represents the true result to the query
Then the result would satisfy ε-differential privacy.
The Δf must consider all possible observed values of D1 and D2.
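Dwork's construction above, f(x) + Laplace(0, Δf/ε), can be sketched as a small function (the function name and sample inputs are my own, not from the text):

```python
import math
import random

def laplace_mechanism(true_result, sensitivity, epsilon, rng=random):
    """Perturb f(x) with Laplace(0, b) noise, where b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    u = rng.random() - 0.5
    # Inverse-CDF sampling from Laplace(0, b).
    noise = -b * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return true_result + noise

# A single perturbed release of a hypothetical true query result f(x) = 3.2.
random.seed(7)
print(laplace_mechanism(3.2, 2.0, 0.5))
```

Each call draws fresh noise, so repeated releases of the same true result differ from run to run.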
Example query: What is the GPA of students at Geek Nerd State University?
We compute the maximum possible difference between two databases that differ in exactly one record for a specific query.
Δf = Max||f(D1) – f(D2)||
Let Min GPA = 2.0 for smallest possible GPA
Let Max GPA = 4.0 for largest possible GPA
Δf = |Max GPA – Min GPA| = |4.0 – 2.0|
Δf = 2.0
The parameter b of the Laplace noise is set to b = Δf/ε = 2.0/0.01 = 200,
giving a Laplace(0, 200) noise distribution.
Variance of the noise distribution = 2 × 200² = 80000
A small ε value of 0.01 is chosen; smaller values of ε yield greater privacy from the procedure.
However, utility risks degradation with a much smaller value of ε.
For example, ε = 0.0001 gives b = 2.0/0.0001 = 20000, a Laplace(0, 20000) noise distribution.
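The trade-off can be tabulated with a short sketch, using the Δf = 2.0 sensitivity from the GPA example (the intermediate ε = 0.1 row is my own addition for contrast):

```python
# How the Laplace scale b = delta_f / epsilon and the noise variance 2*b^2
# grow as epsilon shrinks, for the GPA example's sensitivity delta_f = 2.0.
delta_f = 2.0
for epsilon in (0.1, 0.01, 0.0001):
    b = delta_f / epsilon
    variance = 2 * b ** 2
    print(f"epsilon={epsilon}: b={b:g}, variance={variance:g}")
```

At ε = 0.01 this reproduces the b = 200 and variance = 80000 figures above; at ε = 0.0001 the variance grows by a factor of 10000.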
The unperturbed results of the query +
Noise from Laplace(0, 200) =
Perturbed query results satisfying ε-differential privacy.
SQL: SELECT GPA FROM Student + Laplace Noise(0, 200) = ε-differentially private query results.
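A sketch of the full pipeline for the GPA example; the student GPAs below are hypothetical stand-ins for the rows the SQL query would return, and the SQL step is simulated with an in-memory list:

```python
import math
import random

# Hypothetical rows returned by: SELECT GPA FROM Student
gpas = [3.1, 2.8, 3.9, 2.0, 3.5]

delta_f = 4.0 - 2.0        # |Max GPA - Min GPA| = 2.0
epsilon = 0.01
b = delta_f / epsilon      # 200, as computed above

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling from Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

random.seed(1)
perturbed = [round(g + laplace_noise(b), 2) for g in gpas]
print(perturbed)
```

At b = 200 the noise dwarfs the 2.0-4.0 GPA range itself, which is precisely the utility concern a very small ε raises.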
Pros and Cons
Grants across-the-board privacy.
Easy to implement with SQL for aggregated data publication.
Utility is a challenge, as statistical properties change with a much smaller ε.
Noise takes into account the outliers and most influential observation.
For example, the income of Warren Buffett versus the income of a janitor in Omaha, Nebraska.
The balance between privacy and utility is still an NP-hard challenge.
C. Dwork, “Differential privacy,” in ICALP, vol. 2, 2006, pp. 1-12. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.83.7534
K. Muralidhar and R. Sarathy, “Does differential privacy protect Terry Gross’ privacy?” in Privacy in Statistical Databases, ser. Lecture Notes in Computer Science, J. Domingo-Ferrer and E. Magkos, Eds. Berlin, Heidelberg: Springer Berlin / Heidelberg, 2011, vol. 6344, ch. 18, pp. 200-209. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-15838-4_18