Archive for August, 2011

By Kato Mivule

The latest state-of-the-art in data privacy, proposed by Cynthia Dwork (2006).

Differential privacy enforces confidentiality by:

  • Returning perturbed aggregated query results from databases.

  • Such that users cannot discern if any data item has been altered or not.

  • An attacker cannot derive info about any data item in the database.

According to Dwork (2006):

  • Two databases D1 and D2 are considered similar, if they differ in only one element or row,

  • That is D1 Δ D2 = 1.

  • Therefore, a privacy granting procedure qn satisfies  –differential privacy if results to the same query run on database D1 and again run on database D2 should probabilistically be similar, and satisfy the following condition:

P[qn(D1) ∈ R] / P[qn(D2) ∈ R] ≤ exp()

  • Where D1and D2 are the two databases.
  • P is the probability of the perturbed query results D1and D2 respectively.

  • qn() is the privacy granting procedure (perturbation).

  • qn(D1) the privacy granting procedure on query results from database D1 .

  • qn(D2) the privacy granting procedure on query results from database D2 .

  • R is the perturbed query results from the databases D1and D2 respectively.

  • exp()the exponential epsilon value.

The probability of the perturbed query results qn(D1) divided by the probability of the perturbed query results qn(D2) should be less or equal to a exponential epsilon value.

That is to say, if we run the same query on database D1, and then run the same query again on database D2, our query results should probabilistically be similar.

If the condition can be satisfied in the existence or nonexistence of the most influential observation for a specific query, then this condition will also be satisfied for any other observation.

The effect of the most influential observation for a given query is Δf, assessed as follows:

Δf = Max||f(D1) – f(D1)|| for all possible observed values of D1 and D2

According to Dwork (2006), the results to a query are given as

  • f(x) + Laplace(0, b) noise addition

  • Where b = Δf/

  • x represents a particular observed value of the database

  • f(x) represents the true result to the query

  • Then the result would satisfy -differential privacy

  • The Δf must consider all possible observed values of D1 and D1

Differential Privacy Ilustration

Differential Privacy Ilustration


What is the GPA of students at Geek Nerd State University?

We compute the maximum possible difference between two databases that differ in exactly one record for a specific query.

Δf = Max||f(D1) – f(D2)||

Let Min GPA = 2.0 for smallest possible GPA

Let Max GPA = 4.0 for largest possible GPA

Δf = | Max GPA – Min GPA|

Δf = 2.0

The parameter b of the Laplace noise is set to Δf/ = 2.0/0.01 = 200

Laplace (0, 200) noise distribution.

Variance of the noise distribution = 2* 200^2 = 80000

A small value of 0.01 is chosen. Smaller yield greater privacy by the procedure.

However, utility risks degeneration with a much more smaller value of .

For example, at 0.0001, will give b as 20000, Laplace (0, 20000) noise distribution.

The unperturbed results of the query +

Noise from Laplace (0, 200) =

Perturbed query results satisfying -differential privacy.

SQL: SELECT GPA FROM Student + Laplace Noise (0, 200) = -differential query results.

Pros and Cons

  • Grants across-the-board Privacy.

  • Easy to implement with SQL for aggregated data publication.

  • Utility a challenge as statistical properties change with a much smaller .

  • Noise takes into account the outliers and most influential observation.

  • Example, income of Warren Buffet verses income of janitor in Omaha Nebraska.

  • Balance between Privacy and Utility still an NP-Hard challenge.


[1] C. Dwork, “Differential privacy,” in in ICALP, vol. 2, 2006, pp. 1-12. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

[2] K. Muralidhar and R. Sarathy, “Does differential privacy protect terry gross’ privacy?” in Privacy in Statistical Databases, ser. Lecture Notes in Computer Science, J. Domingo-Ferrer and E. Magkos, Eds.    Berlin, Heidelberg: Springer Berlin / Heidelberg, 2011, vol. 6344, ch. 18, pp. 200-209. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-15838-4_18

Read Full Post »

By Kato Mivule

  • Noise addition: A stochastic value is added to confidential numeric attributes.

  • The stochastic value is chosen from a normal distribution with zero mean and a diminutive standard deviation. First publicized by Jay Kim (1986) with the expression that

  • Z = X + ɛ

  • Where Z is the transformed data point

  • X is the original data point.

  • ɛ (epsilon)is the random variable (noise) with a distribution ε∼ N(0, σ2 ).

  • The X is then replaced with the Z for the data set to be published, Z = X + ɛ

Statistical Considerations in Noise Addition

Gaussian Noise Distribution

  • The Normal Distribution( Gaussian distribution), is a bell shaped probability distribution depicting real-valued stochastic variables clustered around a single mean…

  • Gaussian Noise Distribution
  • μ (mu) is the mean

  • σ2 (Sigma) is the variance

  • N(μ, σ2) is the normal distribution with mean μ and variance σ2

  • Transformed data has to keep the same statistical properties as the original data.
  • Covariance:Cov(X, Y): How affiliated the deviations between points X and Y.
  • Covariance
  • If Cov(X, Y) is positive, X and Y increase together, otherwise they don’t.
  • If Cov(X, Y) is zero, X and Y are each autonomous.
  • Correlation rxy calculates tendency of linear relation between two data points.
  • Correlation
  • If -1, then rxy is a negative linear relation between x and y,
  • if 0, no linear relation,
  • if +1, a strong linear relation.


[1] Jay Kim, A Method For Limiting Disclosure in Microdata Based Random Noise and Transformation, Proceedings of the Survey Research Methods, American Statistical Association, Pages 370-374, 1986.

[2] J. Domingo-ferrer, F. Sebe, and J. Castella-Roca, “On the security of noise addition for privacy in statistical databases,” in Privacy in Statistical Databases 2004, 2004, pp. 149-161. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

Read Full Post »

%d bloggers like this: