Data Anonymisation Flashcards
(17 cards)
What is Data Anonymisation?
Removing personally identifiable info (PII) from data
Why would you do Data Anonymisation?
To preserve individuals' privacy when data is used for analysis (e.g. student grades)
What are Key Attributes?
Attributes that identify a user on their own, e.g. name, passport number
What are Quasi-identifiers?
Attributes that do not identify a user on their own but can when combined (e.g. date of birth and postcode)
Name 4 Data Anonymisation techniques.
k-anonymity • l-diversity • t-closeness • differential privacy
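Where databases have quasi-identifiers, K-Anonymity should be satisfied. What is K-Anonymity?
Every record shares its quasi-identifier values with at least k-1 other records, so each group of matching records has at least k members. K is a chosen value; a higher K gives stronger privacy, but needs more generalisation or suppression, so the data becomes less useful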
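A minimal sketch of how the k of a table could be measured, assuming rows are stored as dictionaries. The function name `k_of` and the sample columns (`age`, `postcode`, `disease`) are made up for illustration.

```python
from collections import Counter

def k_of(rows, quasi_identifiers):
    """Return the largest k for which `rows` is k-anonymous.

    Rows are grouped by their quasi-identifier values; the smallest
    group size is the k the table achieves.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

# Hypothetical, already generalised records
rows = [
    {"age": "20-29", "postcode": "AB1*", "disease": "Flu"},
    {"age": "20-29", "postcode": "AB1*", "disease": "Asthma"},
    {"age": "30-39", "postcode": "CD2*", "disease": "Flu"},
    {"age": "30-39", "postcode": "CD2*", "disease": "Diabetes"},
    {"age": "30-39", "postcode": "CD2*", "disease": "Flu"},
]

print(k_of(rows, ["age", "postcode"]))  # 2 (the smallest group has two rows)
```

Every record here matches at least one other record on age and postcode, so the table is 2-anonymous but not 3-anonymous.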
What are two ways you can achieve K-Anonymity?
Generalisation (e.g. using age brackets instead of exact ages) and Suppression (e.g. hiding part of a student ID)
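The two techniques can be sketched roughly as below. The helper names `generalise_age` and `suppress`, the 10-year bracket width and the `*` mask are assumptions chosen for illustration, not a standard API.

```python
def generalise_age(age, width=10):
    """Generalisation: replace an exact age with the bracket it falls in."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress(value, keep=2, mask="*"):
    """Suppression: keep the first `keep` characters and mask the rest."""
    return value[:keep] + mask * max(len(value) - keep, 0)

print(generalise_age(23))    # 20-29
print(suppress("S1234567"))  # S1******
```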
Why might you use Suppression over Generalisation?
Generalisation (e.g. age brackets) coarsens the value for every record, which may remove information the analysis needs
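What are 2 problems with k-anonymity?
1) Background knowledge attack: an attacker who already knows something about the users can still tell who is who 2) Homogeneity attack: the sensitive data in a group has no variance (e.g. everyone aged 20-29 has Heart Disease), so knowing someone's group reveals their value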
What is L-Diversity?
Every group of records that share the same quasi-identifier values contains at least L different values of the sensitive attribute. L is a chosen number
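A rough sketch of checking the simplest form of this property (counting distinct sensitive values per group). The function name `l_of` and the sample data are hypothetical.

```python
from collections import defaultdict

def l_of(rows, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values in any group.

    A group is the set of rows sharing the same quasi-identifier values.
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return min((len(values) for values in groups.values()), default=0)

# Hypothetical table: 2-anonymous on age, but the 20-29 group is homogeneous
rows = [
    {"age": "20-29", "disease": "Heart Disease"},
    {"age": "20-29", "disease": "Heart Disease"},
    {"age": "30-39", "disease": "Flu"},
    {"age": "30-39", "disease": "Asthma"},
]

print(l_of(rows, ["age"], "disease"))  # 1
```

The result of 1 shows that a table can be k-anonymous and still leak the sensitive value of everyone in one group.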
What are 2 problems with L-Diversity?
Skewness Attack (one sensitive value is much more likely than the others in a group, so you can guess it) and Similarity Attack (the values in a group are different but related, e.g. a list of different stomach diseases: you don't know exactly which one the patient has, but you know it's a stomach disease)
What is T-Closeness?
The distribution of a sensitive attribute (e.g. heart disease) within each group of records should be close to its distribution in the whole dataset; the distance between the two must be at most a chosen threshold t
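A minimal sketch of measuring how far each group's distribution is from the overall one. It uses total variation distance as a stand-in; t-closeness is usually defined with Earth Mover's Distance, which should match this for categorical values that are all treated as equally far apart. The names `distribution` and `t_of` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def distribution(values):
    """Map each value to its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def t_of(rows, quasi_identifiers, sensitive):
    """Return the largest distance between any group and the whole table."""
    overall = distribution([row[sensitive] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    worst = 0.0
    for values in groups.values():
        local = distribution(values)
        # Total variation distance: half the sum of absolute differences
        distance = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, distance)
    return worst

rows = [
    {"age": "20-29", "disease": "Heart Disease"},
    {"age": "20-29", "disease": "Heart Disease"},
    {"age": "30-39", "disease": "Flu"},
    {"age": "30-39", "disease": "Heart Disease"},
]

print(t_of(rows, ["age"], "disease"))  # 0.25
```

The table satisfies t-closeness for any chosen t of 0.25 or more under this distance measure.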
What is Differential Privacy?
A guarantee that the output of a query on a database does not depend noticeably on whether any one individual's data is in the database
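One common way to get this kind of guarantee for a counting query is to add Laplace noise to the true answer. The sketch below assumes a count (which changes by at most 1 when one person is added or removed) and a privacy parameter `epsilon`; the function names and sample data are hypothetical.

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(rows, predicate, epsilon, rng=random):
    """Return a noisy count of the rows matching `predicate`.

    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon (smaller epsilon = more noise).
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1 / epsilon, rng)

rows = [
    {"age": 23, "disease": "Flu"},
    {"age": 27, "disease": "Heart Disease"},
    {"age": 34, "disease": "Heart Disease"},
]

noisy = private_count(rows, lambda r: r["disease"] == "Heart Disease", epsilon=0.5)
print(noisy)  # a value near 2 that changes on every run
```

Because the answer is noisy with or without any single person's record, an attacker cannot confidently tell from the output whether that person is in the data.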
K-Anonymity only prevents _____?
K-Anonymity only prevents Identity Disclosure
L-Diversity does not fully protect from ____?
L-Diversity does not fully protect from Attribute Disclosure
T-Closeness protects against _____?
T-Closeness protects against Attribute Disclosure
Differential privacy guarantees that the output of a query on a database does not depend on ________
Differential privacy guarantees that the output of a query on a database does not depend on whether any one individual's data is in the database