Data Anonymisation Flashcards

(17 cards)

1
Q

What is Data Anonymisation?

A

Removing or transforming personally identifiable information (PII) in data so that individuals can no longer be identified

2
Q

Why would you do Data Anonymisation?

A

To preserve privacy when data is shared or analysed (e.g. student grades)

3
Q

What are Key Attributes?

A

Attributes that identify a person on their own, e.g. name, passport number

4
Q

What are Quasi-identifiers?

A

Attributes that do not identify a person on their own but can when combined (e.g. date of birth and postcode)

5
Q

Name 4 Data Anonymisation techniques?

A

k-anonymity • l-diversity • t-closeness • differential privacy

6
Q

Where databases have quasi-identifiers, K-Anonymity should be applied. What is K-Anonymity?

A

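Every record shares its quasi-identifier values with at least k-1 other records, so each record is indistinguishable from at least k-1 others. K is a chosen value: a higher k gives stronger privacy but usually a less detailed dataset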
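
A minimal sketch of how k could be measured for a small table: group the records by their quasi-identifier values and take the size of the smallest group. The function name `k_of` and the sample records are made up for illustration.

```python
from collections import Counter

def k_of(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same quasi-identifier values (the table's k)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical, already generalised records.
records = [
    {"age": "20-29", "postcode": "AB1", "condition": "Flu"},
    {"age": "20-29", "postcode": "AB1", "condition": "Asthma"},
    {"age": "30-39", "postcode": "CD2", "condition": "Flu"},
    {"age": "30-39", "postcode": "CD2", "condition": "Diabetes"},
    {"age": "30-39", "postcode": "CD2", "condition": "Flu"},
]

k = k_of(records, ["age", "postcode"])
print(k)  # the smallest group has 2 records, so the table is 2-anonymous
```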

7
Q

What are two ways you can achieve K-Anonymity?

A

Generalisation (e.g. using age brackets instead of exact ages) and Suppression (e.g. hiding part of a student ID)
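
A short sketch of both techniques, using the age-bracket and student-ID examples from this card. The function names, the bracket width of 10 and the number of visible ID characters are assumptions chosen for illustration.

```python
def generalise_age(age, width=10):
    """Generalisation: replace an exact age with a bracket such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress_id(student_id, visible=2):
    """Suppression: hide all but the first `visible` characters of an ID."""
    return student_id[:visible] + "*" * (len(student_id) - visible)

print(generalise_age(23))        # 20-29
print(suppress_id("S1234567"))   # S1******
```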

8
Q

Why may you use Suppression over Generalisation?

A

Generalisation (e.g. age brackets) may remove important information

9
Q

What are 2 problems with k-anonymity?

A

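1) Background knowledge attack: the attacker already knows something about the users and can still tell who is who 2) Homogeneity attack: the sensitive data in a group has no variance (e.g. everyone aged 20-29 has Heart Disease), so the sensitive value is revealed anyway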

10
Q

What is L-Diversity?

A

Each group of records that share the same quasi-identifier values (an equivalence class) contains at least L different values for the sensitive attribute. L is a chosen number
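
A minimal sketch of checking this, assuming the simplest ("distinct") reading of l-diversity: every group of records with the same quasi-identifier values must contain at least l distinct sensitive values. The function name `l_of` and the sample records are hypothetical.

```python
from collections import defaultdict

def l_of(records, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values
    found in any quasi-identifier group (the table's l)."""
    classes = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        classes[key].add(r[sensitive])
    return min(len(values) for values in classes.values())

# Hypothetical 2-anonymous table with one homogeneous group.
records = [
    {"age": "20-29", "condition": "Heart Disease"},
    {"age": "20-29", "condition": "Heart Disease"},
    {"age": "30-39", "condition": "Flu"},
    {"age": "30-39", "condition": "Diabetes"},
]

l = l_of(records, ["age"], "condition")
print(l)  # 1: everyone aged 20-29 has the same condition
```

The table above is 2-anonymous but only 1-diverse, which is the no-variance problem with k-anonymity.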

11
Q

What are 2 problems with L-Diversity?

A

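Skewness Attack (one sensitive value is much more likely than another within a group, so the attacker can guess it) and Similarity Attack (the values in a group are different but closely related, e.g. a list of different stomach diseases: the attacker does not know which one the patient has exactly, but knows it is one of them)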

12
Q

What is T-Closeness?

A

The distribution of a sensitive attribute (e.g. heart disease) within each group of data should be close to its distribution in the overall dataset: the distance between the two must be no more than a chosen threshold t
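
A rough sketch of measuring how far each group's distribution is from the overall one. t-closeness is usually defined with the Earth Mover's Distance; this sketch swaps in total variation distance as a simpler stand-in for a categorical attribute, which is an assumption of the example. The function names and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def distribution(values):
    """Map each value to its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def t_of(records, quasi_identifiers, sensitive):
    """Return the largest distance between any group's sensitive-value
    distribution and the overall distribution."""
    overall = distribution([r[sensitive] for r in records])
    classes = defaultdict(list)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        classes[key].append(r[sensitive])
    return max(total_variation(distribution(v), overall)
               for v in classes.values())

# Hypothetical records: heart disease is over-represented in the 20-29 group.
records = [
    {"age": "20-29", "condition": "Heart Disease"},
    {"age": "20-29", "condition": "Heart Disease"},
    {"age": "30-39", "condition": "Flu"},
    {"age": "30-39", "condition": "Heart Disease"},
]

t = t_of(records, ["age"], "condition")
print(t)  # 0.25
```

Under this stand-in distance, the table would satisfy t-closeness for any chosen threshold of 0.25 or more.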

13
Q

What is Differential Privacy?

A

Differential privacy guarantees that the output of a query on a database does not depend, beyond a small bounded amount, on whether any one individual's record is in the database

14
Q

K-Anonymity only prevents _____?

A

K-Anonymity only prevents Identity Disclosure

15
Q

L-Diversity does not protect from ____?

A

L-Diversity does not fully protect from Attribute Disclosure (skewness and similarity attacks can still reveal sensitive values)

16
Q

T-Closeness protects against _____?

A

T-Closeness protects against Attribute Disclosure

17
Q

Differential privacy guarantees that the output of a query on a database does not depend on ________

A

Differential privacy guarantees that the output of a query on a database does not depend, beyond a small bounded amount, on whether any one individual's record is in the database
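
One common way to get this guarantee for a counting query is the Laplace mechanism: add random noise scaled to 1/epsilon to the true count. The sketch below assumes that mechanism. The function names, the sample grades and the choice of epsilon are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)

# Hypothetical student grades.
grades = [
    {"student": "a", "grade": 72},
    {"student": "b", "grade": 55},
    {"student": "c", "grade": 81},
    {"student": "d", "grade": 90},
]

noisy = private_count(grades, lambda r: r["grade"] >= 70, epsilon=1.0)
print(noisy)  # close to the true count of 3, but different on every run
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means answers closer to the true count.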