Midterm Flashcards

(49 cards)

1
Q

A -> B
B -> C
so link A -> C

A

Transitive Closure
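
One common way to compute a transitive closure over pairwise links is a union-find (disjoint-set) structure. The sketch below is only an illustration under that assumption; the function name `transitive_closure` and the sample links are made up for the example.

```python
def transitive_closure(links):
    """Group references into clusters from pairwise links (union-find)."""
    parent = {}

    def find(x):
        # Return the representative (root) of x's cluster.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for ref in parent:
        clusters.setdefault(find(ref), set()).add(ref)
    return sorted(sorted(c) for c in clusters.values())


# A-B and B-C are linked, so A, B, and C end up in one cluster.
print(transitive_closure([("A", "B"), ("B", "C"), ("D", "E")]))
# [['A', 'B', 'C'], ['D', 'E']]
```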

2
Q

After an ER process links a cluster of references, only one reference is retained

A

Survivor Record MDM

3
Q

Survivor EIS where you select one ‘best’ record from the cluster to represent the identity

A

Best Reference Style

4
Q

Survivor EIS where you create a new record from the best parts of records in the cluster

A

Exemplar Reference Style
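
The difference between the two survivor styles can be sketched in a few lines. The selection rules here are assumptions chosen for illustration, not rules stated on the cards: "best" means the record with the most non-null attributes, and the exemplar takes the most frequent non-null value of each attribute. The records and function names are hypothetical.

```python
from collections import Counter

cluster = [
    {"name": "J. Smith", "phone": None, "city": "Little Rock", "zip": None},
    {"name": "John Smith", "phone": "555-0100", "city": None, "zip": "72204"},
    {"name": "John Smith", "phone": None, "city": "Little Rock", "zip": None},
]


def best_reference(records):
    """Best reference style: keep one existing record (assumed: the most complete)."""
    return max(records, key=lambda rec: sum(v is not None for v in rec.values()))


def exemplar_reference(records):
    """Exemplar style: build a new record from the best parts of the cluster
    (assumed: the most frequent non-null value per attribute)."""
    exemplar = {}
    for attr in records[0]:
        values = [rec.get(attr) for rec in records if rec.get(attr) is not None]
        exemplar[attr] = Counter(values).most_common(1)[0][0] if values else None
    return exemplar


print(best_reference(cluster))      # an existing record, still missing "city"
print(exemplar_reference(cluster))  # a new record with every attribute filled
```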

5
Q
  • Master identifier
  • Identity attributes
  • Application information
A

Components of MDM architecture

6
Q

What MDM architecture is good for cybersecurity?

A

External reference

7
Q

What MDM architecture stores no identity information in the IKB?

A

External reference

8
Q

What MDM architecture components are stored in IKB in external reference?

A

Master identifier

9
Q

What MDM architecture components are stored in the IKB in registry?

A

Master identifier and Identity attributes

10
Q

What MDM architecture is the registry schematic with a crosswalk added?

A

Reconciliation Engine

11
Q

What MDM architecture stores all components in the IKB?

A

Transaction Hub

12
Q

What MDM architecture components are stored in the IKB in Transaction Hub?

A

Master identifier, identity attributes, and application information
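
The architecture cards can be summarized as a lookup of which components each style keeps in the IKB. This is a study aid that restates the cards, not a real MDM API; the dictionary name, keys, and helper function are hypothetical.

```python
# Which MDM components each architecture stores in the IKB, per the cards.
IKB_CONTENTS = {
    "external_reference": {"master_identifier"},
    "registry": {"master_identifier", "identity_attributes"},
    # Per the cards, the reconciliation engine is the registry plus a crosswalk.
    "reconciliation_engine": {"master_identifier", "identity_attributes"},
    "transaction_hub": {
        "master_identifier",
        "identity_attributes",
        "application_information",
    },
}


def stores_identity_information(architecture):
    """True if the architecture keeps identity attributes in the IKB."""
    return "identity_attributes" in IKB_CONTENTS[architecture]


for name in IKB_CONTENTS:
    print(name, stores_identity_information(name))
```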

13
Q

What are the methods of updating IKB in MDM?

A

Automatic and manual

14
Q

What is the output of Entity Resolution?

A

A link index

15
Q

What is the #1 cause of data quality issues?

A

Multiple sources of the same information

16
Q

Three parts of entity-based data integration

A
  1. Standardization
  2. Entity Resolution
  3. Rationalization
17
Q

References to the same entity are called…

A

Equivalent References

18
Q

Every entity reference in an information system is created with the intention to reference one, and only one, real-world entity.

A

Unique Reference Assumption

19
Q

The higher the degree of similarity between two entity references, the higher the probability the references are equivalent, and the less similar, the less likely they are equivalent.

A

Reference Similarity Assumption

20
Q

Precision Formula

A

P = TP/(TP+FP)

21
Q

Recall Formula

A

R = TP/(TP+FN)

22
Q

F-Measure Formula

A

F= (2 x P x R)/(P + R)

23
Q

Accuracy Formula

A

(TP+TN)/(TP+FP+TN+FN) = (TP+TN)/D
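
The four measures can be checked together from one set of counts. This sketch uses the standard pairwise definitions precision = TP/(TP+FP) and recall = TP/(TP+FN) alongside the F-measure and accuracy formulas from the cards. The counts are invented, the function name `er_metrics` is hypothetical, and zero denominators are not handled.

```python
def er_metrics(tp, fp, tn, fn):
    """Precision, recall, F-measure, and accuracy from pair counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = (2 * precision * recall) / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Invented counts: 100 pairs in total (D = 100).
metrics = er_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With many more true negatives than true positives, accuracy stays high even when recall is poor, which is why precision, recall, and F-measure are usually reported with it.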

24
Q

How do you calculate unordered pairs?

A

n(n-1)/2, where n is the number of references
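
For n references, the number of unordered pairs is n choose 2, which equals n(n-1)/2. The sketch below compares that closed form with an explicit enumeration; the function name and the sample reference IDs are hypothetical.

```python
from itertools import combinations


def unordered_pair_count(n):
    """Number of unordered pairs among n items: n(n-1)/2."""
    return n * (n - 1) // 2


refs = ["r1", "r2", "r3", "r4", "r5"]
pairs = list(combinations(refs, 2))  # every pair once, order ignored

print(unordered_pair_count(len(refs)), len(pairs))  # 10 10
```

The count grows quadratically with n, which is the reason ER systems use blocking rather than comparing every pair.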

25
CSRUD Model
  • Capture of Entity Identity Information
  • Store and Share Entity Identity Information
  • Resolve and Retrieve Entity Identifiers
  • Update Entity Identity Information
  • Dispose (Retire) Entity Identity Information
26
Capture Phase Activities
  1. Assess the data quality of each identity source, and plan the cleansing and standardization processes
  2. Profile and select candidates for primary and supporting identity attributes
  3. Set up entity identity integrity assessment methods
  4. Craft the matching rules or build an ML model
  5. Evaluate and refine the rules/model to acceptable levels of false positive and false negative error
  6. Develop a blocking strategy
27
Biggest enemies of ER
Inconsistent representation of values and missing values
28
Measures of discrimination power
  • Attribute Uniqueness
  • Attribute Entropy
  • Attribute Weight
29
Attribute Uniqueness Formula
(Count of unique non-null values) / (Count of non-null values)
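
Attribute uniqueness is the count of unique non-null values divided by the count of non-null values, and it is quick to compute for a single column. In this sketch, treating empty strings as null and returning 0.0 for an all-null column are assumptions; the function name and sample values are hypothetical.

```python
def attribute_uniqueness(values):
    """Unique non-null values divided by all non-null values."""
    # Assumption: None and "" both count as null.
    non_null = [v for v in values if v is not None and v != ""]
    if not non_null:
        return 0.0  # assumption: an all-null column has no discriminating power
    return len(set(non_null)) / len(non_null)


last_names = ["Smith", "Smith", "Jones", None, "Garcia", ""]
print(attribute_uniqueness(last_names))  # 3 unique / 4 non-null = 0.75
```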
30
What are the three levels of ER matching?
  1. Attribute level
  2. Record level
  3. Cluster level
31
What do you call an algorithm that compares attribute values?
Comparator
32
What are the three types of similarity functions for string values?
  1. Approximate Syntactic Match (ASM)
  2. Approximate Semantic Match
  3. Phonetic Match
33
The minimum number of single character changes that will transform one string into the other
Levenshtein Edit Distance
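
Levenshtein distance is usually computed with dynamic programming over prefixes of the two strings. The sketch below is a textbook two-row version, where each insertion, deletion, or substitution costs 1; the function name is chosen for the example.

```python
def levenshtein(s1, s2):
    """Minimum number of single-character edits turning s1 into s2."""
    prev = list(range(len(s2) + 1))  # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # substitute (or keep when equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("JIM", "JMI"))         # 2: a swap costs two edits here
```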
34
Same as Levenshtein, but allows one additional string manipulation: transposing adjacent characters
Damerau-Levenshtein Edit Distance
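
Adding the adjacent-transposition step to the Levenshtein recurrence takes one extra case. Note that the sketch below implements the restricted variant, often called optimal string alignment, in which no substring is edited more than once. It can give a larger value than the unrestricted Damerau-Levenshtein distance on some inputs. The function name is chosen for the example.

```python
def damerau_levenshtein_osa(s1, s2):
    """Edit distance with adjacent transpositions (restricted/OSA variant)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Extra case: the last two characters are swapped.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(damerau_levenshtein_osa("JIM", "JMI"))  # 1: one transposition
```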
35
Based on the number of characters in common and number of transpositions between two strings S1 and S2
Jaro String Comparator
36
Modification of the Jaro Comparator that gives added weight to agreement on the first four characters (the prefix)
Jaro-Winkler
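
The two comparators can be sketched together, since Jaro-Winkler only adjusts the Jaro score. This follows the commonly published definitions: a match window of half the longer length minus one, and a prefix scaling factor of 0.1 over at most four characters. Those constants are the usual defaults rather than values taken from the cards, and the function names are chosen for the example.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1] from common characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro score boosted for agreement on up to the first four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


print(round(jaro("MARTHA", "MARHTA"), 3))          # 0.944
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```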
37
After the first letter, replace 'A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y' with "0" (zero) and change the other letters to digits
Soundex Algorithm
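
The steps on this card can be sketched as follows. This is a simplified Soundex that follows the card: it maps the vowel-like letters to "0", maps the other letters to the usual digit groups, collapses adjacent repeated digits, drops the zeros, and pads to one letter plus three digits. It is an assumption-laden study version: it does not implement the standard rule that 'H' and 'W' do not separate letters with the same code, so a name such as "Ashcraft" comes out as A226 here instead of the standard A261.

```python
# Letter groups and their Soundex digits; "0" marks letters that are dropped.
CODES = {}
for letters, digit in [
    ("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
    ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6"),
]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Simplified Soundex: first letter plus three digits."""
    letters = [ch for ch in name.upper() if ch in CODES]
    if not letters:
        return ""
    digits = [CODES[ch] for ch in letters]
    # Collapse runs of the same digit (the first letter's code counts too).
    collapsed = [digits[0]]
    for d in digits[1:]:
        if d != collapsed[-1]:
            collapsed.append(d)
    # Keep the first letter itself, drop zeros, pad or truncate to 4 characters.
    tail = [d for d in collapsed[1:] if d != "0"]
    return (letters[0] + "".join(tail) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```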
38
Tries to measure similarity according to linguistic meaning rather than by character structure.
Approximate Semantic Match
39
Two types of supervised MDM
  1. Bring-Your-Own-Identifier MDM
  2. Once-and-Done MDM
40
Two types of unsupervised MDM
  1. Survivor Record MDM
  2. Full-Context MDM
41
Two types of record updates in MDM
  1. Automated (Unsupervised)
  2. Manual (Supervised)
42
Two types of assertions
  1. Correction assertions
  2. Confirmation assertions
43
Type of assertion to correct the error that two structures are false negatives of each other
Structure-to-Structure Assertion
44
Type of assertion to correct the error that a structure has references to more than one identity
Structure-split Assertion
45
Type of assertion which corrects both a false positive and false negative in one operation
Reference-transfer Assertion
46
Asserts that a structure has been reviewed and found to be a true positive
True Positive Assertion
47
Asserts two or more structures have been reviewed and found to be true negatives
True Negative Assertion
48
Which type of assertion creates a cluster from a specific set of input references?
Reference-to-Reference Assertion
49
Which type of assertion adds a specific set of input references to a specific structure?
Reference-to-Structure Assertion
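
The three correction assertions from cards 43 to 45 can be pictured as operations on a toy IKB that maps structure IDs to sets of reference IDs. This is a rough sketch under that assumed data model, not how any particular MDM system stores or applies assertions; all names and IDs are hypothetical and no validation is done.

```python
def structure_to_structure(ikb, keep_id, merge_id):
    """Fix a false negative: merge two structures that are the same identity."""
    ikb[keep_id] |= ikb.pop(merge_id)


def structure_split(ikb, structure_id, refs_to_move, new_id):
    """Fix a false positive: move some references into a new structure."""
    refs_to_move = set(refs_to_move)
    ikb[structure_id] -= refs_to_move
    ikb[new_id] = refs_to_move


def reference_transfer(ikb, ref, from_id, to_id):
    """Fix a false positive and a false negative at once: move one reference."""
    ikb[from_id].remove(ref)
    ikb[to_id].add(ref)


ikb = {"E1": {"r1", "r2"}, "E2": {"r3"}, "E3": {"r4", "r5", "r6"}}

structure_to_structure(ikb, "E1", "E2")    # E1 now holds r1, r2, r3
structure_split(ikb, "E3", {"r6"}, "E4")   # E3 holds r4, r5; E4 holds r6
reference_transfer(ikb, "r5", "E3", "E1")  # r5 moves from E3 to E1

for structure_id in sorted(ikb):
    print(structure_id, sorted(ikb[structure_id]))
```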