Week 3: Advanced SQL and Data Integration Flashcards

1
Q

REVERSED

CREATE TABLE Students(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)

A

How do you create a table in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

REVERSED

DROP TABLE Students

A

How do you delete an entire table including schema in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

REVERSED

DELETE FROM Students

A

How do you delete the contents of a table but keep the relation in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

REVERSED

DELETE FROM Students
WHERE …

A

How do you delete certain rows from a table in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

REVERSED

ALTER TABLE Students
ADD COLUMN firstYear integer

A

How do you add an empty column to an SQL table?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

REVERSED

ALTER TABLE Students
DROP firstYear

A

How do you delete an attribute from an SQL table?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

REVERSED

INSERT INTO Students
VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)

INSERT INTO Students(name, Sid, login, age, gpa)
VALUES (‘Smith’, 53688, ‘smith@ee’, 18, 3.2)

A

How do you add a row into an SQL table? (2 ways depending on the order)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

REVERSED

UPDATE Students
SET grade = grade*1.2
WHERE …

A

How do you change the value in an attribute for all or some tuples?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

REVERSED

CREATE TABLE test (sid INTEGER PRIMARY KEY, name VARCHAR(30), major VARCHAR(30))

CREATE TABLE test (sid INTEGER, name VARCHAR(30), major VARCHAR(30), PRIMARY KEY (sid))

A

How do you set a primary key when creating a table in SQL? (2 ways)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

REVERSED

CREATE TABLE Works_In(ssn CHAR(11), did INTEGER, since DATE,
PRIMARY KEY (ssn, did),
FOREIGN KEY (ssn)
REFERENCES Employees)

A

How do you set a foreign key when creating a table in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

REVERSED

organise a group of entity sets into a parent/child hierarchy

A

What are entity hierarchies?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

REVERSED

specify that the children of an entity do/don’t overlap

A

What are overlap constraints?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

REVERSED

instances of the children of an entity include all instances of their parent (ie cover it)

A

What are covering constraints?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

REVERSED

A view is just a relation, but we store a definition, rather than a set of tuples
Views can be used to present necessary information (or a summary) while hiding details in underlying relations

A

What is a view in SQL and what is it used for?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

REVERSED

CREATE VIEW YoungActiveStudents (name, grade)
AS SELECT S.name, E.grade
FROM Students S, Enrolled E
Where S.Sid = E.Sid and S.age <21

A

How do you create a view in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

REVERSED

DROP VIEW

A

How can you delete a view in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

REVERSED

a constraint is a relationship among data elements that the DBMS is required to enforce

A

What is an integrity constraint?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

REVERSED

Integrity constraints guard against accidental damage to the database, by ensuring that authorised changes do not result in a loss of data consistency

A

What is the purpose of integrity constraints?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

REVERSED

one that satisfies all specified ICs

A

What is a legal instance of a relation?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

REVERSED

  • Primary key
  • Foreign key (referential integrity)
  • Value-based
  • Tuple-based
A

What are the 4 types of integrity constraints?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

REVERSED

Means the primary key cannot be null

A

What is entity integrity?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

REVERSED

a key that could be the primary key but is not classified as the primary key

A

What is an alternate key?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

REVERSED

  • NO ACTION - delete/update is rejected
  • CASCADE - make the same changes to all tuples that refer to the updated/deleted tuple
  • SET NULL / SET DEFAULT - sets foreign key value of referencing tuple to NULL or a default value
A

What are the three ways of enforcing referential integrity in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

REVERSED

CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY(sid,cid), FOREIGN KEY(sid) REFERNCES Students
ON DELETE CASCADE
ON UPDATE SET DEFAULT)

A

How do you set the method of enforcing referential integrity when creating an SQL table?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

REVERSED

it will store the three characters followed by 2 spaces

A

What happens if you enter 3 characters into an attribute with CHAR(5)?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

REVERSED

defines constraints on the values of a particular attributes

A

What is the check clause?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

REVERSED

CREATE TABLE section( semester VARCHAR(6) CHECK(semester IN (‘Fall’, ‘Winter’, ‘Spring’, ‘Summer’)), year NUMERIC(4,0) CHECK(year>1990))

A

How do you add a check into an SQL table?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

REVERSED

CHECK(semester IN (‘Fall’, ‘Winter’, ‘Spring’, ‘Summer’) AND (year>1990))

A

What is a tuple based check?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

REVERSED

CHECK(time_slot_id IN (SELECT time_slot_id FROM time_slot))

A

What is a complex check clause?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

REVERSED

name VARCHAR(20) NOT NULL

A

How do you declare that an attribute can’t be null?

31
Q

REVERSED

  • 1 table for each entity
  • 1 table for each relationship
  • Each cell contains a single value
  • If you have big relations, decompose them
A

What is the normalisation theory for designing a good database? (4)

32
Q

REVERSED

means when you decompose a table and then join you get the original table back

A

What is lossless decomposition?

33
Q

REVERSED

let R a relation schema (with constraints). A decomposition R1, R2, … Rn of R is called lossless iff. for each valid relation (instance) r(R): r = piR1(r) join piR2(r) join …

A

What is the official definition of a lossless decomposition?

34
Q

REVERSED

Attribute(R1) UNION Attribute(R2) = Attribute(R)

Attribute(R1) INTERSECTION Attribute(R2) is not empty

Attribute(R1) INTERSECTION Attribute(R2) ->
Attribute(R1) or Attribute(R1) INTERSECTION Attribute(R2) -> Attribute(R2)

A

What are the 3 rules to identify lossless decomposition?

35
Q

REVERSED

  • the values of a set of attributes X determine the values of another set of attributes Y
  • Denoted by X -> Y (X determines Y)
A

What is functional dependence?

36
Q

REVERSED

K is a superkey for relation R iff. K -> R

A

What is a superkey in terms of functional dependence?

37
Q

REVERSED

K is a candidate key for R iff. K -> R AND for any X proper subset of K, X -/-> R

A

What is a candidate key in terms of functional dependence?

38
Q

REVERSED

functional dependencies allow us to express constraints that cannot be expressed using superkeys

A

What is the purpose of functional dependencies?

39
Q

REVERSED

The set of functional dependencies logically implied by F is the closure of F, denoted F+

ie. to find the closure B+, find all the attributes that can be determined starting with B

A

What is the closure of a set of functional dependencies?

40
Q

REVERSED

  • If X subset of Y, then Y -> X (reflexivity)
  • If X -> Y, then AX -> AY (augmentation) (A is a set of attributes)
  • If X -> Y and Y -> W, then X -> W (transitivity)
A

What are Armstrongs axioms? (3)

41
Q

REVERSED

The minimal set of attributes for which their closure is all the set of attributes in a relation

A

What is a candidate key in terms of closure?

42
Q

REVERSED

if X -> Y and X -> W, then X -> YW
if X -> YW, then X -> Y and X -> W
if X -> Y and WY -> Z, then XW -> Z
If X is a candidate key, then X -> Y for all Y

A

What functional dependencies can be derived from armstrongs axoims? (4)

43
Q

REVERSED

For relational schema R(XYZ), the following holds: If X -> Y then the decomposition R1(XY) R2(XZ) is lossless.

A

How do you check if a decomposition is lossless using functional dependencies?

44
Q

REVERSED

A functional dependency that holds only under some contraint

([country = UK, zip] -> street)

A

What is a conditional functional dependency? How is it expressed?

45
Q

REVERSED

  • personalisation: content adapted to user
  • Customisation: structure adapted to user
A

What are personalisation and customisation in terms of data?

46
Q

REVERSED

  • a set of local databases, each with a local schema and a local instance
  • A global integrated schema
  • A set of mappings between the global and local schema
A

What is an information integration system

47
Q

REVERSED

-leave the data in the sources
-When a query comes in:
* Determine their relevant sources to the query
* Break down the query into sub-queries for the sources
* Get the answers from the sources, filter them if needed and combine them appropriately

A

what is visual integration architecture / on demand integration?

48
Q

REVERSED

-extension of jaro-similarity that gives higher weight to matching prefix

Jw(S1, S2) = JaroSim + P*L*(1-JaroSim)

  • P is a scaling factor, 0.1 by default
  • L is length of common prefix up to max 4
A

What is jaro-winkler similarity?

49
Q

REVERSED

Text variations, local knowledge, evolving nature of data, new functionalities

A

What are reasons for discrepancies between entities? (4)

50
Q

REVERSED

identify the different structures/records that model the same real-world object

A

What is entity resolution?

51
Q

REVERSED

improves data quality and integrity, fosters re-use of existing data sources, optimises space

A

Why is entity resolution useful? (3)

52
Q

REVERSED

  • number of operations to convert from 1st string to 2nd string
  • delete and insert character with cost 1
  • substitute character with cost 2
A

What is the levenshtein edit distance?

53
Q

REVERSED

  • overcome limitation of edit distance with shortened strings (abbreviations)
  • Considered two extra operations: open gap and extend gap (with small cost)

Cost = insert + open + extend

A

What is the gap distance?

54
Q

REVERSED

JaroSim(S1, S2) = (1/3)*(C/|S1| + C/|S2| + (C-T)/C)

C=common/matching characters in S1 and S2
T=transpositions/2 (transposition=matching but different sequence order)

characters are considered matching when they are the same and not further than [max(|S1|, |S2|)/2]-1 apart

A

What is jaro-similarity?

55
Q

REVERSED

Represent document as a set of its k-shingles
Use jaccard set similarity

A

What is a similarity metric for documents?

56
Q

REVERSED

  • algorithm that indexes names by their sounds when pronounced in English
  • consists of the first letter of the name followed by three numbers
  • Remove all W, H
  • B,F,P,V encoded as 1
  • C,G,J,K,Q,S,X,Z encoded as 2
  • D,T encoded as 3
  • L encoded as 4
  • M,N encoded as 5
  • R encoded as 6
  • Remove vowels
A

What is the phonetic algorithm/sound encoding algorithm?

57
Q

REVERSED

-The Jaccard similarity of two sets is the size of their intersection divided by the size of their union:
sim(C1,C2) = |C1 intersection C2| / |C1 union C2|

A

What is jaccard similarity?

58
Q

REVERSED

d(C1,C2) = 1 - sim(C1,C2)

A

What is jaccard distance?

59
Q

REVERSED

Jaccard similarity but counts repetition of elements
Max is 0.5

A

What is jaccard bag similarity?

60
Q

REVERSED

-a sequence of k-tokens (characters or words) that appear in the document
E.g. for document D1 = abcab, Set of 2-shingles S(D1) = {ab, bc, ca}

A

What is a k-shingle / k-gram?

61
Q

REVERSED

  1. Shingling: convert documents to sets
  2. Min-hashing: convert large sets to short signatures, while preserving similarity
  3. Locality-sensitive hashing: focus on pairs of signatures likely to be from similar documents (candidate pairs)
A

What are the 3 steps for finding similar documents?

62
Q

REVERSED

  • get set of shingles
  • represent as a matrix where rows are shingles and columns are documents
  • matrix has 1 if the shingle is in the document
A

What is the process of shingling?

63
Q

REVERSED

no. of rows where both columns are 1 / no. of rows where either column is 1

A

How do you find the similarity of two sets(documents) from the shingle matrix?

64
Q

REVERSED

find similar columns while computing small signatures

A

What is the goal of hashing?

65
Q

REVERSED

“hash”each column C to a small signature h(C), such that:

  1. H(C) is small enough that the signature fits in RAM
  2. sim(C1,C2) is the same as the “similarity” of signatures h(C1) and h(C2)
A

What is the key idea of hashing?

66
Q

REVERSED

  • start with a permutation for each row of the boolean matrix
  • get the first occurring row in the permuted order that correlates to a 1 in the document and store this as the new column value in the Signature matrix
  • repeat for many permutations
A

How do you perform min-hashing?

67
Q

REVERSED

Define a minhash function h_pi(C) = the index of the first(in the permuted order pi) row which column C has value 1

A

How do you define a minhash function?

68
Q

REVERSED

Pr[h_pi(C1) = h_pi(C2)] = sim(C1, C2)
the probability that the minhash function for a random permutation of rows produces the same value for two sets equals the Jaccard similarity of those sets

A

What is the property of the minhash function?

69
Q

REVERSED

the fraction of the hash functions in which they agree (for which signatures have the same value)

A

What is the similarity of two signatures?

70
Q

REVERSED

Smaller expected error

A

What happens as the signatures get longer?

71
Q

REVERSED

generate from the collection of all elements (signatures) a small list of candidate pairs: pairs of elements whose similarity must be evaluated

A

What is locality sensitive hashing?

72
Q

REVERSED

Divide the signature matrix into b bands, consisting of r rows. When rows are the same in the band, they are hashed to the same bucket. They have b opportunities to be hashed to the same bucket
-signatures hashed to the same bucket are compared

A

How do you perform locality sensitive hashing?

73
Q

REVERSED

  1. Select the most appropriate attribute name(s) with respect to noise and distinctiveness
  2. Transform the corresponding value(s) into a blocking key (BK)
  3. For each BK, create one block that contains all entities having this BK in their transformation
    * works as a hash function - blocks on the equality of BKs
A

What is the standard blocking algorithm?